06. March 2021

Data Management vs. GDPR: Solving the Right to Be Forgotten Problem

GDPR has dramatically shifted the way we should be thinking about data management. Specifically, the Right to Be Forgotten poses an interesting challenge for your data archives and backups.

GDPR is not the only regulation that pokes a new potential hole in your data protection strategy. Many states and even the federal government are looking at similar regulations. The time has come to plan for data privacy in our data management. You need to answer questions like “Does the GDPR apply to backup and/or archived data?”

In “Data Management vs GDPR and Data Privacy – How to Solve the Right to Be Forgotten Problem” George and Victoria discuss the new problems caused by GDPR for your data management.

Transcript: Solving GDPR’s Right to be Forgotten Problem

George: Hello and welcome. I’m George Crump, Lead Analyst for Storage Switzerland. Thank you for joining us today. Today, we’re gonna be talking about data management and GDPR and data privacy. And specifically, we’re going to focus in on how to solve the Right to be Forgotten problem, which is something that comes up a lot in these conversations. Specifically, there’s components of these regulations that allow users to request that they be removed, and that creates a particular problem when we talk about backups and archives and things like that. Joining me on the webinar is Vicki Grey. Vicki is with Aparavi. Vicki, thanks for joining us today.

Vicki: Sure. Good to talk to you, George.

George: Glad to have you. Hey, before we jump too far in the presentation, you want to give a quick 30 seconds on what Aparavi does?

Vicki: Absolutely. Happy to be here to be talking about GDPR and data privacy. That is actually something that we can address, but it’s not the primary focus of what brought us to market. Aparavi is intelligent multi-cloud data management, which can index, classify, retain, and retrieve unstructured data for long-term retention and everything that goes around that. And, of course, I’ll be talking a lot more about it, but that’s the short answer.

George: So basically, how to find all the stuff that you’ve been keeping?

Vicki: Yes, right. And as we talk to people, and you were just telling me this the other day, how many people keep everything?

George: Exactly. Yeah.

Vicki: There’s a ton of clients out there that never get rid of anything. Well, how do you organize that and retain it for all kinds of reasons, and then find it when you need it? We solve that problem.

George: Awesome. All right, so let’s kind of jump into the meat of the presentation here. I think the big thing that I’d like to get across, Vicki, is this is just not uniquely a European problem, right? That we’re seeing these types of…there’s a couple of things that I’m seeing happening. Number one, the awareness on the part of the consumer, I think, is increasing exponentially. I think we’ve all been kind of a little bit naïve and just assumed that people will do the right things with our data and things like that. It seems like Facebook gets hammered every other day for doing something wrong.

And so I think that from a consumer perspective, we’re starting to see people become really in tune to their data and data privacy. And then, of course, I think, organizations are beginning to understand, really, the value of the asset and being able to make sure it’s protected. And like you said, being able to find that stuff. Are you guys seeing similar type of awareness?

Vicki: We are. And, as you know, we’re a California company. And so, it’s become front of mind in California just recently because of the California Consumer Privacy Act that you mentioned here. There’s just over a year before that goes into effect. But it mirrors a lot of what GDPR does. And I understand that even in Congress, there’s some legislation working its way through that may do a similar thing. So even though at the moment, GDPR is in Europe, I think a lot of the world is looking at that and thinking of implementing similar things, just like California already has.

George: Yeah. And I had a friend who is based in Germany kind of teasingly say to me, “Look, at least in Europe these cases are gonna be settled by, essentially, professional judges. In the States, it’s gonna be settled by a jury of your peers.” And that could be a scary thought if you think about it. So it’s…I think the pressure in the U.S. will actually end up being harder, and especially if we don’t have federal legislation, imagine having 50 states with 50 translations of what data privacy means. It could be a real mess.

Vicki: Yeah, that could be a real mess. And, actually, isn’t it interesting that California is the first state to implement this? And it’s the home of Google and Facebook and a lot of the others that have been in the news around this issue.

George: Yeah, exactly. Yeah, so I think that clearly we as an IT industry have to really take this seriously and really have to get our arms around it. So let’s talk a little bit about what this is really about. And, you know, it’s interesting, as I started to research, a couple of years ago now, GDPR and what it really meant, you’ve known me a long time and I sort of have a resistance toward anything to do with the government, and I expected the worst. And I’ve gotta tell you that GDPR…I mean, there’s some language I wish they would tighten up and explain what “in a timely manner” means and stuff like that. But really, as a first pass, it’s actually a pretty decent regulation. It’s not crazy talk. It’s really just good data management. What are your thoughts?

Vicki: Well, as you and I were talking just recently, both of us have been around the data protection world for more years than we want to admit. And I think this is a reasonable expectation. I think that, as we’re going to talk about, a number of the companies are gonna struggle, or a number of the vendors are gonna struggle to actually deliver against the challenge, but particularly with what’s been in the news, it’s a reasonable reaction to what’s happened with people’s data. So, as you say, all of us in the IT world have got to get our acts together to be able to deliver what people are demanding.

George: I think if you were to boil any of this stuff down—and this was my reaction when I first read the GDPR document—really, this is just good, solid data management. If you had been doing good, solid data management best practices that have been in place for years, you could be in a good position to follow this. I think the challenge is now, some of these aspects of these regulations really change the focus. It will change the way we protect data and will also change the way we retain data. And so I think that those two aspects are worth exploring a little bit, specifically retention because retention is interesting to me.

Vicki, you and I were talking yesterday: I recently gave a talk on this exact subject. There were about 45 people or so in the room, and I asked, “How many people retain their backup data for more than five years?” And every hand in the room went up. And I don’t think that’s unusual. I think that a lot of people count on backup as retention. And I think, as we’ll talk about, that might not be a good idea anymore, or you need a really, really smart data protection product to be able to pull this off.

And so, I think if you look at a data protection policy, what we’ve always focused on there was really from an IT aspect. It was how long is the organization going to store data? And then, what is a justifiable deletion of data? I think, to what you just said, that what we see increasingly is that data is increasingly considered an asset, and people now want to keep it for either a very long time or forever, which also loosely translates into a very long time.

Like you said, we’ve been around the data protection space for a while. Are you seeing that as well, that people are just retaining data? I mean, the whole policy, to me, now, basically says, “Don’t delete it ever.”

Vicki: Well, yes, and as long as I’ve been doing this, whenever I ask an audience, or individual customers, how long they keep their data, it has always shocked me how many of them said, “All data, forever.” It’s a huge trend. And I think that years ago, when the amount of unstructured data (and we haven’t really gotten into that yet) was a much smaller part of your overall data, you could try to use backup for retention. It wasn’t as much of a stretch as it is today. But if you look at today’s world, with the huge amount of unstructured data vastly taking over the total amount, it’s just untenable to try to use backup for long-term data retention as well. But people are still trying to do it.

George: I totally agree with you. I think that last bullet is really important too, that GDPR requires us to protect data, retain it, and also remove it. And the problem is that we typically are guilty in the industry of treating those three words as three separate processes. That we would protect one way, we will retain another way, and then we will remove a third way. And if you really read what GDPR and these other data privacy laws are really communicating, what they’re asking for will require either an integration of those processes or at least a high level of communication between the processes. You alluded to this earlier: I think a lot of vendors are really going to struggle in pulling those three aspects together. What are your thoughts there?

Vicki: Well, I know you’re going to jump into this a little bit more about the challenges of a lot of the legacy architectures in backup to be able to do this. And it illustrates why you actually have to think about this and your long-term retention of unstructured data in a different format, in a different way, because you won’t be able to meet some of the requirements in GDPR using your classic backup architectures. So let’s get into that, George.

George: Okay. So I think the big problem, because the slide says so, is the Right to be Forgotten. And so, let me explain to our viewers what we’re talking about there. So under GDPR, let’s say John Smith can call up and say, “I’d like you to remove all my data from your system.” And then you have to respond to that request in a timely manner. Now, this is an area where we need some case law, frankly, to understand what is meant by “timely manner,” and what we mean by “all data.”

I’m reading the law to say “all data” means all data. Other people, who are particularly damaged by this, are reading the law to say, “all data” doesn’t mean, for example, data on tapes, which I can’t figure out where they come up with that, but I’m not a lawyer.

What’s interesting about this is it nullifies user agreements. There’s many user agreements that you probably agree to that you haven’t read. And this, for all intents and purposes, gives you the ability to nullify that agreement and force the organization to remove the data. So it’s kind of an interesting aspect. And what’s interesting is, my translation of this is the removal doesn’t have to be instantaneous, meaning that John Smith calls up and 30 seconds later, all his data is gone. But the phrase is “timely manner,” which isn’t clearly defined. But I think we can agree it’s not six months and it’s not 60 seconds. If you have a policy that says a week or a day or something like that, I think that’s generally considered reasonable.

And so, I think this is an area where this really becomes a challenge, as we start to explore what the impact is from a backup perspective. I have some of those examples there. So if you think about this, put yourself in the position of an administrator receiving this request. Depending on how you use John Smith’s data and what of his data you stored, it’s generally straightforward to remove it from primary storage. There are plenty of tools out there that would search for that. It is important to note that it would be a separate step, though.

I’m going to skip to the last one. It’s relatively straightforward to remove it from an archive because archives, generally, are set up and tagged and we can do things there. But the real problem, and where we’ve had a lot of discussion, is that it’s really hard to remove it from a backup. You know, Vicki, as we’ve said a couple of times now, you and I have a pretty good history in the whole backup space. And backups have always been sort of these job-driven Neanderthals, and so data isn’t stored in a way that’s really logical to a human. They are stored as, “Here’s where I backed up Saturday night, and here’s where I backed up Sunday night, and here’s where I backed up Monday night.” And the ability to search across all those different jobs becomes more difficult the older the job gets.

Vicki: Well, and for anybody who’s still using tape, and there’s a lot, think about all your off-site tapes. How do you search those? It’s a very cumbersome procedure.

George: You and I, at the company we actually met at, I can remember that we would advise customers to prune their data, the metadata that was being tracked after a period of time to save storage space because the metadata database got so big, right? And those things still exist and so you’re actually making it more difficult, because there’s also no time limit on when John Smith can call you up. If John Smith calls you up eight years from now and wants all of his eight years of data removed, as I read the regulation, you have to comply. And so imagine trying to find John Smith’s data from eight years ago. It’s just going to be really difficult.

Vicki: And you have to show that you actually did do it.

George: Yes, yes. And that’s kind of an interesting thing to me. How do you prove that you did something that no longer exists? It becomes a real challenge. Proving that something didn’t happen is much harder than proving that something did happen.

So if you look at it again, and we kind of touched on this, most organizations count on backup for data retention. And so I’ll ask those of you who are viewing this: how long are you keeping your backup data? And if you think about it, just purely from a backup process, we can hit you with survey after survey that shows that 80% to 90% of restore requests come from either the most recent or the next most recent backup. The number of times you actually pull data from a backup that’s three or four years old is almost immeasurable. So I think part of it is to just rethink that part of the process.

And this, I think, really gets to where this becomes really difficult. In the ’90s when we had a file server, we were on the lookout for file servers with hundreds of thousands of files because it would take a while to back them up and because all backups, at that time, were file by file. With the unstructured data growth, it’s now incredibly common to see a file server with a million files on it, and especially in big data and IoT environments, we have servers that are pushing a billion files. That unstructured data growth has led most backup vendors, I would say at this point, to do something called an image-level backup. In an image-level backup what we are doing is, essentially, saying, “Look, I’m not gonna inspect what’s in the volume; I’m just going to back up the volume, and I’m going to look for blocks of data that are changing and just pull those blocks over.”
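To make the image-level idea concrete, here is a minimal Python sketch of changed-block detection. It assumes blocks are compared by hashing fixed-size chunks of a volume image; the block size, function names, and sample data are illustrative only, and real products may track changed blocks in other ways. The point is that nothing in this process looks at file names or contents.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def block_hashes(volume: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of a volume image."""
    return [
        hashlib.sha256(volume[i:i + block_size]).hexdigest()
        for i in range(0, len(volume), block_size)
    ]


def changed_blocks(previous: list[str], current: list[str]) -> list[int]:
    """Return indexes of blocks that are new or differ from the previous backup."""
    return [
        i for i, digest in enumerate(current)
        if i >= len(previous) or previous[i] != digest
    ]


# Two versions of a tiny "volume": only the second block changes.
monday = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
tuesday = b"A" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"C" * BLOCK_SIZE

to_copy = changed_blocks(block_hashes(monday), block_hashes(tuesday))
print(to_copy)  # [1]
```

Because the backup only knows "block 1 changed," it has no record of whose file lived in that block, which is why finding one person's data later is hard.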

And that creates a problem when we go to look for data, right? Now, most of these vendors—in fact, today I would say comfortably all of them—have the ability to do a single file restore. But in almost every case, you have to know exactly what backup job—there’s that word again—contains the file you’re looking for. It’s very difficult to say, “Give me this version of this file and let me know what jobs hold it.” And most (at least that I know of) applications can’t do that. And so this unstructured data growth is not really necessarily the source of the problem, but it’s really compounding the situation as far as being able to find and identify this data.
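The question George poses, "give me this version of this file and let me know what jobs hold it," amounts to an inverted index over backup jobs. The sketch below is a hypothetical illustration in Python: the job names, paths, and version labels are invented, and it assumes each job can produce a manifest of the files it contains, which is exactly the metadata many image-level products do not keep.

```python
from collections import defaultdict

# Hypothetical job manifests: job name -> {file path: content version id}
job_manifests = {
    "saturday-full": {"/home/jsmith/cv.docx": "v1", "/sales/q1.xlsx": "v1"},
    "sunday-incr": {"/sales/q1.xlsx": "v2"},
    "monday-incr": {"/home/jsmith/cv.docx": "v2", "/sales/q1.xlsx": "v2"},
}


def build_catalog(manifests):
    """Invert job manifests into (path, version) -> list of jobs holding it."""
    catalog = defaultdict(list)
    for job, files in manifests.items():
        for path, version in files.items():
            catalog[(path, version)].append(job)
    return catalog


catalog = build_catalog(job_manifests)
print(catalog[("/sales/q1.xlsx", "v2")])  # ['sunday-incr', 'monday-incr']
```

Without a catalog like this, the only way to answer the question is to open each job in turn, which gets slower the more jobs you retain.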

Vicki: Yes, it is. And, in fact, besides the millions and millions of discrete files, each of which needs to be handled and managed, the unstructured data growth we’ve been seeing is unbelievable. So back when, in the early days, when you and I were doing this, structured data typically would represent three-quarters or more of overall data. And most IT managers had a fairly flippant attitude towards the unstructured data, like, “Oh that’s just user data.” Remember that?

George: Yep.

Vicki: That’s flipped on its head now. Most analysts are projecting that unstructured data within just a few years here is gonna represent 90% or more of all the data out there. And organizations have already come to recognize that unstructured data has huge value. It’s not just your office documents, but video files, audio files, anything else, for a variety of reasons. Not just compliance, but company history, business intelligence… There’s a huge amount of value there.

George: Yeah, absolutely. And I think that’s a really interesting point. If you think about in the ’90s and even early 2000s, when we talked about data protection, it was, “How fast can I back up Oracle?” I’ve now given presentations in this decade where I say, “Look, if you’re backing up Oracle, if you’re counting on your backup of your Oracle database as your primary recovery thing, you’re doing it wrong.” In most of these applications, we’re doing something else. And so, actually, data protection now should be much more focused on unstructured data because that’s the stuff that it really can manage and maintain.

Now, I’m not saying, by the way, I didn’t officially say don’t back up your Oracle data, but it should be less of a…if you’re going to your backup to recover your Oracle environment, it’s because something really, really bad has happened to replication and high availability that you might have in the situation.

The other trend that we’re tracking, Vicki, which is kind of interesting is we saw a group of vendors come to market from 2005 to 2012-ish that were focused, typically, on a particular environment like maybe VMware or NoSQL or something like that. And what’s interesting there we see is that metadata that these guys track has less value or less detail than the really old guard guys that used to have to manage tapes and things like that. And so the semi-modern solutions, I guess I would call them, have actually gotten worse at the metadata they track. So this all becomes a real problem when we go to recover information and fulfill that, “Hey, John Smith just called up. How do I get his data back?”

So we’re going to talk about two “solutions” that we hear a lot from vendors, and I’m using solutions in air quotes. And, I think, functionally, they work to some level. I think there’s no guarantee that they meet the letter of the law, so to speak, in GDPR. There are some assumptions that they will. So let’s first talk about delete on restore. So what we’re talking about here is the ability to…when a restore request comes in, you keep a list of people that have been forgotten. As the restore is occurring, you’re comparing to that list to see if that person is on that list. If the person is on the list, you essentially restore to device null or don’t restore at all. And then, if they’re not on that list, then you go ahead and restore as normal.
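As a rough illustration of delete on restore, the following Python sketch filters a restore stream against a forgotten list. The record layout (`owner_id`, `path`) and the function name are assumptions made for this example; a real backup product would have to derive the owner from file contents or metadata, which is the hard part.

```python
def restore_with_forgotten_list(backup_records, forgotten_ids):
    """Yield only records whose owner has not asked to be forgotten.

    backup_records: iterable of dicts with hypothetical 'owner_id' and 'path' keys.
    forgotten_ids: set of owner ids that requested erasure.
    """
    for record in backup_records:
        if record["owner_id"] in forgotten_ids:
            continue  # skipped, i.e. "restored to device null"
        yield record


backup = [
    {"owner_id": "jsmith", "path": "/crm/jsmith.json"},
    {"owner_id": "adoe", "path": "/crm/adoe.json"},
]
restored = list(restore_with_forgotten_list(backup, {"jsmith"}))
print([r["path"] for r in restored])  # ['/crm/adoe.json']
```

Note that John Smith's record is still sitting in the backup itself; it is only withheld at restore time, and the forgotten list has to be kept indefinitely for the filter to work.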

Now, the problem we see with this is a couple of things. First of all, there is no clarification on whether or not you have the right to keep a list. There is some language in these regulations that says that you can maintain some data to protect the business. So the assumption is that that list would be okay. And I would tend to agree with that because I think it’s hard to come up with a way to prove that you’ve deleted somebody if you don’t have a record that they ever requested to be deleted. So, I think there’s some legitimacy there, but clearly there is no case law, at this point anyway, to be totally clear on that.

Probably the bigger thing that I would be concerned about here is restore speed. And, Vicki, we’ve always talked about speed of restore and how important it is. I think the vendors that talk about this aren’t thinking about, or talking about…a couple of the companies that we have worked with specifically when dealing with GDPR issues are tracking thousands, not dozens, but thousands of users that have requested to be removed. And so what that means on the recovery process is for every single one you’re going to have to do this check a thousand times. And my assumption would be that’s just going to crush backup recovery times, right?
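The scaling concern can be put in numbers. The sketch below assumes the worst case, where the match has to be made against file contents, so every restored file is checked against every forgotten user. The file and user counts are hypothetical, chosen only to contrast "a dozen" with "thousands."

```python
def naive_checks(files_in_restore: int, forgotten_users: int) -> int:
    """Worst-case comparisons if every file is checked against every forgotten user."""
    return files_in_restore * forgotten_users


# Hypothetical restore of one million files.
print(naive_checks(1_000_000, 12))     # 12000000
print(naive_checks(1_000_000, 5_000))  # 5000000000
```

Going from a dozen forgotten users to five thousand multiplies the work by more than four hundred, and all of it lands in the restore window.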

Vicki: Well, I mean, if you think about it, you back up at leisure and you restore in a panic.

George: That’s a good point.

Vicki: That’s highly problematic, right? When the business is down for some reason and you’re trying to restore it, you need to be back up and running as fast as possible, which is why the backup vendors have spent so much time trying to ensure that restore is a fast job, and this throws a big wrench into it.

George: I do believe, by the way, that for the legacy backup vendors, this will probably be the path that many of them choose. What we see as the kind of immediate solution, and again I’m using air quotes here, is this “restore to sandbox” idea. And to be very frank, this is horrible. So basically, what happens here is you restore to an isolated area of your data center. So there’s risk of exposure because there’s isolation and then there’s isolation. And then, you have to manually check to make sure that the forgotten users’ data isn’t in there as opposed to, with delete on restore, a computer is doing the work.

So the challenge there is what we see is vendors being kind of short-sighted in that because they’re thinking a dozen people have asked to be forgotten. Well, if you’re dealing with thousands of people that have asked to be forgotten, and you have to do this on every single restore, you could throw hours, if not days, into the restore process. Now, I assume, at some level, you could probably script some of this. But, again, then you gotta write all the scripts and you gotta maintain all the scripts. It’s just a very, very problematic situation. And so, I think, that that really becomes a challenge. And again, I go back to the regulation, I don’t know if recovering data that you were supposed to forget to an isolated area works. I don’t know if that follows regulation.

And, I think, Vicki, the big problem that I’m gonna let you comment on, is we don’t even know…these two things assume that it’s okay to keep forgotten users’ data within the backup umbrella itself. And clearly, both regulations are, at best, unclear on whether or not that’s a possibility, correct?

Vicki: Well, and we talked about it before. I said that I find that highly problematic that if the regulation says that you have the right to be forgotten, imagine if you requested this and an organization’s response was, “Okay, sure. But we’re gonna keep all your data unless and until there is a restore, and then we’ll promise to get rid of it.” I’m not feeling so good about the fact that you’ve actually forgotten me, right? Because all that data is still out there on all those backup jobs.

George: Right.

Vicki: So I question the fundamental premise behind both of these, that you can actually keep it until there is a restore.

George: Yeah, and I certainly wouldn’t argue that point. I think that’s a reasonable thing to be questioning. The other thing to look at as an option—and this is one, frankly, that at Storage Switzerland, we’ve done just because it was actually before we had met the folks at Aparavi—but it’s replacing backup with archive. So everything we’ve talked about is really a data management function. It is knowing where your data is, what’s in your data, it’s being tagged properly, there’s a rich metadata history being developed. Well, all of that is—well, I shouldn’t say all—but many archive solutions have that ability. And so what we had to do with these is not retain. We actually set the backup retention policy to seven days, and then everything goes to an archive. What we’re counting on is that research data shows that most recoveries happen from last night’s backup. But this still creates a problem. This gets you closer, I think, but the problem that you have is how do we get everything to the archive?

If you look at most archiving solutions, first of all, archiving solutions tend to be very siloed. There’s a group of vendors that offer archiving hardware in the form of object stores or tape libraries or cloud storage. And then, there’s the archive software guys, well, on the other side. And you have to kind of put these things together, which is difficult. The other challenge you have is that archive software vendors are not backup vendors, typically. And so they don’t design their software to move data rapidly to anything, so you reintroduce a performance problem.

The other thing you’ve introduced now is you’re also transferring data twice. You’ve got your backup job running and then you’ve got an archive job running behind it in some form or fashion. And so there’s all kinds of network impact issues, there’s all kinds of concerns around that.

It’s closer, but what we really need, I think, and this is what we’ll kind of wrap up with here, is to integrate the two processes. So, I think, it’s reasonable to say, “Look, let’s do something.” We’re going to put structured databases, in particular, applications into their own bucket. We’re going to have replication and high availability and things like that. We might want a copy in the backup for sort of long-term strategy. But, specifically, what we want to focus on is the unstructured data. And so, what we need to actually return to is file-by-file backup.

But, Vicki, we don’t want to do that and make backup windows go through the roof again, right? And so this has to be much more intelligent than the way we used to do file-by-file backup in the ’90s, right?

Vicki: Yep. But you’re right as well in that you don’t want to have to run two jobs on the same set of data. We have to have a way to integrate, just as you say here, you’ve gotta have the archive capabilities, but you’ve gotta integrate backup into it as well so that you’re not running two processes and two jobs on the same set of data.

George: Right. There’s a saying, this mantra, that’s been around for a while in the industry that, “Backup isn’t archive.” And I would tend to agree with that, and that gets into that seven-year retention thing. But we’ve never said that archive can’t be backup, right? And so what we’re really talking about here is merging the two processes. And at the back-end, you really count on rich metadata, and it really becomes more than archive; it becomes data management. And so that’s the rich metadata tracking, being able to know a lot more about the individual files than what we’ve known in the past.

And then, at that point, if you think about it…going back to our John Smith request, removing John Smith from this type of arrangement is relatively straightforward. In fact, it should take minutes if not seconds, right? So it should be a relatively straightforward process to identify that person and remove them from that secondary storage area in addition to the primary storage area.
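Why removal becomes quick with rich metadata can be sketched in a few lines of Python. This toy archive stores each file as its own object and keeps an index from data subject to object ids; the class, method names, and the policy of deleting every object tagged with the subject are all assumptions for illustration, not a description of any particular product.

```python
from collections import defaultdict


class ArchiveIndex:
    """Toy archive: objects stored individually, indexed by data-subject tag."""

    def __init__(self):
        self.objects = {}                   # object id -> content
        self.by_subject = defaultdict(set)  # subject -> object ids

    def store(self, object_id, content, subjects):
        self.objects[object_id] = content
        for subject in subjects:
            self.by_subject[subject].add(object_id)

    def forget(self, subject):
        """Delete every object tagged with the subject; return the ids removed."""
        removed = sorted(self.by_subject.pop(subject, set()))
        for object_id in removed:
            self.objects.pop(object_id, None)
        # Drop the removed ids from any other subject's index entries.
        for ids in self.by_subject.values():
            ids.difference_update(removed)
        return removed


archive = ArchiveIndex()
archive.store("obj-1", "invoice for John Smith", ["john.smith"])
archive.store("obj-2", "invoice for Ann Doe", ["ann.doe"])
archive.store("obj-3", "support ticket", ["john.smith"])
print(archive.forget("john.smith"))  # ['obj-1', 'obj-3']
print(sorted(archive.objects))       # ['obj-2']
```

The lookup is a single index hit rather than a scan of every backup job, and the returned list of ids is the raw material for proving the deletion happened.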

Vicki, you brought up the point that the problem with delete on restore and restore to a sandbox still may be in violation of the core tenet of the Right to be Forgotten. This is, essentially, totally compliant with it at that point, correct?

Vicki: That’s right. So, yeah, let’s jump into the how we’re looking at this.

George: Just to kind of introduce you, we ran into you guys at about the beginning of this year, after we went through that one project, which was kind of brutal, and I’m like, “This is perfect,” right? And now, obviously, I don’t want people to leave this webinar thinking that Aparavi is solely focused on GDPR, and I know you’ll talk about that. You guys do a lot, but it’s really focused on this multi-cloud data management. So I’ll let you run through your stuff here and I’ll come back in a little bit.

Vicki: Okay, great. Yeah. Thanks, George. So, right, we are intelligent multi-cloud data management. And it just so happens that because of the intelligence that we designed into the product that it turns out it works really well for GDPR and the Right to be Forgotten. But it does more than that, as well.

So let me start with just a little bit about Aparavi, because George said he came upon us early this year. We had our formal launch of the product in May. Before that, we were having some early adopters try out the platform and work with them to refine it. We were founded in 2016. The company is privately funded. And we have a SaaS-based platform, and we intelligently and actively archive unstructured data with the idea that you can not only access it and use it today, but you can intelligently keep it forever for all kinds of uses.

I want to jump right in. I’ll tell you more about the different functions that Aparavi can do, but I want to address this idea of GDPR and that we have a completely different approach. So first of all, when we bring all the data into Aparavi, the files and the increments are stored as individual objects, and that’s what enables us to create a rich metadata database around what is in there and therefore be able to act on it. We have a very fast upload to the cloud, and I’ll talk a little bit more about what the destinations are. And then you can have configurable, concurrent transfers to increase the speed and have it go very fast. But it gives you this granular control over the data.

And we have this data classification and tagging capability that is based on both the content and a rich set of metadata. And we actually deliver a set of pre-defined classifications, so if you think about ones, like say confidential or legal are good pre-defined ones, but they are not the only ones that an administrator can use. You can actually define your own classifications. And so that starts to get extremely rich. You can even think about them from a business standpoint to say all sales data is tagged or all finance data is tagged. There’s just an infinite number of different ways that you can classify your data now, which provides much greater control and access over the long-term.

Aparavi is also fully content aware. And this is key to the whole GDPR piece because, as my little image here shows, it says, “Find all instances of XXX-XX-XXXX.” That could be a specific number for, say, a social security number (which doesn’t cover Europe, but it’s very appropriate here in the U.S.), and you retrieve each of those. So you can have a specific instance, or a generic pattern and retrieve all social security numbers. Or, say, find all instances of our friend John Smith who’s asked to be forgotten and retrieve those. And because we are fully content aware and can do a full content search, it can do that.
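The "find all instances of XXX-XX-XXXX" idea can be approximated with a regular expression. The sketch below is a generic Python illustration using the standard `re` module, not Aparavi's search; the document names and contents are invented, and the pattern only checks the shape of a U.S. social security number, not its validity.

```python
import re

# Shape of a U.S. social security number: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_matches(documents, pattern):
    """Return {document name: [matches]} for documents with at least one match."""
    hits = {}
    for name, text in documents.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


docs = {
    "hr/onboarding.txt": "Employee John Smith, SSN 123-45-6789.",
    "sales/notes.txt": "Call back on 2021-03-06.",
}
print(find_matches(docs, SSN_PATTERN))
# {'hr/onboarding.txt': ['123-45-6789']}
print(list(find_matches(docs, re.compile(r"John Smith"))))
# ['hr/onboarding.txt']
```

The same function handles both cases Vicki describes: a generic pattern (any SSN) and a specific instance (one named person).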

So you can then set rules for storage location based on data types. So you could put all of this PII, personally identifiable information, into a particular storage destination, whether it be on-premises or in a particular cloud, or other ways that you could modify it. Now, George, when you and I were talking, you thought this was quite interesting. Did you want to comment on this data classification and search?
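A rule set that routes data to a storage destination by classification might look like the following Python sketch. The tags, destination names, and first-match-wins ordering are assumptions chosen for illustration; they are not taken from the product being discussed.

```python
# Hypothetical rules: classification tag -> storage destination, first match wins.
ROUTING_RULES = [
    ("pii", "on-prem-vault"),
    ("finance", "private-cloud"),
]
DEFAULT_DESTINATION = "public-cloud-archive"


def destination_for(tags):
    """Pick a destination for an object based on its classification tags."""
    for tag, destination in ROUTING_RULES:
        if tag in tags:
            return destination
    return DEFAULT_DESTINATION


print(destination_for({"pii", "finance"}))  # on-prem-vault
print(destination_for({"sales"}))           # public-cloud-archive
```

Ordering the rules from most to least sensitive means a file tagged both PII and finance lands in the stricter location.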

George: One of my comments is if you can’t find it, you haven’t stored it. And so the fact that you keep data forever, for example, if you don’t know where it is or if it takes you too long to find that data, you might as well not even have stored it in the first place, right? And so, I think we’re guilty of storing data just in case, and that’s fine. But if you can’t find it, then I don’t know why you did it in the first place. So the classification and being able to understand what you have, where you have it, and why you have it becomes critical. We’ve obviously focused on data privacy here, but part of it is just straight out data growth. And it’s not necessarily the cost of hard disks or even SSDs, it’s the cost of data centers. When you start to run out of floor space, building a new data center gets really, really expensive.

Vicki: Well, and you can have multiple destinations for your data. But in terms of what you were just saying about people storing data just in case: with Aparavi, what we do is take what has been, essentially, an opaque black hole of data, those images where you don’t know what is there or where it is, and make it fully content aware and essentially available for actionable insights or for specific retrieval, exactly like the Right to be Forgotten, but other uses as well.

Okay, so let’s jump on to and talk a little bit more about different functions and features that we have. We offer dynamic, hybrid cloud storage. So you can actually define any destination for your storage. It can be any data path, so on-premises, private cloud, or any public cloud. And you can have a multitude of them as well.

And the next thing that I want to talk about is data-centric retention policy. We have the automated content and metadata classification that we were just talking about, which helps you to define the different classes of data and where and how you want each one stored. This piece is very important: our Cloud Active Data Pruning allows you to set retention policies at either the file or the sub-file level so that in the future, should you want to get rid of certain data, you can.

If you talk to the lawyers, they’ll tell you that after a certain period of time, maybe 5 years, 7 years, 10 years, in fact, it’s valuable to get rid of data. Well, Aparavi allows the administrator to set policies based on what legal or compliance may be telling you for removal of data. So saving your data, protecting your data, retaining your data, is really critical. But having an intelligent method to prune and remove and delete data, based on company compliance rules is critical. And Aparavi allows you to do that.
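A retention policy of this kind boils down to a rule per class of data plus a date comparison. The Python sketch below shows that logic at the file level only (it does not attempt sub-file pruning). The retention periods, the classification names, and the file list are assumptions chosen for illustration; real periods come from legal or compliance.

```python
from datetime import date

# Hypothetical retention periods in years, keyed by classification.
RETENTION_YEARS = {"finance": 7, "legal": 10, "general": 5}


def expiry_date(created, years):
    """Date on which a file created on `created` passes its retention period."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


def files_to_prune(files, today):
    """Return the paths whose retention period has elapsed as of `today`.

    `files` maps a path to a (classification, creation date) pair.
    Unknown classifications fall back to the "general" period.
    """
    expired = []
    for path, (classification, created) in files.items():
        years = RETENTION_YEARS.get(classification, RETENTION_YEARS["general"])
        if today >= expiry_date(created, years):
            expired.append(path)
    return sorted(expired)


# Made-up sample inventory.
files = {
    "finance/2012-ledger.xlsx": ("finance", date(2012, 1, 15)),
    "legal/2015-contract.pdf": ("legal", date(2015, 6, 1)),
    "misc/notes.txt": ("general", date(2019, 2, 1)),
}
print(files_to_prune(files, today=date(2021, 3, 6)))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the policy easy to test and to replay for an audit.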

We mentioned there’s Advanced Archive Search, so you can search the full content for anything. It can search for words before, words after. So it gives you really great control over how to use your data long-term. We also have audit reporting that gives you logs on who changed what and when, which is really valuable as you’re trying to track what’s happened with the data, and to be able to verify that, in fact, you did execute on this Right to be Forgotten.
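The "who changed what and when" record can be pictured as an append-only list of entries that is later filtered to prove an erasure request was carried out. The following Python sketch is an assumption about what such a log might look like, not Aparavi's log format; `AuditEntry`, `AuditLog`, and the action names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record: when, who, what action, on which file."""
    timestamp: datetime
    actor: str
    action: str
    target: str


class AuditLog:
    """Append-only, in-memory log (a real one would be persisted and tamper-evident)."""

    def __init__(self):
        self._entries = []

    def record(self, actor, action, target, timestamp=None):
        """Append an entry, stamping it with the current UTC time by default."""
        entry = AuditEntry(timestamp or datetime.now(timezone.utc), actor, action, target)
        self._entries.append(entry)
        return entry

    def entries_for(self, action):
        """Return all entries with the given action, in the order recorded."""
        return [e for e in self._entries if e.action == action]


log = AuditLog()
log.record("admin@example.com", "search", "term:John Smith")
log.record("admin@example.com", "delete", "hr/onboarding.txt")
log.record("admin@example.com", "delete", "legal/contract.txt")

# Evidence that the erasure request was executed.
print([e.target for e in log.entries_for("delete")])
```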

So this is an architecture slide that talks a little bit about our data sources. And there is a software appliance that takes the data from the data sources and moves it out, based on your policies, to the various destinations, so you can have different data types or different data sources go to different locations. We’ll also help you to move from one location to the next over time, based on your policies. And this gives the user storage and cloud mobility. So as new cloud vendors come on the market with, potentially, better economics, you can, over time, migrate your data from one place to the next to take advantage of that.
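Policy-based placement, where different data types go to different destinations, can be sketched as an ordered rule list with a default. The Python below is only an illustration under assumed names: the tags and the destination labels are hypothetical and do not refer to real Aparavi configuration.

```python
# Ordered rules: the first tag found on a file decides its destination.
# Tag names and destination labels are hypothetical.
ROUTING_RULES = [
    ("pii", "on-prem-object-store"),
    ("legal", "cloud-a-archive"),
]
DEFAULT_DESTINATION = "cloud-b-standard"


def choose_destination(tags):
    """Return the destination for a file carrying the given set of tags."""
    for tag, destination in ROUTING_RULES:
        if tag in tags:
            return destination
    return DEFAULT_DESTINATION


print(choose_destination({"pii", "legal"}))
print(choose_destination({"legal"}))
print(choose_destination({"sales"}))
```

Because the rules are data rather than code, migrating to a different destination later amounts to editing one entry and re-evaluating the existing files against it.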

We also have an open data format, which means that there’s no software or cloud lock-in. Many years from now, if you’re no longer using Aparavi, but you had data stored, we give you a published format for that data so you’re not forever locked into having the application, as is typical with all your legacy backup vendors.

Intelligent data management needs: we talked a bit about that and that has to do with the classification, the full content search. So we are now storing the data in an intelligent way that allows you to manage it over time. Also, we have an innovative SaaS-based model that means that you’ve got no upfront cost. So you don’t have to invest in an expensive enterprise license over many years with its huge upfront cost. You can purchase as you want, over time, and grow it over time.

We have an extensive partner ecosystem that we are always adding to. We can run on Windows and on Linux, including Ubuntu and Red Hat. Our storage destinations include both on-premises systems such as Caringo or Scality or Cloudian, and also the major cloud vendors. And we keep adding to these. So besides these, just keep an eye out. If you’ve got a cloud vendor that is your favorite or your own private cloud that you need us to certify, it’s a simple process, and we do it all day long.

We’ve recently enjoyed a bunch of recognitions since launching the product earlier this year. We have gotten quite a number of these “data storage startups to watch” from enterprise storage and CRN and others. And we are really proud of those. So with that, George, I think I’d like to turn it over to you so we have time for some questions.

George: By the way, I do want to point out that that’s the Santa Monica Pier right there in the picture.

Vicki: We are just down from the pier.

George: Yup. When I was visiting your office, I went for a run and saw that and I said, “You know, I think that’s supposed to be like famous or something.” But I failed geography. So, yeah, we will go through and prioritize a few of those questions that have come in. We do have quite a few in the queue. Before we get to those questions, just a couple of housekeeping items. At the bottom of your player, there’s an attachments button. You can click on that and there’s all kinds of additional information where we’ve done a couple of deeper dives. There’s also an overview of the Aparavi products. You can get all that there, so feel free to click on those. You don’t need to register or download anything.

Vicki: Let me just say, George, I want to chime in and say that we have a free trial, as well. Two things: we are happy to give anybody a demo, just visit the website and click on “Request a Demo.” But we also offer a free trial, so anybody who wants to check it out and try it, just log in, go to our website, and click on “Start Your Free Trial.”

George: Okay. And then, if you’re an on-demand viewer, we’ll add a link to that demo right in the attachment section so you can click right on it. Also, after the questions are done and before you leave, if you would do us a favor and give us some feedback on the session today, what you thought of it. Star rating system, five stars being the best. So that is all the housekeeping. And I did have a question come in: “Will the webinar be available on demand?” The answer is, “Yes.” It should be five minutes after we are done today. So if you missed a section or you got in late, feel free to come back and listen.

If you think this information would be valuable to a colleague or a friend in the industry, please capture the link. The exact same link that you used to come to the live event will also work for the on-demand event, so feel free to send that to them. Under GDPR laws, I’m pretty sure that it’s better for you to send it to them than for me to, so there you go. So, Vicki, are you ready for a few questions here?

Vicki: Yeah.

George: Okay. Let’s see, there’s one that I saw come in that I wanted to ask, and I’ll let you kind of jump in there, if you want to. It’s, “If I set my current backup solution and do a file-by-file backup instead of an image backup, will that cover you in GDPR?”

No, for a couple of reasons. First of all…again, I’m only going within the scope of backup software solutions that I know, which I think is pretty good. You could do a file-by-file backup, but it still would be job-based, and you would still have trouble removing an individual set of files from within those jobs. In fact, it’s probably impossible in most cases. The other problem you’re going to have is time. There’s a very good reason that the legacy solutions all went to image-based backups, because it would take them days, literally, to back up a server with a million or a billion files on it. The other part of that would be that many of the modern next-generation products (I don’t even know what I would call them), the ones brought to market after 2005 and before 2016, may not even have the option to do a file-by-file backup. So you may want to do one, but you’re not going to be able to, and even if you could, I don’t think it would help you much. Anything you would add to that, Vicki?

Vicki: Yeah, I agree with you. And one of the things that you need that we built into the solution is the ability to easily retrieve data wherever it is and not require the administrator to know that, to know which job it’s in or where it’s stored. So Aparavi makes that completely seamless. You just ask for the data and Aparavi goes and finds it. You don’t have to know where it is.

George: There you go. Good. So we’ve got time for a couple more questions. But before I do that, I want to put the contact information up on the screen. Feel free to reach out to us by any of those. If you’re an on-demand viewer and want to ask your questions, you can tweet us questions @storageswiss. You can also email me directly. That is my legitimate email address. I do not, however, need any offers for goldmines in foreign countries. I’m all set on goldmines right now. So let’s see. Let’s take this question. I think you touched on this a little bit, Vicki, but what on-premises storage systems do you support?

Vicki: You can actually define any data path and that will work just fine. It’s funny, we do talk to customers—I’d be interested to see what you’d say about this, George—some customers are absolutely committed to the cloud and have an active multi-cloud strategy going, but it always surprises me how many are still saying they have a corporate policy that they will not put data in the cloud. And so, for Aparavi, that’s perfectly fine. Many of them are setting up their own private clouds, and we can support those or any data path that they define.

So although we market ourselves as enabling a multi-cloud strategy, you’re not restricted to that; you can keep everything on-premises if you like. And by the way, this is just a funny aside: if you talk to Amazon, their idea of multi-cloud is S3 and Glacier, but that’s not what we mean.

George: Yeah, and I can argue both points. The cheapskate in me that has a 10-year-old car says, “Do the math. Leasing or renting 100 petabytes of storage for the next 20 years is going to be way more expensive than owning it.” However, you may not want to be in that space, which I get. And the other thing, like I’d mentioned earlier, is you do have to factor in the cost that you may run out of data center floor space at some point. And building new data centers, you think there are regulations around data? Wait until you see the regulations around building a data center, plus the cost, of course. Let’s do one last question because I think it’s really important for the viewers, Vicki. How does your SaaS pricing model work?

Vicki: Great. We’ve got a lot of flexibility here. First of all, it’s based on the amount of source data that anybody is putting into Aparavi, but we give you a variety of options. You can start with a pay-as-you-go model. That is just a monthly bill based on how much data Aparavi is managing. Or you can have pre-set plans that are monthly or annual. And our pricing model is right on our website. So we are very transparent about this. You can just visit the website and you can click on “Pricing” to see what it looks like. And as with most SaaS pricing models, you get a little bit of a discount if you’re on a monthly plan over pay-as-you-go, and a little bit more of a discount if you go with an annual plan. So it’s very, very flexible and it’s very OPEX-centric, not CAPEX-centric.

George: And what I like about it is it allows people to kind of grow into the solution, right? They don’t have to jump in with both feet. They can have you run a bit of their data, and then as they start to build trust and see that you can do everything you said you can do, they can add to it, right?

Vicki: And, in fact, that’s what we typically see. We see, typically, customers will start with maybe a server or a set of servers that are primary storage for their unstructured data. And it could be terabytes and it typically is many terabytes. It’s not the petabytes that the company has, but they grow into that over time.

George: Yeah, exactly. All right. Well, we’re going to end it there, give folks an extra five minutes of their day. I want to thank everybody for attending today. Again, if you want to send this webinar to a friend, just send the same URL that you used to get in; it will take them to the on-demand event. And by the time you send that email and they receive it, the on-demand event will be available. Again, look for the attachments and then, lastly, please give us feedback as you leave the presentation. Vicki, thank you so much for joining us today. I appreciate your help with the webinar.

Vicki: Well, thanks for inviting us.

George: Glad to do it. And I wanna thank all of you guys for attending. For now, though, I am George Crump, Lead Analyst for Storage Switzerland. Thank you for joining us.