Data is king! Data is power! Data is the new oil! Data is the new gold! Data is the fuel of digital transformation! Data is the lifeblood of organizations! Data is everything – except when it’s useless. That’s the case with redundant, obsolete and trivial (ROT) data.
In this blog, the first of a three-part series, we’ll discuss what ROT data is and why it accumulates so quickly. Part 2 and Part 3 will cover the costs associated with retaining ROT data and the benefits that can be derived from removing it.
There’s a Lot of ROT
Industry analysts and technology writers like to talk about the value of data, as well as its explosive growth. What they don’t typically talk about is the fact that much of that data is classified as ROT and has little or no value.
Depending on the source consulted, anywhere from 25 to 80 percent of data is likely:
- Redundant – duplicate data
- Obsolete – data that no longer has business value
- Trivial – data that has little or no business value
Given the statistics for data growth in general, and unstructured data specifically, it’s easy to see how quickly ROT data can add up. Consider that the total amount of data is expected to grow to more than 180 zettabytes by 2025, up from 64.2 zettabytes in 2020. If 80 percent of that data in 2025 is unstructured, that’s 144 zettabytes. If only 25 percent of that unstructured data is ROT, that’s still 36 zettabytes of low-value or no-value data. Given what we know about ROT data, chances are the percentage of unstructured data it comprises will only increase.
Nonetheless, organizations keep gathering it, storing it and wading through it to find something useful.
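To make the arithmetic above easy to check, here is a minimal Python sketch. The function name `rot_estimate` and the constant names are hypothetical, chosen only for illustration; the 180 zettabyte total and the 80 percent and 25 percent shares are the figures quoted above, not measurements.

```python
# Figures quoted in the text; the shares are illustrative assumptions.
TOTAL_ZB_2025 = 180         # projected total data in 2025, in zettabytes
UNSTRUCTURED_SHARE = 0.80   # assumed share of data that is unstructured
ROT_SHARE_LOW = 0.25        # low end of the quoted ROT range


def rot_estimate(total_zb, unstructured_share, rot_share):
    """Return (unstructured_zb, rot_zb) for the given shares."""
    unstructured_zb = total_zb * unstructured_share
    rot_zb = unstructured_zb * rot_share
    return unstructured_zb, rot_zb


unstructured_zb, rot_zb = rot_estimate(
    TOTAL_ZB_2025, UNSTRUCTURED_SHARE, ROT_SHARE_LOW
)
print(f"Unstructured: {unstructured_zb:.0f} ZB, ROT: {rot_zb:.0f} ZB")
```

Swapping in a higher ROT share shows how sensitive the estimate is to that one assumption.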
The Power of Unstructured Data
Before discussing the how and why of ROT data accumulation, it’s important to first understand the massive amounts of unstructured data where the majority of ROT data exists.
Unstructured data is information that isn’t arranged according to a pre-set data model or schema. Unlike structured data, it doesn’t fit neatly into relational databases that make organization and use relatively easy.
It exists in numerous formats and across as many, if not more, locations, making it difficult to locate and consolidate. According to multiple industry analyst estimates, anywhere from 80 to 90 percent of the data generated is unstructured information in the form of text, video, audio, emails, web server logs, social media, information from Internet of Things (IoT) devices, business apps and other content types.
There’s great potential value in unstructured data for data mining, business intelligence, predictive analytics and more. But with so much unstructured data classified as ROT, it can be expensive in terms of both the storage space it occupies and the human resources required to sift through it to find what actually has value.
So why do we keep letting it build up – and why is it building up in the first place?
How ROT Accumulates
There are many reasons that ROT data accumulates. Among them:
Data hoarding at the human level. People have a propensity for not throwing things away, and that includes data. There’s a tendency to keep all versions of digital assets, as well as files that have since been updated or are no longer relevant. Whether at home or on the job, the ROT data can add up. Sometimes, failing to get rid of ROT data is simply a matter of forgetting to do so. If employees are busy, that can easily happen. They could also have a fear of getting rid of something that might be needed later on. Some may simply not think deleting unused, old or error-filled files is necessary if they aren’t told to do so.
Employee habits. Employee behaviours can also contribute to the accumulation of ROT data. It’s common for employees to create multiple copies of the same file. Those multiple copies can get saved in various storage systems. They also can get shared and stored across departmental boundaries, across company and affiliate locations and in numerous storage locations. Changes made to the file by one department may not be made by the other departments, resulting in inconsistent, error-filled files. (See data silos.) Employees also often save files that aren’t related to their organizations. The 2016 Veritas Global Databerg Report noted that over 25% of employees store – and often forget about – personal data on work devices, ranging from personal, legal and identification documents (57%), photos (57%), music (47%) down to video (33%) and games (26%). Six years later, it’s likely the percentage of employees storing personal data on company assets is still the same if not higher.
Data hoarding at the business level. Businesses hoard data too. Many may not have clearly defined or enforced data retention policies in place, so they hang on to data longer than necessary. Often there’s a fear that data may be needed in the future for legal, historical or other purposes, despite what’s specified in a retention policy or required by regulations or laws.
Lack of policies, processes and priority. Many organizations don’t make data lifecycle management a priority. Not only do they lack policies for data retention; they also don’t have policies or processes in place for formally dealing with data at any stage or across all necessary scenarios. For example, it’s common for some organizations to retain files owned and used by former employees. A policy for reviewing and clearing out these files could eliminate unnecessary data. Companies also may not be able to delete certain kinds of data easily and quickly, or to move it to less expensive storage locations, due to a lack of the right technology and automated processes. And while they often include data privacy and security in employee training, companies typically don’t cover data management. There are few if any guidelines for what should and shouldn’t be stored. Nor are clear, logical naming conventions provided to help identify document age and relevance. In addition, businesses may not go through periodic data reviews to clear out ROT data, whether due to lack of time, lack of resources or failure to make doing so a priority.
Data silos. Data silos regularly occur when business units are decentralized and managed as separate entities or as a result of company culture. Business growth, mergers and acquisitions also contribute to silos. Rather than sharing data, siloed units create and control their own. There’s no single “source of truth,” just lots of similar, duplicate or inconsistent data that falls under the category of ROT.
Backup practices. Backing up data is critical for business continuity and disaster recovery. The problem is that many organizations don’t know what they’re backing up. That makes it easy – and likely – for ROT data to accumulate quickly.
Consider organizations that use the 3-2-1 rule of data backup: create one primary backup and two copies of data; save backups to two different types of media; and keep at least one backup file offsite. It’s a great idea because it helps ensure that at least one copy (if not more) is available in the event of a downtime-causing incident. Now consider the scenario in which a company backs up 20 GB of files daily and follows the 3-2-1 rule. That translates into 60 GB of data. If 80% of that data is ROT, that means 48 GB of data with little or no value is taking up storage space and budget dollars.
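The backup scenario works out as follows in a short Python sketch. The function name `backup_footprint` and the constant names are hypothetical; the 20 GB daily backup, the three copies and the 80 percent ROT share come from the example above and are assumptions rather than measured values.

```python
# Values from the example in the text; all are illustrative assumptions.
DAILY_BACKUP_GB = 20   # size of one daily backup
COPIES = 3             # copies kept under the 3-2-1 rule
ROT_SHARE = 0.80       # assumed share of backed-up data that is ROT


def backup_footprint(daily_gb, copies, rot_share):
    """Return (total_gb, rot_gb) stored per day across all copies."""
    total_gb = daily_gb * copies
    rot_gb = total_gb * rot_share
    return total_gb, rot_gb


total_gb, rot_gb = backup_footprint(DAILY_BACKUP_GB, COPIES, ROT_SHARE)
print(f"Stored per day: {total_gb:.0f} GB, of which ROT: {rot_gb:.0f} GB")
```

Even at the low end of the quoted range, a 25 percent ROT share, the same backup set would carry 15 GB of ROT data every day.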
Lack of migration planning. Whether an organization is embarking on a storage migration or a cloud migration, data has to be moved. The amount of data migrated can affect many aspects of the process, including scheduling, project management, testing and budget. Ideally, thorough pre-migration planning should ensure that the right data, in the right format, is moved. If the data being migrated is coming from several sources, the migration plan should also specify what happens to the original source files. However, lack of communication with all stakeholders, time crunches and other factors can prevent that from happening. The result: existing ROT data can compromise migration success and continue to build up.
The ROT Story Continues
Unfortunately, there are numerous other reasons ROT data accumulates; many are difficult if not impossible to eliminate. In other words, ROT data is inevitable. That doesn’t mean companies should accept its existence. In fact, they can’t afford not to take action. Part 2 of this blog series explains why.