04. June 2022

Unstructured Data is Risky Business

Identify sensitive data and PII in unstructured data, and take appropriate action on it, to reduce compliance and data privacy risks.

All data is at risk, from ransomware, accidental exposure, social engineering, advanced persistent threats, noncompliance with regulations and more. According to the Identity Theft Resource Center’s 2021 Data Breach Report, there were 1,862 data breaches reported in 2021. The Internet Crime Complaint Center (IC3) estimates that only one in ten cybercrimes is reported, which means most crimes aren’t publicized.

Unstructured data, in particular, is highly vulnerable to cyber threats. Cybercriminals are well aware that many organizations don’t know what’s contained in their unstructured data, including sensitive or proprietary data and personally identifiable information (PII). So, they’re constantly looking for ways to exploit that highly valuable data.

The Trouble with Unstructured Data

Before delving into what the Aparavi Platform can do, it’s important to understand what makes unstructured data so vulnerable to cybercrime.

Unstructured data can include anything from emails and FedEx receipts to sensor data and social media feeds. Unlike structured data, it doesn’t fit easily into a pre-set data model or schema. Unstructured data usually comes in forms that make it difficult for conventional software to ingest, process, search and analyze.

It can easily, and often does, contain files with personal or sensitive data, like payroll information, personnel files, school records, credit card information, medical history and other files that could have personally identifiable information (PII).

Generated by both people and machines, including endpoints like sensors, unstructured data is growing rapidly. It’s estimated that it makes up approximately 80% of the 2.5 quintillion bytes of data created daily. That data is constantly streaming through on-premises infrastructure, the cloud and big data environments, and is stored in diverse repositories such as NoSQL databases, data lakes or applications.

With so much unstructured data located in so many places, it’s difficult for organizations to know what they have, much less which files are sensitive or subject to data privacy or security requirements. They also don’t know who has access to the sensitive and regulated files, whether that access is necessary, and what authorized users are doing with that information.

Industries With the Most At-Risk Data

There’s a potential for sensitive information to exist within the unstructured data of any organization in any industry (think human resources and payroll). It’s critical to know the data privacy rules, industry standards and regulatory requirements that apply to your organization’s data.

Some industries are more likely than others to process and store sensitive data. Many are also held to higher standards and are more heavily audited, putting them at risk of regulatory non-compliance as well as cyberattacks. Sensitive data is especially likely to be hidden in unstructured data in the following industries:


Healthcare

In healthcare, organizations handle substantial amounts of patient data and are subject to HIPAA/HITECH and other data security laws and regulations.

Financial services

This covers banks, credit unions, insurance companies, investment firms and other types of institutions that handle and store financial records and other sensitive data. Regulations include the Bank Secrecy Act, the Right to Financial Privacy Act, the Gramm-Leach-Bliley Act, and the Fair Credit Reporting Act.


Education

Education deals with data in the form of student records, PHI from on-campus clinics, financial data in administrative offices, and more. The Family Educational Rights and Privacy Act (FERPA) and the Children’s Online Privacy Protection Act (COPPA) are among the regulations that apply to educational institutions.


Retail

In retail, organizations regularly process and store credit card or other payment information, and are subject to PCI DSS and other regulations and industry standards. That may include newer privacy laws such as CCPA.


This is where sensitive data runs the gamut from product cost information and customer data to intellectual property rights and marketing strategy. CCPA, CPRA, GDPR and ISO/IEC 27001 are among the regulations and industry standards to know about.


Communication

Communication deals with transactional data, call recordings and more that must comply with multiple security regulations. These include FCC rules, the Electronic Communications Privacy Act (ECPA) and federal laws such as the Telecommunications Act. Numerous states have specific laws in place regarding consumer data privacy, website privacy policies, privacy of online book downloads and reader browsing information, personal information held by Internet service providers, online marketing of certain products directed to minors, and employee email monitoring.


Government

Government includes local, state and federal organizations that regularly collect a tremendous amount of personal data from citizens and store information such as tax returns, patent applications, Social Security records, law enforcement records and more. HIPAA, FCRA, FERPA, GLBA, ECPA, COPPA, and VPPA, among many others, come into play.

Locate Sensitive Data and PII

Armed with an understanding of unstructured data, what constitutes sensitive data and the regulations that may apply, you now need to find your unstructured data and the sensitive data within it. (If you’re an MSP, this is something you can do on behalf of your client as a managed service.) These basic steps can help, particularly when used in conjunction with the Aparavi Data Intelligence and Automation Platform.

1. First, search across all systems to locate your data, including unstructured data. By using the Platform’s quick scan capability, you can locate unstructured data wherever it exists throughout your organization: on premises, at the edge and in the cloud, and across multiple departments, facilities and geographies. The results will provide information about your data by location, owner, events, creation date, last access date, extension type and modification date.

2. You’ll have a lot of data to assess, so it will be easier to do that after eliminating redundant, obsolete and trivial (ROT) data. You can reduce your data volume by anywhere from 25% to 80% in this step alone.

If you’re using the Aparavi Platform, the information generated by the quick scan will enable you to take a first pass at eliminating ROT data. For example, creation date, last access date and modification date can help identify obsolete data that’s no longer needed. You can also use the search feature to identify duplicate files.

In addition, the Platform allows you to define custom tags that can be applied to help identify and clean up ROT data. It also offers a data actions feature that allows you to delete specific files (individually or in a batch), or move them to a location where they can undergo data cleaning as needed.

3. With less data to review, you can now search specifically for sensitive data. Again, the Aparavi Platform can help with its extensive collection of classification policies that can be used to identify specific types of data that fall under the sensitive, proprietary and/or PII categories. Among them: bank account number policies, specific regulatory compliance policies, SWIFT code policies and others.
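To make the three steps concrete, here is a minimal Python sketch of the same idea on a single local directory. It is not the Aparavi Platform or its API: the function names (scan, find_rot, classify), the three-year staleness threshold and the regular expressions are all assumptions made for illustration, and real classification policies are far more thorough than a couple of patterns.

```python
import hashlib
import os
import re
import time
from collections import defaultdict

# Hypothetical, deliberately simple patterns (not Aparavi classification policies).
# The SWIFT/BIC pattern matches any 8- or 11-character uppercase code, so expect
# false positives on ordinary uppercase words.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "swift_code": re.compile(r"\b[A-Z]{6}[A-Z0-9]{2}(?:[A-Z0-9]{3})?\b"),
}


def scan(root):
    """Step 1: walk a directory tree and collect basic metadata for each file."""
    records = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            records.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size": info.st_size,
                "modified": info.st_mtime,
                "accessed": info.st_atime,
            })
    return records


def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_rot(records, max_age_days=365 * 3, now=None):
    """Step 2: flag duplicate files (same content) and stale files (old mtime)."""
    now = time.time() if now is None else now
    by_digest = defaultdict(list)
    for record in records:
        by_digest[file_digest(record["path"])].append(record["path"])
    duplicates = [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
    cutoff = now - max_age_days * 86400
    stale = [r["path"] for r in records if r["modified"] < cutoff]
    return duplicates, stale


def classify(path):
    """Step 3: return the sorted labels of every pattern found in the file's text."""
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        text = handle.read()
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))
```

A typical run would call scan on a folder, pass the records to find_rot to review duplicates and stale files, and then call classify on whatever remains. The sketch only reads plain text; binary formats such as PDFs or Office documents would need text extraction first.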

Act Upon the Sensitive Data

Once you know what sensitive data you have, you can determine what to do with it. For example, it may require special handling to meet compliance requirements. Or it may need to be moved to secure storage. Again, this is something MSPs may wish to do for their clients as a managed service.

The Aparavi Platform’s Data Actions feature comes in handy here as well. You can move or copy sensitive information into a data target with limited access rights. (Copy actions can be automated.) For added security, the files will be encrypted and compressed into the proprietary Aparavi format (which is still readable by the Aparavi Platform). The data can also be moved to a specific location for data hygiene or special handling to meet compliance requirements.

You’ll need to identify which users should have access to any sensitive data. Strive for least-privilege access control. You’ll also need a way to track when changes are made to sensitive data and permissions. If a user copies, moves, deletes, renames or modifies a file containing sensitive data or PII, you need to know about it, and be able to reverse it.
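As a rough illustration of change tracking, the Python sketch below compares two content snapshots of a folder and reports which files were added, removed or modified in between. It is an assumption-laden example, not a description of how the Aparavi Platform tracks changes: the function names snapshot and diff_snapshots are hypothetical, and the approach says nothing about who made a change or how to undo it, which would require audit logs and versioned copies.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 digest of its content."""
    state = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            state[os.path.relpath(path, root)] = digest
    return state


def diff_snapshots(before, after):
    """Compare two snapshots and list added, removed and modified paths."""
    before_paths, after_paths = set(before), set(after)
    return {
        "added": sorted(after_paths - before_paths),
        "removed": sorted(before_paths - after_paths),
        "modified": sorted(
            p for p in before_paths & after_paths if before[p] != after[p]
        ),
    }
```

In this sketch a renamed or moved file shows up as one removed path and one added path with the same digest, so matching digests across the two lists is one way to spot renames.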

Reduce Unstructured Data Risks

To learn more about mitigating the risks associated with unstructured data, and about meeting compliance, data security and privacy requirements, read the solution brief or visit here. Free demos are available too. Just contact us.