DC Blog

By Handd Admin

27 Feb, 2020 5 min read

How Machine Learning is enhancing data classification

It’s 2020 and the terms ‘Machine Learning’ and ‘Artificial Intelligence’ are being bandied around the security industry like light-up yoyos in a ‘90s school playground. Their popularity is no great surprise when you consider how Machine Learning and Artificial Intelligence can help businesses to overcome the biggest challenge to the quality of their data – the human factor.

The big data challenge

Every day, we create 2.5 quintillion bytes of data[i], some structured, some unstructured.

Classifying this data is essential if it is to be controlled and protected. Classification has historically been the responsibility of the data owner. However, this approach brings with it a key challenge: the quality of data classification is only as good as those classifying it. Users are still only human, and confident, robust decision-making takes time and effort. This introduces inherent risk into critical classification decisions.

Over the years, keyword searches and regular expression (Regex) searches have taken the legwork, and some of the guesswork, out of the classification of documents.

Based on finding patterns, these classification techniques can make the process faster and easier than user-driven classification, especially when integrating a body of legacy data. However, keyword and Regex searches have been used since the 1970s and, as they depend on short strings of data rather than a holistic view of the data, there is still a great deal of scope for data to be incorrectly categorised.
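To make that limitation concrete, here is a minimal Python sketch of pattern-based classification. The labels, the rules and the sample sentences are all hypothetical, chosen only to illustrate how matching on short strings can mislabel a document in both directions.

```python
import re

# Hypothetical rules: a label is applied when any of its patterns matches.
RULES = {
    "Confidential": [
        re.compile(r"\bconfidential\b", re.IGNORECASE),
        # Four groups of four digits, loosely resembling a card number.
        re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    ],
}


def classify_by_pattern(text, default="Public"):
    """Return the first label whose patterns match anywhere in the text."""
    for label, patterns in RULES.items():
        if any(pattern.search(text) for pattern in patterns):
            return label
    return default


# Illustrative inputs (invented for this sketch).
card_memo = "Card 4111 1111 1111 1111 was charged twice."
facilities_note = "Our confidential-waste bins are emptied on Fridays."
hr_email = "Salary details for all staff are attached."

print(classify_by_pattern(card_memo))        # matched by the digit pattern
print(classify_by_pattern(facilities_note))  # keyword hit on a harmless note
print(classify_by_pattern(hr_email))         # sensitive, but no rule matches
```

The facilities note is labelled Confidential purely because it contains the keyword, while the HR email falls through to the default because no rule anticipates its wording.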

How Machine Learning delivers more robust data classification

Machine Learning is enhancing the accuracy of data classification by introducing automated classification that uses the entire document to determine the nature of the data and its level of security.

When a dataset, often referred to as a corpus, is fed into a machine for analysis, a Machine Learning algorithm learns the key characteristics of particular types of data, which it can then recognise as it sees or saves new files.

What the machine has learned from the corpus then drives its ability to classify new data. To do this, it references the full document, searching for a variety of clues from which it can determine the classification, and then applies that classification.

With the full document referenced before a decision is made, Machine Learning is able to classify data with fewer false positives than keyword or Regex searches alone.
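As a rough illustration of learning from a corpus and then weighing every word in a new document, the sketch below uses a small Naive Bayes text classifier written with the Python standard library only. Naive Bayes is our choice for the example, not necessarily the algorithm any particular product uses, and the six-document corpus and its labels are invented; a real corpus would be far larger.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, corpus):
        for text, label in corpus:
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior, then add evidence from every word.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training corpus of (document text, label) pairs.
CORPUS = [
    ("Salary review and payroll figures for all staff", "Confidential"),
    ("Customer bank account and card details for the refund", "Confidential"),
    ("Payroll bank details and salary bands for new staff", "Confidential"),
    ("Office newsletter: the summer party is on Friday", "Public"),
    ("Confidential waste bins are emptied on Friday", "Public"),
    ("Press release about the new office opening party", "Public"),
]

classifier = NaiveBayesClassifier()
classifier.train(CORPUS)

print(classifier.classify("Updated salary bands and payroll details"))
print(classifier.classify("The confidential waste bins move to the new office on Friday"))
```

Because the score is built from every word in the document, the single word "confidential" in the second example is outweighed by the surrounding vocabulary about bins, offices and Fridays.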

The opportunity to use Machine Learning in data classification isn’t limited to saving new documents. It can also substantially enhance data security, reducing the opportunity for sensitive information to be sent in error. Using Machine Learning, emails and documents can be automatically classified according to the data they contain. The resulting classifications can then govern which protection policy is applied and, by doing so, protect against accidental data loss.
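One way to picture a classification governing a protection policy is a simple lookup from label to rules, applied before an email leaves the organisation. The sketch below is hypothetical throughout: the labels, the `Policy` fields, the `check_outbound` function and the `example.com` domain are assumptions made for illustration and do not describe any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    encrypt: bool          # must the message be encrypted before sending?
    allow_external: bool   # may it go to recipients outside the company?


# Hypothetical mapping from classification label to protection policy.
POLICIES = {
    "Public": Policy(encrypt=False, allow_external=True),
    "Internal": Policy(encrypt=False, allow_external=False),
    "Confidential": Policy(encrypt=True, allow_external=False),
}


def check_outbound(label, recipient, company_domain="example.com"):
    """Return the action to take for an email with the given label."""
    policy = POLICIES[label]
    is_external = not recipient.lower().endswith("@" + company_domain)
    if is_external and not policy.allow_external:
        return "block"
    return "encrypt" if policy.encrypt else "send"


print(check_outbound("Confidential", "partner@other.org"))
print(check_outbound("Confidential", "cfo@example.com"))
print(check_outbound("Public", "partner@other.org"))
```

In this sketch, a Confidential email addressed to an outside recipient is stopped before it is sent, which is the kind of accidental data loss the classification is meant to prevent.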

This additional check on user-selected classifications can substantially reduce the risk of human error, and with autocorrection capabilities, it can do so without extending or changing existing workflows.
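A check of this kind could be sketched as follows. The label ordering, the confidence threshold and the `reconcile` function are all assumptions for the example, as is the design choice to only ever raise a label and never lower it.

```python
# Hypothetical labels, ordered from least to most sensitive.
LEVELS = ["Public", "Internal", "Confidential"]


def reconcile(user_label, predicted_label, confidence, threshold=0.9):
    """Compare the user's label with the model's prediction.

    Returns (final_label, corrected). The label is only ever raised, and
    only when the model's confidence meets the threshold.
    """
    user_rank = LEVELS.index(user_label)
    predicted_rank = LEVELS.index(predicted_label)
    if predicted_rank > user_rank and confidence >= threshold:
        return predicted_label, True
    return user_label, False


print(reconcile("Public", "Confidential", 0.95))   # confident: label raised
print(reconcile("Public", "Confidential", 0.60))   # unsure: user's label kept
print(reconcile("Confidential", "Public", 0.99))   # never lowered automatically
```

Because the correction happens as the label is applied, the user's workflow stays the same; only the outcome changes when the model is confident that a stricter label is needed.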

Increased classification accuracy makes it possible to enhance the performance of Rights Management software, Data Loss Prevention solutions, Cloud Access Security Brokers (CASBs) and even next generation firewalls (NGFWs) with more robust classification selections.

To find out more about how Artificial Intelligence and Machine Learning could add a whole new time-saving, reliability-enhancing dimension to your Data Classification solutions, contact the team of Data Classification specialists at HANDD on 0845 643 4063.

[i] https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/