Inaccuracies of Machine Learning

Automation has found its way into every major technical industry.

And it’s no wonder why.

Streamlining operations with machines increases productivity and efficiency, especially in fields that handle large volumes of information.

When it comes to data loss prevention, however, the wrong type of machine-learning tool can cost a network as much as it benefits it. Administrators often find themselves in an approach-avoidance conflict.

The Problems

Many DLP programs with machine-learning functions rely on pre-set algorithms and regular-expression patterns in order to function. These algorithms define what counts as “sensitive data” and determine which controls and safety measures are activated in any given scenario. This leads to serious issues in identification accuracy, the key to effective DLP coverage.

First and foremost, an overly expansive, generalized identification approach produces false positives. Markers meant to be specific end up highly generic in the context of terabytes of data, and the false positives pile up. False negatives, meanwhile, allow important files and data streams to slip through the cracks. Pre-set programs cannot detect subtleties in content beyond their own algorithmic structure; factors such as context, timing, and the users involved in a particular data transfer are often left out of the program’s assessment.
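The false-positive pile-up is easy to reproduce. The minimal sketch below uses a hypothetical pre-set pattern and sample strings (not any vendor’s actual rules): a typical regular expression for U.S. Social Security numbers flags harmless order and phone numbers right alongside the one genuinely sensitive record.

```python
import re

# Hypothetical pre-set DLP pattern: nine digits grouped like a U.S. SSN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

documents = [
    "Employee SSN on file: 512-44-9876",      # genuinely sensitive
    "Order ref 123-45-6789 shipped Tuesday",  # false positive: order number
    "Call 800-55-0199 for support",           # false positive: phone fragment
]

# The pattern cannot weigh context, timing, or the users involved,
# so every nine-digit string in this shape gets flagged.
flagged = [doc for doc in documents if SSN_PATTERN.search(doc)]
print(len(flagged))  # → 3: every document matches, though only one is sensitive
```

At terabyte scale, a rule like this generates floods of alerts that drown the single real incident.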

“In-the-Box” Thinking

The “in-the-box” thinking of many machine-learning programs often leaves loopholes that circumvent protocols. Files already flagged as sensitive can pass through the program’s security controls simply by having elements of the file changed: format conversion, copying, extracting, embedding, re-typing, compression, or a file-extension change.
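Why do such trivial changes work? A minimal illustration, assuming a naive engine that recognizes flagged files only by an exact cryptographic digest (a stand-in for brittle exact-match controls, not any specific product’s mechanism): compressing the file changes its digest, and the “known” file sails through.

```python
import gzip
import hashlib

# Hypothetical engine state: flagged files tracked only by exact SHA-256 digest.
secret = b"CONFIDENTIAL: Q3 acquisition target list"
blocked_digests = {hashlib.sha256(secret).hexdigest()}

def is_blocked(payload: bytes) -> bool:
    """Exact-match check, the kind of brittle control the text describes."""
    return hashlib.sha256(payload).hexdigest() in blocked_digests

print(is_blocked(secret))                 # True: the untouched file is caught
print(is_blocked(gzip.compress(secret)))  # False: compression changes the digest
```

The same bypass applies to any transformation that alters the bytes while preserving the content, which is exactly the loophole list above.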

One major problem in particular, pointed out by cybersecurity researchers, is the constant data “switch-up” in the contemporary business world. Today’s companies introduce new confidential or proprietary data into their systems frequently, and many DLP platforms are not equipped to handle this constant flow of diverse information.

The Solution: Math & Science

Using patented scientific and mathematical models, the GTB data protection detection engines take an intelligent approach to managing sensitive data. Rather than relying on set models, GTB programs continually analyze data with intelligent algorithms. This approach virtually eliminates false positives by homing in on relevant data and flagging only real exfiltration threats; false negatives are prevented by the same methods. Based on indicators learned from already-identified files and data streams, GTB can track sensitive content even when elements of a file or data stream are changed.
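GTB’s patented models are proprietary, but the general idea of fingerprinting content rather than whole files can be sketched with a standard technique such as word shingling (a generic illustration, not GTB’s actual algorithm). Overlapping word sequences survive re-typing, partial extraction, and format changes, so a modified file still matches its flagged source.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of lightly normalized text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard overlap between two shingle sets (1.0 = identical content)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical sensitive sentence and a re-typed excerpt of it.
original = ("The merger agreement between Acme Corp and Initech "
            "closes in Q3 pending board approval")
retyped = ("merger agreement between Acme Corp and Initech "
           "closes in Q3 pending board")

print(round(similarity(original, retyped), 2))  # → 0.83: still a strong match
```

An exact-match digest of the re-typed excerpt would differ completely, yet the shingle overlap remains high, which is why content-level fingerprinting closes the loopholes described earlier.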

GTB’s Data Detection that Works allows administrators to have the best of both worlds: high security assurance, along with the benefits of an automated platform.

Visibility: Accurately discover sensitive data; detect and address broken business processes or insider threats, including sensitive-data breach attempts.

Protection: Automate data protection, breach prevention and incident response both on and off the network; for example, find and quarantine sensitive data within files exposed on user workstations, FileShares and cloud storage.

Notification: Alert users on violations to raise awareness and educate end users about cybersecurity and corporate policies.

Education: Start targeted cyber-security training; e.g., identify end users violating policies and train them.

  • Employees and organizations have knowledge and control of the information leaving the organization, where it is being sent, and where it is being preserved.
  • Ability to let users classify the data they produce, giving them influence over how it is controlled, which increases both protection and end-user adoption.
  • Control your data across your entire domain in one Central Management Dashboard with Universal policies.
  • Many levels of control, together with the ability to warn end users of potentially non-compliant or risky activities, protecting against both malicious insiders and human error.
  • Full data discovery collection detects sensitive data anywhere it is stored, and provides strong classification, watermarking, and other controls.
  • Delivers full technical controls on who can copy what data, to what devices, what can be printed, and/or watermarked.
  • Integrate with GRC workflows.
  • Reduce the risk of fines and non-compliance.
  • Protect intellectual property and corporate assets.
  • Ensure compliance within industry, regulatory, and corporate policy.
  • Ability to enforce boundaries and control what types of sensitive information can flow where.
  • Control data flow to third parties and between business units.