In A CISO’s Guide to Artificial Intelligence, we view artificial intelligence as providing advisory, enhanced service, and semiautonomous cybersecurity defense functionality based on a range of structured and unstructured data, including logs, device telemetry, network packet headers, and other available information.
Simply put, AI is applied statistics put to work on cybersecurity problems. The goal is to create analytics platforms that capture and replicate the tactics, techniques, and procedures of the finest security professionals; democratize the traditionally unstructured threat detection and remediation process; or complete a range of near-real-time automated detection and response techniques that a security professional could theoretically replicate by hand, but only long after it would be too late.
As AI continues to promise simplicity in the face of the complexity of today’s security environment, it will be helped by the homogeneity of data.
Frank Dickson – Group Vice President, Security and Trust
However, in our opinion, our collective focus is in the wrong place. The hype and the conversation center on AI. Why not? The possibilities of AI inspire the imagination, illuminating the possible. But the key to enabling outcomes in security is not the AI; it is the data. Many children are inspired by the power and size of locomotives. The potential of the locomotive, though, relies on the boring and tedious process of laying the tracks and building the enabling infrastructure. Likewise, data is the enabling infrastructure for security AI. Three characteristics determine success:
- Data framework structures
- Data management
- Data curation
Data Framework Structures
As we look to unlock the promise of artificial intelligence for, as one example, extended detection and response (XDR), creating frameworks and structures is critical. The most basic definition of XDR is:
- The collecting of telemetry from multiple security tools
- The application of analytics to the collected and homogenized data to arrive at a detection of maliciousness
- The response to and remediation of that maliciousness
As we look to apply analytics to the collected and homogenized data to detect maliciousness, AI needs structure to be able to look at the data at scale. After all, AI is really no more than a mathematical model that describes the relationships in the data. Telemetry optimized for a point use case, such as the perimeter-centric defense of a firewall, is of little use if you cannot relate it to other data sets, such as identity, and if it is not framed in a way that serves an end goal.
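To make the point about relating data sets concrete, here is a minimal sketch of joining firewall telemetry to identity telemetry. The records, field names, and the `denied_connections_by_user` helper are all hypothetical and invented for illustration; real products use their own schemas.

```python
from collections import defaultdict

# Hypothetical firewall telemetry: it knows IP addresses, not people.
firewall_events = [
    {"src_ip": "10.0.0.5", "dst_port": 445, "action": "deny"},
    {"src_ip": "10.0.0.5", "dst_port": 3389, "action": "deny"},
    {"src_ip": "10.0.0.9", "dst_port": 443, "action": "allow"},
]

# Hypothetical identity telemetry: it maps an IP address to a user.
identity_records = [
    {"ip": "10.0.0.5", "user": "alice"},
    {"ip": "10.0.0.9", "user": "bob"},
]


def denied_connections_by_user(events, identities):
    """Join firewall events to identities on IP and count denies per user."""
    ip_to_user = {rec["ip"]: rec["user"] for rec in identities}
    counts = defaultdict(int)
    for event in events:
        if event["action"] == "deny":
            # Events with no matching identity record are kept, not dropped.
            user = ip_to_user.get(event["src_ip"], "unknown")
            counts[user] += 1
    return dict(counts)


denies = denied_connections_by_user(firewall_events, identity_records)
print(denies)
```

On its own, the firewall data says only that one address was denied twice. Joined with identity, it becomes a statement about a user, which is the kind of relationship a model can actually learn from.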
When we discussed the value of event sequencing as a core attribute of most detection and response offerings, much of that value was unlocked by applying the MITRE ATT&CK framework. Not only does the framework provide structure to the task of threat detection by mapping to the cyber kill chain, but it also gives different tools from different vendors a common way to structure data and prepare it for analysis.
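A minimal sketch of event sequencing follows, assuming each tool already tags its alerts with an ATT&CK tactic name. The alerts, tool names, host names, and integer timestamps are hypothetical; only the tactic names come from the framework.

```python
from collections import defaultdict

# Hypothetical alerts from three different tools, each tagged with a tactic.
alerts = [
    {"tool": "edr", "host": "host-1", "time": 20, "tactic": "Execution"},
    {"tool": "email", "host": "host-1", "time": 10, "tactic": "Initial Access"},
    {"tool": "ndr", "host": "host-1", "time": 30, "tactic": "Lateral Movement"},
    {"tool": "edr", "host": "host-2", "time": 15, "tactic": "Discovery"},
]


def tactic_sequences(alerts):
    """Order alerts by time and return the tactic sequence seen on each host."""
    by_host = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["time"]):
        by_host[alert["host"]].append(alert["tactic"])
    return dict(by_host)


sequences = tactic_sequences(alerts)
print(sequences["host-1"])
```

Because all three tools speak the same tactic vocabulary, their alerts can be stitched into one ordered story per host without any vendor-specific translation.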
Data Management
Data has weight and gravity. Security data has a lot of weight. For example, a typical endpoint protection platform agent will produce 150-200 MB of data a day. Moving, storing, and managing such data quickly becomes a problem of scale. Data retention policies can thus quickly become divisive topics.
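A quick back-of-the-envelope calculation shows how fast that weight adds up. The 150-200 MB per agent per day comes from the figure above; the fleet size of 10,000 endpoints, the 90-day retention period, and the decimal unit conversion are assumptions chosen purely for illustration.

```python
MB_PER_GB = 1000  # decimal units, an assumption for simplicity
GB_PER_TB = 1000


def daily_volume_gb(endpoints, mb_per_agent_per_day):
    """Total telemetry produced per day across the fleet, in GB."""
    return endpoints * mb_per_agent_per_day / MB_PER_GB


def retained_volume_tb(endpoints, mb_per_agent_per_day, retention_days):
    """Total telemetry held at steady state for a retention window, in TB."""
    daily_gb = daily_volume_gb(endpoints, mb_per_agent_per_day)
    return daily_gb * retention_days / GB_PER_TB


endpoints = 10_000    # hypothetical fleet size
retention_days = 90   # hypothetical retention policy

low_tb = retained_volume_tb(endpoints, 150, retention_days)
high_tb = retained_volume_tb(endpoints, 200, retention_days)
print(f"Retained endpoint telemetry: {low_tb:.0f}-{high_tb:.0f} TB")
```

Under these assumptions, endpoint telemetry alone lands between 135 and 180 TB, before any network, identity, or cloud logs are counted.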
In addition, only with AI can the increasing pools of telemetry be put to the very best use. ML has limits, but by using AI to train for previously unseen patterns and new lenses on the data, time-to-X metrics can be reduced in a truly significant way.
Data weight has become a competitive differentiating tool. For example, the move by the infrastructure-as-a-service (IaaS) vendors to retain their own cloud logs at no or very low cost is significant, as SIEM is often priced based on the volume of data ingested, and the SIEM vendors cannot simply “eat” the cost of ingesting and storing voluminous cloud logs. Analysis needs to happen on the native format in a predictable manner. The entire business model of SIEM, XDR, and other analysis platforms thus is increasingly challenged and is changing based on the weight of data.
Data Curation
In a world where every vendor has a different data structure, curating heterogeneous data sets into homogeneous ones to enable analysis is an extra step, and a potentially onerous one depending on the calculus and scale required. As AI continues to promise simplicity in the face of the complexity of today’s security environment, it will be helped by the homogeneity of data and inhibited by the lack of it.
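The extra step looks something like the sketch below: two vendors report the same failed login in different shapes, and each needs its own normalizer before the events can be analyzed together. Both vendor formats, the common schema, and every field name here are hypothetical and do not represent any real product or standard.

```python
from datetime import datetime, timezone

# The same failed login as reported by two hypothetical vendors.
vendor_a_event = {
    "ts": 1700000000,  # epoch seconds
    "srcip": "10.0.0.5",
    "usr": "alice",
    "evt": "login_failed",
}
vendor_b_event = {
    "timestamp": "2023-11-14T22:13:20+00:00",  # ISO 8601 string
    "source": {"address": "10.0.0.5"},
    "user_name": "alice",
    "event_type": "AUTH_FAILURE",
}


def normalize_a(event):
    """Map vendor A's flat record onto the common (hypothetical) schema."""
    return {
        "time": datetime.fromtimestamp(event["ts"], tz=timezone.utc),
        "src_ip": event["srcip"],
        "user": event["usr"],
        "activity": {"login_failed": "auth_failure"}[event["evt"]],
    }


def normalize_b(event):
    """Map vendor B's nested record onto the same common schema."""
    return {
        "time": datetime.fromisoformat(event["timestamp"]),
        "src_ip": event["source"]["address"],
        "user": event["user_name"],
        "activity": {"AUTH_FAILURE": "auth_failure"}[event["event_type"]],
    }


same_event = normalize_a(vendor_a_event) == normalize_b(vendor_b_event)
print(same_event)
```

Every additional vendor means another normalizer to write, test, and maintain, which is where the time and cost come from.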
Restructuring data takes time and costs money. Large vendors with broad portfolios thus have an advantage: multiproduct offerings built on a single platform save time and cost because a larger share of their multi-technology data sets is already homogeneous.
Overcoming the issue of data curation is the objective of many standards. For example, Structured Threat Information Expression (STIX) and Trusted Automated eXchange of Indicator Information (TAXII) were developed by MITRE in its role as the U.S. Department of Homeland Security FFRDC. STIX is a common language for threat intelligence, so that intelligence can be shared and machine-read by any tool supporting it. TAXII is the application layer protocol designed to simplify the transmission of threat intel data. In 2015, STIX/TAXII development was moved to the OASIS international standards organization. Today, the work is free, open, and community driven.
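To give a sense of what a common language looks like in practice, here is a minimal sketch that builds a STIX 2.1 style indicator as plain JSON using only the Python standard library. The `make_indicator` helper, the IP address (taken from a documentation-only range), and the indicator name are hypothetical, and the object is hand-built rather than validated against the specification.

```python
import json
import uuid
from datetime import datetime, timezone


def make_indicator(ip, name):
    """Build a minimal indicator object that flags a single IPv4 address."""
    # STIX timestamps are UTC strings ending in "Z".
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


indicator = make_indicator("198.51.100.7", "Hypothetical suspicious address")
print(json.dumps(indicator, indent=2))
```

Because the result is ordinary JSON with agreed-upon field names, any tool that supports the format can read it without a custom connector.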
We would be remiss if we did not mention the Open Cybersecurity Schema Framework (OCSF) here and its significance to AI. Hybrid multicloud security telemetry must be normalized before any converged data is useful. The goal of OCSF is to simplify the exchange of data between the tools that ingest it, manage it, and enrich it, because every organization has a cornucopia of solutions purchased over the past half dozen years. OCSF offers a single format, so those getting started do not have to write data connectors for a lot of solutions. The real story here is one of simplicity, which is the holy grail of cybersecurity solutions.
So what? What Does This Mean to YOU?
Look. Every cybersecurity vendor is going to roll out a generative AI interface for its tools, and it should. It is the fourth generation of the user interface; it is significant. A vendor will be conspicuous without one. By the end of 2023, every tool of relevance will have one; tools without one will likely become irrelevant or subservient to those that do. The ability of the tool to create outcomes in your environment, however, will be determined not by the power of generative AI but by the data and the predictive AI models behind the generative AI. It’s Not About the AI; It’s About the Data.