Orchestration & Automation (O&A) Methodology
November 03, 2017
O&A is at the heart of working with big data in an automated and efficient fashion. It involves two important elements:
- Orchestration: Planning and coordination of elements, variables, and processes.
- Automation: Automating a process or task.
The role of designing and managing O&A for an organization is much like that of an orchestra conductor, who makes sure each section plays its part in an integrated musical piece that is dynamic and changing. The piece requires excellence in each role but also coordination through the conductor. Elements of musical orchestration include the management of timing, volume and technique, not unlike how one designs and manages O&A within managed security.
Data Needs Context to be Meaningful
Traditional security dashboards, metrics, and similar information displays present “counts” and data, but not necessarily with context. Suppose, for example, that a vulnerability scan of an organization finds 76 vulnerabilities. That number alone does not provide any context or interpretive meaning. If it is a very small organization with only 20 hosts, the number means something quite different from 76 total vulnerabilities in a larger organization with 500 hosts (think ratios and risk). If we find that 76 total vulnerabilities exist within a larger organization of 500 hosts, but 13 are critical and two of those are on critical hosts, we now have a meaningful picture of priorities and risk, and clear actions follow from it.
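To make the "ratios and risk" point concrete, here is a minimal sketch that puts the same raw count of 76 into context. The class and field names are illustrative assumptions, not part of any particular scanner's output.

```python
from dataclasses import dataclass


@dataclass
class ScanSummary:
    """Counts from one vulnerability scan (field names are illustrative)."""
    total_vulns: int
    host_count: int
    critical_vulns: int = 0
    critical_vulns_on_critical_hosts: int = 0

    def vulns_per_host(self) -> float:
        # Ratio that puts the raw count in the context of organization size.
        return self.total_vulns / self.host_count

    def critical_share(self) -> float:
        # Fraction of all findings that are rated critical.
        return self.critical_vulns / self.total_vulns if self.total_vulns else 0.0


# The two organizations from the example above.
small_org = ScanSummary(total_vulns=76, host_count=20)
large_org = ScanSummary(total_vulns=76, host_count=500,
                        critical_vulns=13, critical_vulns_on_critical_hosts=2)

print(f"small org: {small_org.vulns_per_host():.2f} vulnerabilities per host")
print(f"large org: {large_org.vulns_per_host():.2f} vulnerabilities per host")
print(f"large org: {large_org.critical_share():.0%} critical, "
      f"{large_org.critical_vulns_on_critical_hosts} on critical hosts")
```

The same count of 76 yields very different per-host ratios, and the critical findings on critical hosts are the natural place to start remediation.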
Automation Alone is Not O&A
Simply automating basic counts of security instrumentation data to a dashboard is not O&A. This example involves just the “A” in O&A. True O&A involves the orchestration component, where coordination, correlation, ‘timing,’ ‘weight’ in scoring algorithms, and similar constructs are integrated into a solution. This is a complex task when one considers how to best drive efficiency, value and high-confidence outcomes to avoid false positives in results.
An example of this challenge is seen in how cyber risk is calculated. Today, risk is often expressed through various proprietary values from security vendors and solutions, such as endpoint protection, anti-virus, and so on. It is common to see a diverse set of results come back on the same threat from such solutions, and they often don’t agree. How does one make sense of all of this data?
A human likely has various levels of trust and confidence in sources and knows how to interpret each. For example, a web forum may be an interesting source of leads and information but warrants low confidence, and/or may be known for reporting incorrect, delayed, or poor-quality data, while a trusted professional website may provide high-confidence data on the same issue. O&A must take these types of sources into account, along with the associated weight (a lower weight for a less trusted, lower-confidence source), in order to best automate how risk is assigned to the threat.
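One simple way to express source weighting is a confidence-weighted average. The sketch below is only one possible approach, not a standard formula. The source names, scores, and weights are hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class SourceReport(NamedTuple):
    source: str
    score: float       # the source's own risk verdict, normalized to 0.0-1.0
    confidence: float  # weight we assign to the source, 0.0-1.0 (an assumption)


def weighted_risk(reports: list[SourceReport]) -> float:
    """Confidence-weighted average of the risk scores reported by each source."""
    total_weight = sum(r.confidence for r in reports)
    if total_weight == 0:
        raise ValueError("at least one source needs a non-zero confidence weight")
    return sum(r.score * r.confidence for r in reports) / total_weight


# Hypothetical reports on the same threat from two sources.
reports = [
    SourceReport("web forum", score=0.9, confidence=0.2),
    SourceReport("trusted professional site", score=0.3, confidence=0.9),
]
print(f"weighted risk: {weighted_risk(reports):.2f}")
```

With these made-up inputs the result sits much closer to the trusted site's verdict than a plain average would, which is the behavior a human analyst would expect.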
O&A is Built Upon Known Process and Data
O&A is best built upon known, tried, tested and true process and data. For example, developing a risk rating system based upon variables from SIEM alerts is a solid concept, but not without a baseline of confidence, trust and “what works.” An O&A developer can easily create a scoring algorithm based on a few variables. But how do you know if it will be successful? If a human who does that same job heuristically today scores the same alerts, will O&A produce the same results? It is mission critical to establish a baseline of what is already working before attempting to automate and mature such tasks.
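A baseline check along these lines can be sketched as a comparison between analyst ratings and automated ratings for the same alerts. The alert IDs and rating labels below are hypothetical, and a simple agreement rate is only one of several reasonable measures.

```python
def agreement_report(analyst_ratings: dict[str, str],
                     automated_ratings: dict[str, str]) -> tuple[float, list[str]]:
    """Return the agreement rate and the alert IDs where the two ratings differ.

    Only alerts rated by both the analyst and the automation are compared.
    """
    shared = analyst_ratings.keys() & automated_ratings.keys()
    disagreements = sorted(
        alert for alert in shared
        if analyst_ratings[alert] != automated_ratings[alert]
    )
    rate = (len(shared) - len(disagreements)) / len(shared) if shared else 0.0
    return rate, disagreements


# Hypothetical ratings for the same four alerts.
analyst = {"alert-1": "high", "alert-2": "low", "alert-3": "medium", "alert-4": "high"}
automated = {"alert-1": "high", "alert-2": "medium", "alert-3": "medium", "alert-4": "high"}

rate, disagreements = agreement_report(analyst, automated)
print(f"agreement: {rate:.0%}, review these alerts: {disagreements}")
```

The disagreements are the valuable part: each one is a case where either the algorithm or the existing human process needs a closer look.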
As an O&A developer starts to dive into existing processes and data, it is common to uncover the “oh-no” factor. All too often things are more ad hoc, immature, or inaccurate than we thought. Take, for example, the highly debated world of security metrics and statistics. Understanding exactly how metrics are collected and used is essential to giving such data context and meaning. All too often the O&A developer has to uncover and then address the gaps or errors in an existing process before maturation and automation are possible. This often leads to major process improvement and efficiency, but it also involves a higher level of effort.
Develop a Methodology for O&A
A recursive practice of process identification, documentation, analysis and maturation must take place for O&A to be successful. As noted above, this often involves increased levels of effort when existing processes are found to be flawed, ad hoc or incomplete. The methodology begins with clearly identifying the desired outcome of a process and the associated O&A.
Developing a validation methodology is important to ensuring your O&A works before attempting to fully automate it. Take, for example, the concept of rating risk related to enrichment of an indicator of compromise (IOC), like the MD5 cryptographic hash value of a suspect file. A variety of data variables exist to identify whether the file is benign, suspect, or clearly malicious: how many anti-virus vendors detect the sample as malicious, whether it writes files and whether it writes them to specific locations on a host that are commonly associated with malware, whether it is associated with the infrastructure of known malware campaigns, reputational scores from other sources and vendors, and so on. The first one mentioned, detection by AV vendors, seems simple, but it is actually complex. As you perform real-world validation of this as a potential indicator of maliciousness, you may find that emergent and real-time threats have very low detection rates while those that have been in the wild for a few months have very high detection rates. This can also be true of the same sample, with few detections up front and more as time progresses. This is all too common in the world of malware, so how do you then tune that data point to potentially include it within O&A to identify potential maliciousness? I didn’t even bring up false positives…so have fun with that one! Validation of O&A is not trivial and must be done with a sample that is real and large enough to validate your attempted automation and value outcome.
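The detection-rate problem can be seen by validating a simple threshold rule against labeled samples, split by how long each sample has been in the wild. This is a rough sketch under stated assumptions: the `Sample` fields, the 30-day cutoff, the 0.30 threshold, and the sample data are all made up for illustration, and a real validation needs a large, real sample set.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    age_days: int           # days since the sample was first seen
    detection_ratio: float  # fraction of AV engines flagging it, 0.0-1.0
    is_malicious: bool      # ground truth, e.g. from manual analysis


def evaluate_threshold(samples, threshold, new_cutoff_days=30):
    """Count outcomes of the rule 'detection_ratio >= threshold means malicious',
    split into new and established samples."""
    results = {"new": Counter(), "established": Counter()}
    for s in samples:
        bucket = "new" if s.age_days < new_cutoff_days else "established"
        flagged = s.detection_ratio >= threshold
        if flagged and s.is_malicious:
            outcome = "true_positive"
        elif flagged:
            outcome = "false_positive"
        elif s.is_malicious:
            outcome = "false_negative"
        else:
            outcome = "true_negative"
        results[bucket][outcome] += 1
    return results


# Made-up validation set, for illustration only.
samples = [
    Sample(age_days=1, detection_ratio=0.05, is_malicious=True),
    Sample(age_days=2, detection_ratio=0.10, is_malicious=True),
    Sample(age_days=90, detection_ratio=0.80, is_malicious=True),
    Sample(age_days=120, detection_ratio=0.02, is_malicious=False),
    Sample(age_days=200, detection_ratio=0.35, is_malicious=False),
]
for bucket, counts in evaluate_threshold(samples, threshold=0.30).items():
    print(bucket, dict(counts))
```

In this toy data the rule misses both new malicious samples and also flags one benign established sample, which is exactly the kind of finding that should shape how the data point is weighted, or whether it is used at all.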
Once a proof of concept (POC) has been created for a potential O&A element, pilot it within a development environment. Again, tried, tested and true is important if we are going to be successful at big data efficiency. The last thing we want to do is automate something that generates a large number of false positives, extra noise, and more things to look at, increasing our workload instead of making our work more valuable and efficient. This requires carefully defining how one identifies when the O&A is ready for prime-time production. Involving multiple stakeholders and documenting this process is essential to avoiding disruption to production.
O&A is not a trivial subject. As I mentioned at the beginning of this article, it’s fraught with challenges that can derail a project or negatively impact business. Develop your methodology and mature it as you work to automate other processes within your organization.