Orchestrator Risks

January 28, 2020

Gaining Visibility into NIST SP 800-190, Part Five

In the previous blog post in our series, Dan Kiraly described how native AWS tools and third-party solutions can address registry risks identified in section 3.2 of the NIST SP 800-190 Application Container Security Guide. This post will explore Orchestrator Risks and Countermeasures (sections 3.3 and 4.3) based on our lab environment, which utilized AWS EKS.

Orchestration technologies such as Kubernetes provide a way to automate the deployment of multi-container applications (for example, applications built from Docker containers) across multiple hosts without the need to manage each container separately. For example, if a DevOps team wanted to assemble a containerized microservices application, Kubernetes could handle the deployment without requiring each container to be managed individually.

With that in mind, the health and security of the cluster is of critical importance and should not be understated. Areas like Unbounded Administrator Access, Unauthorized Access, Poorly Separated Inter-Container Network Traffic, Mixing of Workload Sensitivity Levels and Orchestrator Node Trust are key container orchestration elements that should be reviewed when planning an orchestration architecture.

The following sections expand on NIST SP 800-190 guidance with Palo Alto Networks’ Prisma Cloud (formerly Twistlock) factored in as a countermeasure to the risks listed.

3.3.1 Unbounded Administrator Access: Many organizations running container orchestration technologies assign full cluster administrator privileges to their users for day-to-day operational requirements. Common pitfalls of this approach (such as accidentally deleted clusters and secrets, as well as insider threat risks) highlight its limitations.

NIST SP 800-190 offers this guidance on Unbounded Administrative Access:

3.3.1 Example Risks

Single orchestrators running many different apps, managed by different teams and with different sensitivity levels
Without proper scoping, a malicious user could affect or subvert the operation of other containers managed by the orchestrator

4.3.1 Countermeasures

Due to their wide-ranging span of control, orchestrators should use a least-privilege access model
Example: Members of a test team should only be given access to the images used in testing and hosts for running them. No access should be provided to production systems

To address unauthorized access and enforce least privilege, moving clusters to role-based access control (RBAC) is highly recommended. For a more comprehensive approach, every application should run under its own service account rather than the default one. Defining a Role and a corresponding RoleBinding for that account ensures that only the required API resources can be accessed. On the user side, RBAC is enforced through the cluster authenticator. For example, on AWS EKS clusters, aws-iam-authenticator uses mapRoles to map a roleArn to a set of groups. Mapped roles can be broken down into reader, writer and administrator.
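As a rough illustration of the service account, Role and RoleBinding pattern described above, the following Python sketch builds the three manifests as plain dictionaries and prints them as JSON, which kubectl accepts alongside YAML. The namespace, the application name, and the choice of read-only access to pods and configmaps are assumptions made for the example, not values from our lab environment.

```python
import json


def least_privilege_manifests(namespace, app):
    """Build a dedicated ServiceAccount plus a read-only Role and RoleBinding.

    The names and the rule set are illustrative assumptions; adjust the
    resources and verbs to what the application actually needs.
    """
    sa_name = f"{app}-sa"
    role_name = f"{app}-reader"

    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": sa_name, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                # "" is the core API group, where pods and configmaps live.
                "apiGroups": [""],
                "resources": ["pods", "configmaps"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return [service_account, role, role_binding]


def as_list(manifests):
    """Wrap manifests in a v1 List so they can be applied in one call."""
    return {"apiVersion": "v1", "kind": "List", "items": manifests}


if __name__ == "__main__":
    # Hypothetical namespace and application name.
    print(json.dumps(as_list(least_privilege_manifests("testing", "orders")), indent=2))
```

The output could be piped to `kubectl apply -f -`. The pod specification for the application would then need to reference the generated service account by name so that it no longer runs under the default account.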

Palo Alto Networks’ Prisma Cloud offers a feature called Access Control which contains a default deny-all access control rule for Docker and Kubernetes commands. When enabled, any permitted activity must be explicitly whitelisted.

Figure: Prisma Cloud - Access Control for Docker & Kubernetes Commands

Kubectl-Who-Can, an open source plugin from Aqua Security, displays which users, groups and service accounts are bound to Kubernetes cluster roles with a given set of permissions.

Figure: Aqua Security – Kubectl-Who-Can

RBAC Manager, an open source operator from Fairwinds, supports declarative configuration for RBAC with new custom resources. Instead of managing role bindings or service accounts directly, you can specify a desired state and RBAC Manager will make the necessary changes to achieve that state. Fairwinds also offers an open source tool, RBAC Lookup, that finds and displays the roles and cluster roles attached to any user, service account or group name in a Kubernetes cluster. Both RBAC Manager and RBAC Lookup are available on GitHub.
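To make the idea behind these lookup tools more concrete, here is a minimal Python sketch of the underlying question: given a set of role bindings, which roles is a particular subject bound to? It is not how RBAC Lookup or Kubectl-Who-Can is implemented; it only walks the standard `subjects` and `roleRef` fields of RoleBinding and ClusterRoleBinding objects. The sample bindings, group names and user name are invented for the example.

```python
def roles_for_subject(bindings, kind, name):
    """Return sorted (scope, role kind, role name) tuples bound to a subject.

    `bindings` is a list of RoleBinding / ClusterRoleBinding objects as
    dictionaries. Bindings without a namespace are reported as cluster-wide.
    """
    found = set()
    for binding in bindings:
        for subject in binding.get("subjects") or []:
            if subject.get("kind") == kind and subject.get("name") == name:
                scope = binding["metadata"].get("namespace", "cluster-wide")
                ref = binding["roleRef"]
                found.add((scope, ref["kind"], ref["name"]))
    return sorted(found)


# Hypothetical sample data shaped like Kubernetes binding objects.
SAMPLE_BINDINGS = [
    {
        "kind": "ClusterRoleBinding",
        "metadata": {"name": "ops-admin"},
        "subjects": [{"kind": "Group", "name": "ops"}],
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
    },
    {
        "kind": "RoleBinding",
        "metadata": {"name": "qa-view", "namespace": "testing"},
        "subjects": [
            {"kind": "Group", "name": "qa"},
            {"kind": "User", "name": "alice"},
        ],
        "roleRef": {"kind": "ClusterRole", "name": "view"},
    },
]

if __name__ == "__main__":
    for scope, role_kind, role_name in roles_for_subject(SAMPLE_BINDINGS, "Group", "qa"):
        print(f"{scope}: {role_kind}/{role_name}")
```

In a real cluster, the input could come from the `items` array of `kubectl get rolebindings,clusterrolebindings --all-namespaces -o json`. A review like this is mainly useful for spotting subjects that are bound to cluster-admin when a narrower role would do.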

3.3.2 Unauthorized Access: Lack of visibility into account governance within the cluster is a key risk worth noting. NIST SP 800-190 offers this guidance on Unauthorized Access:

3.3.2 Example Risks

Orchestrators often include their own authentication directory service, which may be separate from the typical directories already in use within an organization
Disparate systems can lead to weaker account management practices due to less rigorous enforcement
Leakage of highly privileged accounts can result in systemwide compromise
Containers typically use data storage volumes that are managed by the orchestration tool and are not host-specific. Because a container may run on any given node within a cluster, the data required by the app within the container must be available to the container regardless of which host it is running on

4.3.2 Countermeasures

Access to cluster-wide administrative accounts should be tightly controlled, as these accounts provide the ability to affect all resources in the environment
Organizations should implement single sign-on to existing directory systems where applicable
Organizations should use tools for encrypting data used with containers that allow the data to be accessed properly from containers regardless of the node they are running on

Prisma Cloud supports integrations with multiple directory services and identity providers (e.g. Active Directory, OpenLDAP, SAML). Access to orchestrator commands can be granted on either a user-by-user or group-by-group basis. Access control rules are defined using filters and pattern-matching expressions for host names, image names, container names and/or labels. Policies can be configured to raise alerts or block commands. Prisma Cloud also supports multi-factor authentication based on X.509 certificates (e.g. smart cards).
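The combination of group-based rules, pattern matching and a default-deny fallback can be sketched in a few lines of Python. This is a generic illustration of the evaluation model, not Prisma Cloud's rule engine or API. The `Rule` fields, the group names and the glob-style patterns are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """One hypothetical access control rule."""
    group: str             # directory group the rule applies to
    command: str           # glob pattern for the command line
    host: str = "*"        # glob pattern for the target host name
    effect: str = "allow"  # "allow", "alert" or "deny"


def evaluate(rules, user_groups, command, host):
    """Return the effect of the first matching rule, or "deny" if none match."""
    for rule in rules:
        if (
            rule.group in user_groups
            and fnmatchcase(command, rule.command)
            and fnmatchcase(host, rule.host)
        ):
            return rule.effect
    # Default deny: anything not explicitly matched is blocked.
    return "deny"


# Hypothetical policy: testers may read from test hosts; deletes raise an alert.
RULES = [
    Rule(group="testers", command="kubectl get *", host="test-*"),
    Rule(group="testers", command="kubectl delete *", host="test-*", effect="alert"),
]

if __name__ == "__main__":
    print(evaluate(RULES, {"testers"}, "kubectl get pods", "test-node-1"))
    print(evaluate(RULES, {"testers"}, "kubectl get pods", "prod-node-1"))
```

Rules are evaluated in order and the first match wins, so more specific rules would need to be listed before broader ones. That ordering is a design choice of this sketch, and a real product may resolve conflicts differently.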

Figure: Prisma Cloud – Authentication view

3.3.3 Poorly Separated Inter-container Network Traffic: Overlay traffic between individual nodes, and the lack of visibility into that traffic, represents a significant risk for orchestrated systems. Typical network traffic monitoring tools cannot monitor the overlay networks in use. Encrypted traffic between nodes only exacerbates the visibility problem organizations face.

NIST SP 800-190 offers this guidance on Poorly separated inter-container network traffic:

3.3.3 Example Risks

Traffic between individual nodes is routed over a virtual overlay network, resulting in an overlay network that is typically managed by the orchestrator and is often opaque to existing network security and management tools
Example: instead of seeing database queries being sent from a web server container to a database container on another host, traditional network filters would only see encrypted packets flowing between two hosts, with no visibility into the actual container endpoints, nor the traffic being sent
Encrypted overlay network can create a “blindness” scenario where organizations are unable to effectively monitor traffic within their own networks

4.3.3 Countermeasures

Orchestrators should be configured to separate network traffic into discrete virtual networks by sensitivity level
Per-app segmentation is also possible; for most organizations and use cases, simply defining networks by sensitivity level provides sufficient mitigation of risk with a manageable degree of complexity
Example: public-facing apps can share a virtual network, internal apps can use another and communication between the two should occur through a small number of well-defined interfaces
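In Kubernetes, one common way to approximate this kind of segmentation is with NetworkPolicy objects. The Python sketch below generates two policies for a namespace that holds internal apps: one that denies all ingress by default, and one that opens a single well-defined interface to namespaces labeled with a given sensitivity level. The `sensitivity` namespace label, the `role: gateway` pod label and the port number are assumptions made for the example. NetworkPolicy objects are only enforced when the cluster's network plugin implements them.

```python
import json


def segmentation_policies(namespace, allowed_level, port):
    """Build a default-deny ingress policy plus one narrow allow rule."""
    default_deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        # An empty podSelector selects every pod in the namespace; listing
        # Ingress with no ingress rules blocks all inbound traffic.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }
    allow_interface = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{allowed_level}-ingress", "namespace": namespace},
        "spec": {
            # Only pods carrying the (hypothetical) gateway label are exposed.
            "podSelector": {"matchLabels": {"role": "gateway"}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {"sensitivity": allowed_level}
                            }
                        }
                    ],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }
    return [default_deny, allow_interface]


if __name__ == "__main__":
    # Hypothetical namespace, sensitivity level and port.
    policies = segmentation_policies("internal-apps", "public", 8443)
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

Because network policies are additive, the default-deny policy and the allow policy can coexist: traffic is permitted only where at least one policy explicitly allows it.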

Prisma Cloud provides basic visibility insights into activity within a Kubernetes data plane. Visibility options include connections within containers, connections between apps within a namespace and connections between serverless applications and microservices (e.g. AWS, Azure, GCP).

Figure: Prisma Cloud - Radar view of Hosts Visibility

Figure: Prisma Cloud - Radar View of Container Visibility

For enforcement, Prisma Cloud contains the Cloud Native Network Firewall (CNNF). CNNF operates as an east-west firewall between containers, minimizing potential damage by preventing attackers from moving laterally through an enterprise when they’ve already compromised a segment of it.

Prisma Cloud automatically maps, identifies, and allows valid traffic flows in environments based on its proximity to applications and knowledge of how they behave. Prisma Cloud dynamically creates filters that automatically allow valid connections and drop suspicious connections, regardless of where containers are running in the cluster.

Figure: Prisma Cloud – Monitor > Events (Cloud Native Firewall view)

3.3.4 Mixing of Workload Sensitivity Levels: A potential risk factor is the practice of sharing virtual networks between applications. For example, when two applications with different sensitivity levels (e.g. one public facing, the other internal) share a single network, the internal application could be exposed to more attacks and to a trust relationship that attackers can exploit. Another aspect to keep in mind is that orchestrators may place a sensitive workload on the same node as a public-facing workload simply because of the resources available at the time. For systems subject to regulatory compliance, this could also have a major impact because of assessment scoping requirements.

NIST SP 800-190 offers this guidance on Mixing of workload sensitivity levels:

3.3.4 Example Risks

Orchestrators are typically focused primarily on driving the scale and density of workloads. This means that, by default, they can place workloads of differing sensitivity levels on the same host
Example: in a default configuration, an orchestrator may place a container running a public-facing web server on the same host as one processing sensitive financial data, simply because that host happens to have the most available resources at the time of deployment
In the case of a critical vulnerability in the web server, this can put the container processing sensitive financial data at significantly greater risk of compromise

4.3.4 Countermeasures

Orchestrators should be configured to isolate deployments to specific sets of hosts by sensitivity levels
Segmenting containers by purpose, sensitivity and threat posture provides additional defense in depth

To address the risk of mixing workload sensitivity levels, architectural considerations should be made. Workloads can be pinned to specific nodes through the use of labels, allowing for isolated deployments. Container security products, like Prisma Cloud, can use attributes like labels to monitor and enforce security policy. Another architectural alternative is to break out workloads into their own isolated clusters based on sensitivity levels.
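As a small illustration of pinning workloads to nodes with labels, the Python sketch below builds a Deployment manifest whose pod template carries a `nodeSelector`. The `sensitivity` label key, the application name and the image reference are assumptions made for the example. The target nodes would need to carry the same label beforehand, for instance via `kubectl label nodes <node-name> sensitivity=internal`.

```python
import json


def pinned_deployment(name, image, sensitivity, replicas=2):
    """Build a Deployment that only schedules onto nodes labeled with the
    given sensitivity level (label key is a hypothetical convention)."""
    labels = {"app": name, "sensitivity": sensitivity}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                # The same labels on the pods let security tooling target
                # policy by sensitivity level.
                "metadata": {"labels": labels},
                "spec": {
                    # Pods are only placed on nodes carrying this label.
                    "nodeSelector": {"sensitivity": sensitivity},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical application and image reference.
    manifest = pinned_deployment("payments", "registry.example.com/payments:1.0", "internal")
    print(json.dumps(manifest, indent=2))
```

Note that a node selector keeps this workload on the labeled nodes, but it does not by itself keep other workloads off them. Reserving nodes exclusively for sensitive workloads would typically also involve node taints and matching tolerations.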

In the case of isolated architectures, Prisma Cloud can secure each environment separately. Its granular permission enforcement capability lets you segregate and manage each protected cluster by the appropriate groups as needed. For shared clusters, Prisma Cloud leverages the labels and naming schemas assigned to the workloads. Attributes such as image name, container name, host name and labels can be targeted to enforce policy and secure the environment. Resources (e.g. images, containers, hosts, labels) can be grouped for visualization and management purposes. Prisma Cloud can also append Docker image and Kubernetes labels to Prisma Cloud events.

Figure: Prisma Cloud – Monitor > Compliance listing for Trusted Images

3.3.5 Orchestrator Node Trust: The trust relationship for nodes in an orchestrated environment is vital and represents another risk factor that should be taken into consideration.

NIST SP 800-190 offers this guidance on Orchestrator node trust:

3.3.5 Example Risks

Environments with weak orchestrator security controls can expose the orchestrator node, subsequent nodes and related container technology components to increased risk. Examples include:

Unauthorized hosts joining the cluster and running containers
The compromise of a single cluster host implying compromise of the entire cluster—for example, if the same key pairs used for authentication are shared across all nodes
Communications between the orchestrator and DevOps personnel, administrators and hosts being unencrypted and unauthenticated

4.3.5 Countermeasures

Orchestrators should ensure the following:

All nodes are securely introduced into the cluster
Nodes have a persistent identity throughout their lifecycle
An accurate inventory of nodes and their connectivity states is available
The platform is designed to be resilient to compromise of individual nodes without compromising the overall security of the cluster
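The inventory countermeasure can be approximated with a simple check against the node list that the Kubernetes API already exposes. The Python sketch below reads data shaped like the output of `kubectl get nodes -o json`, records whether each node reports a `Ready` condition, and flags any node name that is not on an expected list. The sample node names and the expected list are invented for the example. This is a basic sanity check rather than a substitute for secure node enrollment.

```python
def node_inventory(node_list):
    """Map each node name to "Ready" or "NotReady" based on its conditions."""
    inventory = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        inventory[name] = "Ready" if ready else "NotReady"
    return inventory


def unexpected_nodes(inventory, expected):
    """Return node names present in the cluster but missing from the expected list."""
    return sorted(set(inventory) - set(expected))


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
SAMPLE_NODES = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
        {
            "metadata": {"name": "node-b"},
            "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]},
        },
        {
            "metadata": {"name": "node-x"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
    ]
}

if __name__ == "__main__":
    inventory = node_inventory(SAMPLE_NODES)
    for name, state in sorted(inventory.items()):
        print(f"{name}: {state}")
    print("unexpected:", unexpected_nodes(inventory, ["node-a", "node-b"]))
```

A node in the `Unknown` state is treated as not ready here, since the control plane has lost contact with it. Whether that warrants an alert is a policy decision for the operator.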

A number of options are available to address node security within a cluster. Prisma Cloud can assess the compliance of nodes and registry images, and can also scan the nodes and the Docker environment for vulnerabilities. Hosts, nodes and registry images are assessed against compliance benchmarks including the CIS Docker Benchmark, CIS Kubernetes Benchmark and CIS General Linux Benchmark. Prisma Cloud can also assess the services running on the host system.

Figure: Prisma Cloud – Monitor / Vulnerabilities (Hosts View)

Figure: Prisma Cloud – Monitor / Vulnerabilities (Hosts View - Detailed)

Figure: Prisma Cloud – Monitor / Vulnerabilities (Registry View)

Figure: Prisma Cloud – Monitor / Vulnerabilities (Registry View - Detailed)

For AWS EKS users, CloudWatch can be configured to provide additional visibility into the health of clusters using Container Insights, which was released earlier this year. Container Insights is available after users have created a new Kubernetes namespace in the cluster and installed Fluentd, an open source data collector.

Documentation of this process is available from AWS.

Figure: AWS CloudWatch – Container Insights

Out-of-the-box dashboards for Container Insights give administrators visibility into EKS Clusters, Nodes, Services, Namespaces, Pods, ECS Clusters, ECS Services and ECS Tasks. Information displayed includes, but is not limited to, CPU and memory utilization as well as network TX and RX. Statistics can be filtered by time and date. Additional filters can be applied to further narrow the scope of information displayed within the dashboards. Administrators also have the flexibility to pivot to the raw AWS logs and filter as needed.

I hope this blog proves helpful to those seeking to obtain visibility into orchestrator security risks. Stay tuned for the next blog in this series, which will cover container risks and countermeasures.

By: Rob Brooks, Senior Research Scientist

Rob Brooks has been involved in Information Security for 20 years and has served as a CISO, Senior Architect, Sysadmin and Engineer along the way. Rob currently works as a Sr. Research Scientist in Optiv's R&D group, managing the company’s private cloud and helping research security products.
