Containers, isolation, and operating systems for network spaces

Overview

The infrastructure for network spaces needs to derive from a secure, trustworthy base. One of the questions that arises in considering how to design and engineer useful network spaces is whether this base can be provided by a language runtime (i.e., entirely in userspace), or whether support from the operating system is required. In this treatise, it is argued that the operating system must be involved, and that the most benefit is gained when programs have access to the security mechanisms provided by the operating system. In particular, the operating system is required to implement a security kernel, which will facilitate the key properties of these systems.

The W7 security kernel as described in [1] will be used as a reference point. The security kernel is the part of a computer system that enforces and assures safe cooperation between other components. There are three features of W7 that are explicitly pointed out:

  1. Isolated environments,

  2. Inter-environment communications (IEC), and

  3. Access mediation to ensure safe cooperation.

This post discusses the first two of these points.

A common trend in network spaces now is deployment via a containerisation system. One common system is the Docker1 container system; it is argued here that this is often a poor choice from a security perspective. Newer approaches include unikernels (such as MirageOS) and rump kernels.

Another problem is that of verifying the trustworthiness of programs. Some work has been done in this area, primarily through the use of remote attestation as part of trusted computing, though these solutions are not well adopted. The TPM is still primarily a feature of Intel hardware; the ARM TrustZone presents another approach to the problem.

Finally, a direction forward is discussed. A system implementing the security features described here must be as easy to deploy as current solutions (if not easier), it must be usable on commodity hardware, and it must provide utility to the users.

Isolation

The first feature is isolated environments2; one environment should not be able to access another. The scope of an environment should be considered; typically, each thread and each process should have its own environment. A process or thread can be spawned with a pre-initialised environment that sets the initial values; this does not imply that the child has access to the parent’s environment, only that it has a value for each name at program initialisation. This value may not even be the actual value held by the parent. For this document, the term isolation will refer to environment isolation.
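The spawning behaviour described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not an enforcement mechanism: the names Environment, lookup, define, and spawn are hypothetical, and nothing in the Python runtime stops other code in the same process from reaching into these objects.

```python
import copy


class Environment:
    """A lexical environment: a private mapping of names to values."""

    def __init__(self, bindings=None):
        # Deep-copy so the new environment shares no mutable state with its creator.
        self._bindings = copy.deepcopy(dict(bindings or {}))

    def lookup(self, name):
        return self._bindings[name]

    def define(self, name, value):
        self._bindings[name] = value

    def spawn(self, overrides=None):
        """Create a child seeded with initial values taken from this environment."""
        initial = dict(self._bindings)
        # The parent may hand the child a different value than it holds itself.
        initial.update(overrides or {})
        return Environment(initial)


parent = Environment({"config": {"port": 8080}, "secret": "parent-only"})
child = parent.spawn(overrides={"secret": "placeholder"})

# Changes made by the child stay in the child.
child.lookup("config")["port"] = 9090
child.define("extra", 1)
```

After this runs, the parent still sees port 8080 and its own secret, and has no binding for extra; the child only ever received initial values.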

A language runtime can provide only limited isolation. If the program compiles to native code, its environment is still governed by the operating system kernel. On a Unix system, the superuser can attach a debugger to a process to inspect the various environments. On a single-tenant3 system, this may be acceptable. On a multi-tenant system, this may not be. Even on a single-tenant system, however, a malfunctioning or poorly-written program4 may be able to affect the memory of other programs if the operating system doesn’t provide strict enough protections.

One tool that is often used for some form of isolation is containerisation. Some solutions bundle a full guest operating system userland; for example, a Docker container typically carries its own Linux installation, which runs on top of the host operating system’s kernel much like a lightweight virtual machine. Each container must therefore have security updates applied regularly, which makes these systems difficult to administer from a security perspective. They do offer some ease of deployment, but at the cost of bringing in an entire Linux installation.

More recent approaches include unikernels (e.g. Mirage OS5) and rump kernels6. These build compartmentalised kernels that run a single program as an isolated system. This is a viable option, more so if the hypervisor is running only such unikernels.

Inter-environment communications

An environment that cannot share values with other environments isn’t very useful: some mechanism for sharing values outside the environment is needed. This is analogous to the problem of interprocess communication (IPC); the two notable methods for IPC relevant to distributed network spaces are:

Most distributed systems end up using message queues: Erlang uses them (though it calls them “mailboxes”), for example. Each Erlang process gets its own mailbox; senders pick a process to send the message to, and processes must explicitly call receive to get the next message. Distributed message queues, intended for use by network applications, include those built on the AMQP standard.
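As a rough illustration of the mailbox model, the following Python sketch gives each worker a private queue, with an explicit receive. It is loosely modelled on the Erlang behaviour described above rather than being Erlang's implementation; the Process class, the (reply_to, value) message shape, and the None shutdown sentinel are all assumptions made for the example.

```python
import queue
import threading


class Process:
    """A worker with a private mailbox, loosely modelled on an Erlang process."""

    def __init__(self, name, handler):
        self.name = name
        self._mailbox = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def send(self, message):
        # Senders never touch the receiver's state; they only enqueue a message.
        self._mailbox.put(message)

    def receive(self, timeout=None):
        # Explicit receive: blocks until the next message arrives.
        return self._mailbox.get(timeout=timeout)

    def _run(self):
        while True:
            message = self.receive()
            if message is None:  # hypothetical shutdown sentinel
                break
            self._handler(message)

    def join(self):
        self._thread.join()


def double(message):
    # Each message carries the queue on which the sender expects a reply.
    reply_to, value = message
    reply_to.put(value * 2)


replies = queue.Queue()
worker = Process("doubler", double)
worker.start()
worker.send((replies, 21))
result = replies.get(timeout=5)
worker.send(None)
worker.join()
```

The only thing the sender and the worker share is the pair of queues; all values cross the boundary as messages.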

An operating system that supported a slightly more intelligent environment message queue (along the lines of Erlang-style named environments, or including some kind of environment discovery), with support for both local and remote message queues and built on top of an open message queue protocol, could be an interesting approach to the network spaces IEC problem.
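One way to picture named environments with discovery is a small registry that maps names to mailboxes. This is a local, in-process sketch only; the Registry class, its method names, and the name@host naming scheme are hypothetical, and remote delivery over an open protocol is left out entirely.

```python
import queue


class Registry:
    """Maps environment names to mailboxes and supports simple discovery."""

    def __init__(self):
        self._mailboxes = {}

    def register(self, name):
        # Each name owns exactly one mailbox; duplicate names are rejected.
        if name in self._mailboxes:
            raise ValueError(f"name already registered: {name}")
        mailbox = queue.Queue()
        self._mailboxes[name] = mailbox
        return mailbox

    def discover(self, prefix=""):
        # Discovery here is just a sorted listing of names matching a prefix.
        return sorted(n for n in self._mailboxes if n.startswith(prefix))

    def send(self, name, message):
        # Raises KeyError if no environment is registered under that name.
        self._mailboxes[name].put(message)


registry = Registry()
inbox = registry.register("logger@local")
registry.register("store@local")

registry.send("logger@local", "hello")
names = registry.discover()
first = inbox.get_nowait()
```

Senders address an environment by name rather than holding a reference to it, which is the property that would let the same interface cover local and remote queues.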

Full environment isolation, including network interfaces and filesystems, provides the best guarantee of security for a given environment. A model in which environments cannot share any information (including files) except via message passing, and in which even network interfaces are not shared (e.g. by providing a virtual network interface for each environment), and in which the operating system is a minimal kernel (maybe a nanokernel) providing the minimum level of hardware access, environment access control, and any other security features (such as the attestation discussed below), would provide a solid basis for a network space system.

Trust and environments

A secure system must have some notion of trust; the infamous Orange Book termed the trusted foundation for a system the “trusted computing base”, or TCB. The TCB for a modern computing system is exceedingly complex. For a system running unikernels on top of Xen, the TCB includes:

  1. The hypervisor (Xen)
  2. USB controller(s)
  3. Network interface card controller(s)
  4. Disk drive controller(s)
  5. Any remote management (ILOM, IPMI, etc)
  6. CPU microcode

The complexity of these systems far exceeds the ability of any group of people to clearly audit them and assess their security. This is compounded by the typically low quality of firmware7. The techniques at our disposal cannot account for any of these threat vectors except the first.

There are some options for verifying the integrity of the operating system, however; though this only covers the first element in the TCB, it serves as a foundation. The process for an Intel-based system is generally:

  1. Secure bootloader: this is the starting point for the security. On Intel-based systems, this is secure boot via UEFI, in which the UEFI image that is used to load the boot image contains a cryptographic signature key; a boot image must be signed by this key for the bootloader to accept it.
  2. Measured boot: on Intel-based systems, a Trusted Platform Module (TPM) can be used to measure the current state of the system. This is done with special volatile registers that can only be extended: the data to be extended is cryptographically hashed together with the current contents of the register (itself a cryptographic hash), and the result becomes the new contents. These hashes can be compared to a known value, and if they do not match, the system can refuse to boot. These registers are known as Platform Configuration Registers, or PCRs.
  3. Attestation: the TPM is used to sign a message attesting to the current state of the system (i.e. the value of the relevant PCRs).
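The extend operation in the measured boot step can be sketched as follows. This is a software model only, assuming a SHA-256 register that starts at all zeros and is updated as H(old value || H(data)); the stage names and the extend and measure_boot helpers are made up for illustration, and a real TPM performs the operation in hardware.

```python
import hashlib

PCR_SIZE = 32  # length of a SHA-256 digest in bytes


def extend(pcr, data):
    """Return the new register value: H(old_pcr || H(data))."""
    measurement = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot(stages):
    """Fold every boot stage, in order, into a single register value."""
    pcr = bytes(PCR_SIZE)  # assumed reset state: all zeros
    for stage in stages:
        pcr = extend(pcr, stage)
    return pcr


boot_chain = [b"bootloader-image", b"kernel-image", b"initrd-image"]
expected = measure_boot(boot_chain)

# Changing or reordering any stage yields a different final value.
tampered = measure_boot([b"bootloader-image", b"evil-kernel", b"initrd-image"])
reordered = measure_boot([b"kernel-image", b"bootloader-image", b"initrd-image"])
```

Because the register can only be extended and never set directly, the final value commits to every stage and to the order in which the stages were measured.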

There are analogues to this; notably, some ARM-based systems have TrustZone. This splits the processor into a secure and insecure area, and a secure monitor handles switching between the two. Programs running in the insecure area cannot access the secure area except via the secure monitor. These areas (called “worlds”) are implemented as virtual processors on top of the actual hardware, and the two worlds can run independently of each other.

The use of cryptographic keys presents another problem: managing these keys. This problem involves public key infrastructure, and is an area of much contention. The problem is this: trust in a key is generally proven via cryptographic signature; the keys therefore form a tree of sorts. There is one key that is the root of trust for this tree, but there is no one-size-fits-all trust model in which the root key can be trusted by any given participant. TLS uses the certificate authority (CA) model, in which the roots of trust ship with the browser. The requirements for inclusion in a browser are governed by the CA/Browser Forum and include a standard audit of the CA’s policies and implementation. It seems unavoidable that some trusted third party be involved.

For a TPM to be trusted, its public key must be registered with some PKI mechanism, and must be signed by some root of trust. There are standard ways of doing this, but they are rarely used in practice8. This is an area where an audited, open-source implementation is sorely needed.

Attesting an environment running on the host machine poses more challenges. Typically, there is one TPM per physical machine; the operating system needs to expose some interface for an application running on the machine (e.g. an environment) to request an attestation of some sort. This requires that the operating system have some mechanism for proving the integrity of the environment; perhaps the problem can be elided by merely attesting to the integrity of the host operating system.
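The shape of such an interface, in the simpler case of attesting only to the host operating system, might look like the sketch below. Everything here is an assumption for illustration: StandInTPM and verify_quote are hypothetical names, and an HMAC over a shared key stands in for the asymmetric signature a real TPM would produce, which changes the trust model (the verifier here must hold the same key).

```python
import hashlib
import hmac
import os


class StandInTPM:
    """Hypothetical stand-in for a TPM; a real TPM signs with an asymmetric key."""

    def __init__(self, key, pcr):
        self._key = key
        self._pcr = pcr

    def quote(self, nonce):
        # Bind the reported PCR value to the caller's nonce so a quote cannot be replayed.
        tag = hmac.new(self._key, self._pcr + nonce, hashlib.sha256).digest()
        return self._pcr, tag


def verify_quote(key, nonce, pcr, tag, known_good_pcr):
    """Check that the quote is authentic, fresh, and reports the expected state."""
    expected_tag = hmac.new(key, pcr + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected_tag) and hmac.compare_digest(pcr, known_good_pcr)


key = os.urandom(32)
known_good = hashlib.sha256(b"trusted-host-os").digest()
tpm = StandInTPM(key, known_good)

# The requesting environment supplies a fresh nonce with each request.
nonce = os.urandom(16)
pcr, tag = tpm.quote(nonce)
fresh_ok = verify_quote(key, nonce, pcr, tag, known_good)

# The same quote checked against a different nonce is rejected.
replay_ok = verify_quote(key, os.urandom(16), pcr, tag, known_good)
```

The nonce is what makes the quote a statement about the current state rather than a recording of a past one.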

Storing cryptographic keys in an environment also raises the question of access control: the environments may be isolated properly, such that no environment running on the system has access to these keys, but the operating system will have access, even if only via the fact that it controls the memory on the system. This is an open area for research. The application could request that the operating system attest to its integrity to the application, but the known values for the PCRs must be in memory and therefore could be subject to manipulation by the operating system. There aren’t any good answers at this level of trust if the application cannot trust the operating system.

Conclusions

Full environment isolation, including network interfaces and filesystems, provides the best guarantee of security for a given environment. A model in which environments cannot share any information (including files) except via message passing, and in which even network interfaces are not shared (e.g. by providing a virtual network interface for each environment), and in which the operating system is a minimal kernel (maybe a nanokernel) providing the minimum level of hardware access, environment access control, and any other security features (such as attestation), would provide a solid basis for a network space system. The kernel should expose an interface to the security system that allows language runtimes to take advantage of the security features of the operating system (such as capabilities and access to the TPM or secure monitor).

This system needs to be easy to deploy on existing hardware; it should be at least as easy to install as a Linux base using Docker for containers. It should also support some interface for configuration management (e.g. via software such as SaltStack or Puppet) and provide administrators with a clear, simple interface. It should also be able to run on commodity hardware, which means it should be able to accommodate systems that may be lacking certain security features such as TPMs. Finally, the system should be virtualisable to make testing and integration easier.

Much work remains to be done in this space.

References

[1]. J. Rees. “A Security Kernel Based on the Lambda Calculus”, A.I. Memo 1564, MIT, 1996.

Footnotes

1. https://www.docker.com/

2. The term environment here means the lexical environment; it shouldn’t be confused with the concept of Unix environments. A lexical environment is the mapping of names to their values.

3. A tenant is a user on the machine. This can be defined at various scopes: in one view, a single-tenant system is one that primarily runs one program; in another, a tenant is a user. Cloud computing services, such as Amazon’s EC2, are examples of multi-tenant systems where multiple users who do not trust each other run programs on the same machine. The remainder of this document uses the latter definition. More precisely, for this document, a tenant is a user on the machine who cannot implicitly trust the other users on the machine.

4. Security assists in protecting an application not only from intentionally malicious users, but also from unwittingly rogue applications running alongside it. While they might not intend to pose a threat, they may be able to corrupt memory or otherwise degrade shared resources (such as the file system or network interface).

5. “Mirage OS is a library operating system that constructs unikernels for secure, high-performance network applications across a variety of cloud computing and mobile platforms.” The home page is at http://openmirage.org/.

6. “Rump kernels provide free, portable, componentized, kernel quality drivers such as file systems, POSIX system call handlers, PCI device drivers, a SCSI protocol stack, virtio and a TCP/IP stack.” The home page is at http://rumpkernel.org/.

7. The author once worked as a security engineer at a company that produced satellite television set top boxes, where it was decided that it was necessary to stop using the bootloader TFTP code due to security concerns. This was the most visible and highest impact such decision, but there were plenty of others.

8. One such mechanism is Direct Anonymous Attestation. In looking at implementing this at work, the author found virtually no current information on organisations using this, and the only PKI software for this (termed a Privacy CA) was an incomplete code example in a book.