Table of Contents
The Open Container Initiative develops specifications for standards on Operating System process and application containers.¶
To provide context for users the following section gives example use cases for each part of the spec.¶
Application bundle builders can create a bundle directory that includes all of the files required for launching an application as a container. The bundle contains an OCI configuration where the builder can specify host-independent details such as which executable to launch and host-specific settings such as Section 6.3, “Mounts”, Section 6.10, “Hooks”, Section 6.7.2, “Namespaces” and Section 6.7.5, “Control groups”. Because the configuration includes host-specific settings, application bundle directories copied between two hosts may require configuration adjustments.¶
Hook developers can extend the functionality of an OCI-compliant runtime by hooking into a container’s lifecycle with an external application. Example use cases include sophisticated network configuration, volume garbage collection, etc.¶
Runtime developers can build runtime implementations that run OCI-compliant bundles and container configuration, containing low-level OS and host specific details, on a particular platform.¶
In the specifications in the above table of contents, the keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119 (Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997).¶
The keywords "unspecified", "undefined", and "implementation-defined" are to be interpreted as described in the rationale for the C99 standard.¶
An implementation is not compliant for a given CPU architecture if it fails to satisfy one or more of the MUST, REQUIRED, or SHALL requirements for the protocols it implements. An implementation is compliant for a given CPU architecture if it satisfies all the MUST, REQUIRED, and SHALL requirements for the protocols it implements.¶
The config.json file in a bundle, which defines the intended container and container process.
Protocols defined by this specification are:¶
This section defines a format for encoding a container as a filesystem bundle - a set of files organized in a certain way, and containing all the necessary data and metadata for any compliant runtime to perform all standard operations against it. See also OS X application bundles for a similar use of the term bundle.¶
The definition of a bundle is only concerned with how a container, and its configuration data, are stored on a local filesystem so that it can be consumed by a compliant runtime.¶
A Standard Container bundle contains all the information needed to load and run a container. This MUST include the following artifacts:¶
config.json: contains configuration data. This REQUIRED file MUST reside in the root of the bundle directory and MUST be named config.json. See Section 6, “Container Configuration” for more details.
rootfs. This directory MUST be referenced from within the config.json file.
While these artifacts MUST all be present in a single directory on the local filesystem, that directory itself is not part of the bundle. In other words, a tar archive of a bundle will have these artifacts at the root of the archive, not nested within a top-level directory.¶
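The bundle layout rules above can be checked mechanically. The following is a minimal sketch, not a compliance tool: the function name `check_bundle` is hypothetical, and it assumes the root filesystem is referenced through the `root.path` property described in the configuration section, with relative paths resolved against the bundle directory.

```python
import json
import os
import tempfile


def check_bundle(bundle_dir):
    """Return a list of problems found in a bundle directory (empty if none)."""
    config_path = os.path.join(bundle_dir, "config.json")
    if not os.path.isfile(config_path):
        # config.json MUST reside in the root of the bundle directory.
        return ["config.json is missing from the bundle root"]
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    problems = []
    root_path = config.get("root", {}).get("path")
    if root_path is None:
        problems.append("config.json does not reference a root filesystem")
    else:
        # os.path.join keeps an absolute root_path unchanged and resolves a
        # relative one against the bundle directory.
        resolved = os.path.join(bundle_dir, root_path)
        if not os.path.isdir(resolved):
            problems.append("root filesystem directory does not exist: " + resolved)
    return problems


# Build a throwaway bundle to exercise the check.
with tempfile.TemporaryDirectory() as bundle:
    os.mkdir(os.path.join(bundle, "rootfs"))
    with open(os.path.join(bundle, "config.json"), "w", encoding="utf-8") as f:
        json.dump({"ociVersion": "1.0.0-rc2", "root": {"path": "rootfs"}}, f)
    result = check_bundle(bundle)
```

The check only looks at layout; it does not validate the rest of the configuration.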
Barring access control concerns, the entity using a runtime to create a container MUST be able to use the operations defined in this specification against that same container. Whether other entities using the same, or other, instance of the runtime can see that container is out of scope of this specification.¶
The state of a container MUST include, at least, the following properties:¶
ociVersion
id
status
(string) is the runtime state of the container. The value MAY be one of:
Additional values MAY be defined by the runtime, however, they MUST be used to represent new runtime states not defined above.¶
pid
bundlePath
annotations
When serialized in JSON, the format MUST adhere to the following pattern:¶
{ "ociVersion": "0.2.0", "id": "oci-container1", "status": "running", "pid": 4422, "bundlePath": "/containers/redis", "annotations": { "myKey": "myValue" } }
See Section 5.1.4.1, “State” for information on retrieving the state of a container.¶
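As an illustration of consuming the serialized state, the sketch below parses the JSON example and reports which of the property names listed above are absent. The helper name `missing_state_properties` is hypothetical, and the sketch makes no claim about which properties a runtime may omit in a particular status.

```python
import json

# Property names listed in the state section above.
STATE_PROPERTIES = ("ociVersion", "id", "status", "pid", "bundlePath", "annotations")


def missing_state_properties(serialized):
    """Return the listed state property names that are absent from the JSON text."""
    state = json.loads(serialized)
    return [name for name in STATE_PROPERTIES if name not in state]


example = (
    '{ "ociVersion": "0.2.0", "id": "oci-container1", "status": "running", '
    '"pid": 4422, "bundlePath": "/containers/redis", '
    '"annotations": { "myKey": "myValue" } }'
)
missing = missing_state_properties(example)
```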
The lifecycle describes the timeline of events that happen from when a container is created to when it ceases to exist.¶
config.json
.
If the runtime is unable to create the environment specified in the configuration, it MUST generate an error.
While the resources requested in the configuration MUST be created, the user-specified code (from Section 6.4, “Process”) MUST NOT be run at this time.
Any updates to the configuration after this step MUST NOT affect the container.
In cases where the specified operation generates an error, this specification does not mandate how, or even if, that error is returned or exposed to the user of an implementation. Unless otherwise stated, generating an error MUST leave the state of the environment as if the operation were never attempted - modulo any possible trivial ancillary changes such as logging.¶
OCI compliant runtimes MUST support the following operations, unless the operation is not supported by the base operating system.¶
These operations do not specify any command-line API; the parameters are inputs for general operations.
state <container-id>
This operation MUST generate an error if it is not provided the ID of a container. Attempting to query a container that does not exist MUST generate an error. This operation MUST return the state of a container as specified in Section 5.1.1.1, “State”.¶
create <container-id> <path-to-bundle>
This operation MUST generate an error if it is not provided a path to the bundle and the container ID to associate with the container.
If the ID provided is not unique across all containers within the scope of the runtime, or is not valid in any other way, the implementation MUST generate an error and a new container MUST NOT be created.
Using the data in config.json, this operation MUST create a new container.
This means that all of the resources associated with the container MUST be created, however, the user-specified code MUST NOT be run at this time.
If the runtime cannot create the container as specified in the configuration, it MUST generate an error and a new container MUST NOT be created.¶
Upon successful completion of this operation the status property of this container MUST be created.¶
The runtime MAY validate config.json against this spec, either generically or with respect to the local system capabilities, before creating the container (step 2).
Runtime callers who are interested in pre-create validation can run bundle-validation tools before invoking the create operation.¶
Any changes made to the configuration after this operation will not have an effect on the container.¶
start <container-id>
This operation MUST generate an error if it is not provided the container ID. Attempting to start a container that does not exist MUST generate an error. Attempting to start an already started container MUST have no effect on the container and MUST generate an error. This operation MUST run the user-specified code as specified by Section 6.4, “Process”.¶
Upon successful completion of this operation the status property of this container MUST be running.¶
kill <container-id> <signal>
This operation MUST generate an error if it is not provided the container ID. Attempting to send a signal to a container that is not running MUST have no effect on the container and MUST generate an error. This operation MUST send the specified signal to the process in the container.¶
When the process in the container stops, whether as a result of a kill operation or for any other reason, the status property of this container MUST be stopped.
delete <container-id>
This operation MUST generate an error if it is not provided the container ID. Attempting to delete a container that does not exist MUST generate an error. Attempting to delete a container whose process is still running MUST generate an error. Deleting a container MUST delete the resources that were created during step 2. Note that resources associated with the container, but not created by this container, MUST NOT be deleted. Once a container is deleted its ID MAY be used by a subsequent container.¶
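The status transitions implied by the operations above can be modelled as a small state machine. This is a toy in-memory sketch under stated assumptions, not a runtime: `ToyRuntime` and `OperationError` are hypothetical names, no process is ever launched, `kill` is assumed to make the process exit immediately, and deleting a container that was created but never started is allowed here because only the running case is called out as an error.

```python
class OperationError(Exception):
    """Raised where the text above says an operation MUST generate an error."""


class ToyRuntime:
    """In-memory model of container status transitions; it runs no processes."""

    def __init__(self):
        self.containers = {}  # container ID -> status string

    def _status(self, container_id):
        if container_id not in self.containers:
            raise OperationError("no such container: %r" % (container_id,))
        return self.containers[container_id]

    def state(self, container_id):
        return {"id": container_id, "status": self._status(container_id)}

    def create(self, container_id, bundle_path):
        if not container_id or not bundle_path:
            raise OperationError("container ID and bundle path are both needed")
        if container_id in self.containers:
            raise OperationError("container ID is not unique")
        self.containers[container_id] = "created"

    def start(self, container_id):
        if self._status(container_id) != "created":
            raise OperationError("container was already started")
        self.containers[container_id] = "running"

    def kill(self, container_id, signal):
        if self._status(container_id) != "running":
            raise OperationError("container is not running")
        # A real runtime would deliver the signal; the model assumes the
        # process exits as a result.
        self.containers[container_id] = "stopped"

    def delete(self, container_id):
        if self._status(container_id) == "running":
            raise OperationError("container process is still running")
        del self.containers[container_id]  # the ID may now be reused


rt = ToyRuntime()
rt.create("oci-container1", "/containers/redis")
rt.start("oci-container1")
rt.kill("oci-container1", "SIGTERM")
final_status = rt.state("oci-container1")["status"]
rt.delete("oci-container1")
```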
Many of the operations specified in this specification have "hooks" that allow for additional actions to be taken before or after each operation. See Section 6.10, “Hooks” for more information.¶
By default, only the stdin, stdout and stderr file descriptors are kept open for the application by the runtime.
The runtime MAY pass additional file descriptors to the application to support features such as socket activation.
Some of the file descriptors MAY be redirected to /dev/null even though they are open.
After the container has /proc mounted, the following standard symlinks MUST be set up within /dev/ for the IO.
Table 1. Required symbolic links
Source | Destination |
---|---|
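/proc/self/fd | /dev/fd |
/proc/self/fd/0 | /dev/stdin |
/proc/self/fd/1 | /dev/stdout |
/proc/self/fd/2 | /dev/stderr |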
The configuration contains metadata necessary to implement standard operations against the container. This includes the process to run, environment variables to inject, sandboxing features to use, etc.¶
The canonical schema is defined in this document, but a JSON Schema and Go bindings are also available.
ociVersion
{ "ociVersion": "1.0.0-rc2", … }
root
The following properties can be specified:¶
path: an absolute path (starting with /) or a relative path (not starting with /), which is relative to the bundle.
For example (Linux), with a bundle at /to/bundle and a root filesystem at /to/bundle/rootfs, the path value can be either /to/bundle/rootfs or rootfs.
A directory MUST exist at the path declared by the field.
readonly
{ "root": { "path": "rootfs", "readonly": true }, … }
mounts
mount(2) system call.
For Solaris, mounts corresponds to the fs resource in zonecfg(8).
Entries have the following properties:¶
destination
c:\\foo
and c:\\foo\\bar
).
For the Solaris operating system, this corresponds to "dir" of the fs resource in zonecfg(8).
type
source
\\?\Volume\{GUID}\
(on Windows source is called target).
Solaris: corresponds to "special" of the fs resource in zonecfg(8).
options
mount(8)
.
Solaris: corresponds to "options" of the fs resource in zonecfg(8).
Linux Example.
{ "mounts": [ { "destination": "/tmp", "type": "tmpfs", "source": "tmpfs", "options": ["nosuid","strictatime","mode=755","size=65536k"] }, { "destination": "/data", "type": "bind", "source": "/volumes/testing", "options": ["rbind","rw"] } ], … }
{ "mounts": [ { "destination": "C:\\Users\\crosbymichael\\My Fancy Mount Point\\", "type": "ntfs", "source": "\\\\?\\Volume\\{2eca078d-5cbc-43d3-aff8-7e8511f60d0e}\\", "options": [] } ], … }
See links for details about mountvol and SetVolumeMountPoint in Windows.¶
{ "mounts": [ { "destination": "/opt/local", "type": "lofs", "source": "/usr/local", "options": ["ro","nodevices"] }, { "destination": "/opt/sfw", "type": "lofs", "source": "/opt/sfw" } ], … }
process
The process schema has the following properties:
terminal
consoleSize
(object, OPTIONAL) specifies the console size of the terminal if attached, containing the following properties:
cwd
env
_
as outlined in IEEE Std 1003.1-2001.
args
$PATH
is interpreted to find the executable.
For Linux-based systems the process structure supports the following process specific fields:¶
capabilities
rlimits
soft limit for a resource while the hard limit acts as a ceiling for that value that could be set by an unprivileged process.
Valid values for the type field are the resources defined in the man page.
apparmorProfile
selinuxLabel
noNewPrivileges
Setting noNewPrivileges to true prevents the processes in the container from gaining additional privileges.
The kernel doc has more information on how this is achieved using a prctl system call.
The user for the process is a platform-specific structure that allows specific control over which user the process runs as.¶
For Linux and Solaris based systems the user structure has the following fields:¶
uid
gid
additionalGids
Symbolic names for uid and gid, such as uname and gname respectively, are left to upper levels to derive (e.g. /etc/passwd parsing, NSS, etc.).
For Solaris, uid and gid specify the uid and gid of the process inside the container and need not be the same as on the host.
{ "process": { "terminal": true, "consoleSize": { "height": 25, "width": 80 }, "user": { "uid": 1, "gid": 1, "additionalGids": [5, 6] }, "env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm" ], "cwd": "/root", "args": [ "sh" ], "apparmorProfile": "acme_secure_profile", "selinuxLabel": "system_u:system_r:svirt_lxc_net_t:s0:c124,c675", "noNewPrivileges": true, "capabilities": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ], "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024 } ] }, … }
{ "process": { "terminal": true, "consoleSize": { "height": 25, "width": 80 }, "user": { "uid": 1, "gid": 1, "additionalGids": [2, 8] }, "env": [ "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "TERM=xterm" ], "cwd": "/root", "args": [ "/usr/bin/bash" ] }, … }
For Windows based systems the user structure has the following fields:¶
username
{ "process": { "terminal": true, "user": { "username": "containeradministrator" }, "env": [ "VARIABLE=1" ], "cwd": "c:\\foo", "args": [ "someapp.exe" ] }, … }
hostname
{ "hostname": "mrsdalloway", … }
platform
The following properties can be specified:¶
os
os
.
Bundles SHOULD use, and runtimes SHOULD understand, os entries listed in the Go Language document for $GOOS.
If an operating system is not included in the $GOOS documentation, it SHOULD be submitted to this specification for standardization.
arch
arch
.
Values for arch SHOULD use, and runtimes SHOULD understand, arch entries listed in the Go Language document for $GOARCH.
If an architecture is not included in the $GOARCH documentation, it SHOULD be submitted to this specification for standardization.
{ "platform": { "os": "linux", "arch": "amd64" }, … }
platform.os is used to look up further platform-specific configuration.
linux
platform.os
is linux
.
solaris
platform.os
is solaris
.
windows
platform.os
is windows
.
{ "platform": { "os": "linux", "arch": "amd64" }, "linux": { "namespaces": [ { "type": "pid" } ] }, … }
The Linux container specification uses various kernel features like namespaces, cgroups, capabilities, LSM, and filesystem jails to fulfill the spec.¶
The Linux ABI includes both syscalls and several special file paths. Applications expecting a Linux environment will very likely expect these file paths to be set up correctly.
The following filesystems MUST be made available in each application’s filesystem:¶
A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes. For more information, see the man page.¶
namespaces
Entries have the following properties:¶
type
(string, REQUIRED) - namespace type. The following namespace types are supported:
pid
network
mount
ipc
uts
user
cgroup
path
If a path is specified, that particular file is used to join that type of namespace.
If a namespace type is not specified in the namespaces array, the container MUST inherit the runtime namespace of that type.
If a new namespace is not created (because the namespace type is not listed, or because it is listed with a path), runtimes MUST assume that the setup for that namespace has already been done and error out if the config specifies anything else related to that namespace.
If a namespaces field contains duplicated namespaces with the same type, the runtime MUST error out.
Example.
{ "linux": { "namespaces": [ { "type": "pid", "path": "/proc/1234/ns/pid" }, { "type": "network", "path": "/var/run/netns/neta" }, { "type": "mount" }, { "type": "ipc" }, { "type": "uts" }, { "type": "user" }, { "type": "cgroup" } ] }, … }
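The namespace rules above (join when a path is given, create when only the type is listed, inherit when the type is absent, and reject duplicates) can be sketched as a planning step. The function name `plan_namespaces` and the action strings are hypothetical; the sketch only decides what to do and performs no system calls.

```python
SUPPORTED_TYPES = {"pid", "network", "mount", "ipc", "uts", "user", "cgroup"}


def plan_namespaces(namespaces):
    """Map every supported namespace type to 'new', 'join <path>' or 'inherit'."""
    listed = {}
    for entry in namespaces:
        ns_type = entry.get("type")
        if ns_type not in SUPPORTED_TYPES:
            raise ValueError("unsupported namespace type: %r" % (ns_type,))
        if ns_type in listed:
            # Duplicated types MUST make the runtime error out.
            raise ValueError("duplicated namespace type: %r" % (ns_type,))
        listed[ns_type] = entry
    plan = {}
    for ns_type in sorted(SUPPORTED_TYPES):
        if ns_type not in listed:
            plan[ns_type] = "inherit"  # inherit the runtime namespace
        elif "path" in listed[ns_type]:
            plan[ns_type] = "join " + listed[ns_type]["path"]
        else:
            plan[ns_type] = "new"
    return plan


plan = plan_namespaces([
    {"type": "pid", "path": "/proc/1234/ns/pid"},
    {"type": "mount"},
])
```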
uidMappings
gidMappings
Each entry has the following structure:¶
hostID
containerID
size
The runtime SHOULD NOT modify the ownership of referenced filesystems to realize the mapping. There is a limit of 5 mappings which is the Linux kernel hard limit.¶
{ "linux": { "namespaces": [ { "type": "user" } ], "uidMappings": [ { "hostID": 1000, "containerID": 0, "size": 32000 } ], "gidMappings": [ { "hostID": 1000, "containerID": 0, "size": 32000 } ] }, … }
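To make the mapping arithmetic concrete, the sketch below translates an ID inside the container to the corresponding host ID. It assumes the usual user-namespace reading of the three fields: `containerID` is the first ID of a range inside the container, `hostID` is the first host ID that range maps to, and `size` is the length of the range. The helper name `to_host_id` is hypothetical.

```python
def to_host_id(container_id, mappings):
    """Return the host ID for an ID inside the container, or None if unmapped."""
    for mapping in mappings:
        start = mapping["containerID"]
        if start <= container_id < start + mapping["size"]:
            return mapping["hostID"] + (container_id - start)
    return None


uid_mappings = [{"hostID": 1000, "containerID": 0, "size": 32000}]
root_on_host = to_host_id(0, uid_mappings)
last_on_host = to_host_id(31999, uid_mappings)
unmapped = to_host_id(32000, uid_mappings)
```

With the example mapping, container root (0) is host ID 1000, and IDs from 32000 upward have no host counterpart.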
devices
Each entry has the following structure:¶
type: one of c, b, u or p. More info in mknod(1).
path
major
,
minor
p
) - major, minor numbers for the device.
fileMode
uid
gid
{ "linux": { "namespaces": [ { "type": "mount" } ], "devices": [ { "path": "/dev/fuse", "type": "c", "major": 10, "minor": 229, "fileMode": 438, "uid": 0, "gid": 0 }, { "path": "/dev/sda", "type": "b", "major": 8, "minor": 0, "fileMode": 432, "uid": 0, "gid": 0 } ] }, … }
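JSON has no octal literals, so the fileMode values in the example are decimal: 438 is 0o666 and 432 is 0o660. The sketch below renders them in the familiar `ls -l` style. It assumes fileMode carries only permission bits and that the type letters map to the file-type bits shown; `mode_string` is a hypothetical helper.

```python
import stat

# Assumed mapping from the device type letter to file-type bits
# ("u" is an unbuffered character device).
TYPE_BITS = {
    "c": stat.S_IFCHR,
    "u": stat.S_IFCHR,
    "b": stat.S_IFBLK,
    "p": stat.S_IFIFO,
}


def mode_string(device):
    """Render a device entry's type and fileMode like the first column of ls -l."""
    return stat.filemode(TYPE_BITS[device["type"]] | device["fileMode"])


fuse = {"path": "/dev/fuse", "type": "c", "major": 10, "minor": 229, "fileMode": 438}
sda = {"path": "/dev/sda", "type": "b", "major": 8, "minor": 0, "fileMode": 432}
fuse_mode = mode_string(fuse)  # "crw-rw-rw-"
sda_mode = mode_string(sda)    # "brw-rw----"
```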
In addition to any devices configured with this setting, the runtime MUST also supply:¶
/dev/null.
/dev/zero.
/dev/full.
/dev/random.
/dev/urandom.
/dev/tty.
/dev/console is set up if terminal is enabled in the config by bind mounting the pseudoterminal slave to /dev/console.
/dev/ptmx. A bind-mount or symlink of the container’s /dev/pts/ptmx.
Also known as cgroups, they are used to restrict resource usage for a container and handle device access. cgroups provide controls (through controllers) to restrict cpu, memory, IO, pids and network for the container. For more information, see the kernel cgroups documentation.¶
The path to the cgroups can be specified in the Spec via cgroupsPath.
cgroupsPath can be used to either control the cgroup hierarchy for containers or to run a new process in an existing container.
If cgroupsPath is:
an absolute path (starting with /), the runtime MUST take the path to be relative to the cgroup mount point.
a relative path (not starting with /), the runtime MAY interpret the path relative to a runtime-determined location in the cgroup hierarchy.
Runtimes MAY consider certain cgroupsPath values to be invalid, and MUST generate an error if this is the case.
If a cgroupsPath value is specified, the runtime MUST consistently attach to the same place in the cgroup hierarchy given the same value of cgroupsPath.
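One possible way for a runtime to resolve cgroupsPath is sketched below. The mount point `/sys/fs/cgroup/memory` and the parent directory `myRuntime` are illustrative assumptions (a runtime is free to choose its own location for relative paths), and `resolve_cgroups_path` is a hypothetical helper. Because the function is pure, the same cgroupsPath always yields the same location.

```python
import posixpath


def resolve_cgroups_path(cgroups_path,
                         mount_point="/sys/fs/cgroup/memory",
                         runtime_parent="myRuntime"):
    """Return the filesystem location a cgroupsPath value would attach to."""
    if cgroups_path.startswith("/"):
        # A path starting with / is taken relative to the cgroup mount point.
        return posixpath.join(mount_point, cgroups_path.lstrip("/"))
    # Otherwise the runtime picks the location; this sketch nests the path
    # under a fixed, runtime-chosen parent directory.
    return posixpath.join(mount_point, runtime_parent, cgroups_path)


from_absolute = resolve_cgroups_path("/myRuntime/myContainer")
from_relative = resolve_cgroups_path("myContainer")
```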
Implementations of the Spec can choose to name cgroups in any manner. The Spec does not include naming schema for cgroups. The Spec does not support per-controller paths for the reasons discussed in the cgroupv2 documentation. The cgroups will be created if they don’t exist.¶
You can configure a container’s cgroups via the resources field of the Linux configuration.
Do not specify resources unless limits have to be updated.
For example, to run a new process in an existing container without updating limits, resources need not be specified.
A runtime MUST at least use the minimum set of cgroup controllers required to fulfill the resources settings.
However, a runtime MAY attach the container process to additional cgroup controllers supported by the system.
{ "linux": { "cgroupsPath": "/myRuntime/myContainer", "resources": { "memory": { "limit": 100000, "reservation": 200000 }, "devices": [ { "allow": false, "access": "rwm" } ] } }, … }
devices
Each entry has the following structure:¶
allow
type: a (all), c (char), or b (block). null or unset values mean "all", mapping to a.
major, minor: null or unset values mean "all", mapping to * in the filesystem API.
access: r (read), w (write), and m (mknod).
{ "linux": { "resources": { "devices": [ { "allow": false, "access": "rwm" }, { "allow": true, "type": "c", "major": 10, "minor": 229, "access": "rw" }, { "allow": true, "type": "b", "major": 8, "minor": 0, "access": "r" } ] } }, … }
disableOOMKiller contains a boolean (true or false) that enables or disables the Out of Memory killer for a cgroup.
If enabled (false), tasks that attempt to consume more memory than they are allowed are immediately killed by the OOM killer.
The OOM killer is enabled by default in every cgroup using the memory subsystem.
To disable it, specify a value of true.
For more information, see the memory cgroup man page.
disableOOMKiller
{ "linux": { "resources": { "disableOOMKiller": false } }, … }
oomScoreAdj sets a heuristic for how the process is evaluated by the kernel during memory pressure.
For more information, see the proc filesystem documentation section 3.1.
This is a kernel/system level setting, whereas disableOOMKiller is scoped for a memory cgroup.
For more information on how these two settings work together, see the memory cgroup documentation section 10. OOM Control.
oomScoreAdj
{ "linux": { "resources": { "oomScoreAdj": 100 } }, … }
memory
memory
and it’s used to set limits on the container’s memory usage.
For more information, see the memory cgroup man page.
The following parameters can be specified to setup the controller:¶
limit
reservation
swap
kernel
kernelTCP
swappiness
{ "linux": { "resources": { "memory": { "limit": 536870912, "reservation": 536870912, "swap": 536870912, "kernel": 0, "kernelTCP": 0, "swappiness": 0 } } }, … }
cpu
cpu
and cpusets
.
For more information, see the cpusets cgroup man page.
The following parameters can be specified to setup the controller:¶
shares
quota
period
realtimeRuntime
realtimePeriod
cpus
mems
{ "linux": { "resources": { "cpu": { "shares": 1024, "quota": 1000000, "period": 500000, "realtimeRuntime": 950000, "realtimePeriod": 1000000, "cpus": "2-3", "mems": "0-7" } } }, … }
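A quick worked reading of the example values, assuming the usual CFS bandwidth semantics in which quota and period are both in microseconds and their ratio is the number of CPUs' worth of time the container may use per period. The `cpus` and `mems` strings are assumed to use the cpuset list format (comma-separated IDs and inclusive ranges). Both helper names are hypothetical.

```python
def cpu_limit(cpu):
    """Return the CPU time allowed per period, in CPUs, or None if unlimited."""
    quota = cpu.get("quota")
    period = cpu.get("period")
    if quota is None or period is None or quota <= 0:
        return None
    return quota / period


def parse_id_list(text):
    """Expand a cpuset-style list such as "0-2,5" into [0, 1, 2, 5]."""
    ids = []
    for part in text.split(","):
        first, _, last = part.partition("-")
        ids.extend(range(int(first), int(last or first) + 1))
    return ids


example_cpu = {"shares": 1024, "quota": 1000000, "period": 500000,
               "cpus": "2-3", "mems": "0-7"}
limit = cpu_limit(example_cpu)              # 1000000 / 500000 = 2.0 CPUs
cpus = parse_id_list(example_cpu["cpus"])   # [2, 3]
```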
blockIO
blkio
which implements the block IO controller.
For more information, see the kernel cgroups documentation about blkio.
The following parameters can be specified to setup the controller:¶
blkioWeight
blkioLeafWeight
blkioWeight for the purpose of deciding how much weight tasks in the given cgroup have while competing with the cgroup’s child cgroups.
The range is from 10 to 1000.
blkioWeightDevice
(array, OPTIONAL) - specifies the list of devices which will be bandwidth rate limited. The following parameters can be specified per-device:
major
,
minor
weight
leafWeight
You must specify at least one of weight or leafWeight in a given entry, and can specify both.
blkioThrottleReadBpsDevice, blkioThrottleWriteBpsDevice, blkioThrottleReadIOPSDevice, blkioThrottleWriteIOPSDevice (array, OPTIONAL) - specify the list of devices which will be IO rate limited. The following parameters can be specified per-device:
major
,
minor
rate
{ "linux": { "resources": { "blockIO": { "blkioWeight": 10, "blkioLeafWeight": 10, "blkioWeightDevice": [ { "major": 8, "minor": 0, "weight": 500, "leafWeight": 300 }, { "major": 8, "minor": 16, "weight": 500 } ], "blkioThrottleReadBpsDevice": [ { "major": 8, "minor": 0, "rate": 600 } ], "blkioThrottleWriteIOPSDevice": [ { "major": 8, "minor": 16, "rate": 300 } ] } } }, … }
hugepageLimits
hugetlb controller, which allows limiting the HugeTLB usage per control group and enforces the controller limit during page fault.
For more information, see the kernel cgroups documentation about HugeTLB.
Each entry has the following structure:¶
pageSize
limit
{ "linux": { "resources": { "hugepageLimits": [ { "pageSize": "2MB", "limit": 9223372036854771712 } ] } }, … }
network
net_cls and net_prio.
For more information, see the net_cls cgroup man page and the net_prio cgroup man page.
The following parameters can be specified to setup the controller:¶
classID
priorities
(array, OPTIONAL) - specifies a list of objects of the priorities assigned to traffic originating from processes in the group and egressing the system on various interfaces. The following parameters can be specified per-priority:
name
priority
{ "linux": { "resources": { "network": { "classID": 1048577, "priorities": [ { "name": "eth0", "priority": 500 }, { "name": "eth1", "priority": 1000 } ] } } }, … }
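The classID in the example is easier to read in hexadecimal. Assuming it follows the net_cls `0xAAAABBBB` layout, where the upper 16 bits are the major handle and the lower 16 bits the minor handle, 1048577 is 0x100001, i.e. the traffic-control handle 10:1. The helper name `split_class_id` is hypothetical.

```python
def split_class_id(class_id):
    """Split a net_cls class ID into its (major, minor) handle numbers."""
    return (class_id >> 16) & 0xFFFF, class_id & 0xFFFF


major, minor = split_class_id(1048577)
handle = "%x:%x" % (major, minor)  # tc handles are written in hexadecimal
```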
pids
pids
.
For more information, see the pids cgroup man page.
The following parameters can be specified to setup the controller:¶
limit
{ "linux": { "resources": { "pids": { "limit": 32771 } } }, … }
sysctl
{ "linux": { "sysctl": { "net.ipv4.ip_forward": "1", "net.core.somaxconn": "256" } }, … }
Seccomp provides an application sandboxing mechanism in the Linux kernel.
Seccomp configuration allows one to configure actions to take for matched syscalls and furthermore also allows matching on values passed as arguments to syscalls.
For more information about Seccomp, see Seccomp kernel documentation.
The actions, architectures, and operators are strings that match the definitions in seccomp.h from libseccomp and are translated to corresponding values.
A valid list of constants as of libseccomp v2.3.0 is shown below.
Architecture Constants:¶
SCMP_ARCH_X86
SCMP_ARCH_X86_64
SCMP_ARCH_X32
SCMP_ARCH_ARM
SCMP_ARCH_AARCH64
SCMP_ARCH_MIPS
SCMP_ARCH_MIPS64
SCMP_ARCH_MIPS64N32
SCMP_ARCH_MIPSEL
SCMP_ARCH_MIPSEL64
SCMP_ARCH_MIPSEL64N32
SCMP_ARCH_PPC
SCMP_ARCH_PPC64
SCMP_ARCH_PPC64LE
SCMP_ARCH_S390
SCMP_ARCH_S390X
Action Constants:¶
SCMP_ACT_KILL
SCMP_ACT_TRAP
SCMP_ACT_ERRNO
SCMP_ACT_TRACE
SCMP_ACT_ALLOW
Operator Constants:¶
SCMP_CMP_NE
SCMP_CMP_LT
SCMP_CMP_LE
SCMP_CMP_EQ
SCMP_CMP_GE
SCMP_CMP_GT
SCMP_CMP_MASKED_EQ
{ "linux": { "seccomp": { "defaultAction": "SCMP_ACT_ALLOW", "architectures": [ "SCMP_ARCH_X86" ], "syscalls": [ { "name": "getcwd", "action": "SCMP_ACT_ERRNO" } ] } }, … }
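A configuration can be checked against the constant lists above before it is handed to libseccomp. The sketch below is a minimal illustration that inspects only the fields visible in the example (`defaultAction`, `architectures`, and each syscall's `action`); the function name `check_seccomp` is hypothetical.

```python
ARCHITECTURES = {
    "SCMP_ARCH_X86", "SCMP_ARCH_X86_64", "SCMP_ARCH_X32", "SCMP_ARCH_ARM",
    "SCMP_ARCH_AARCH64", "SCMP_ARCH_MIPS", "SCMP_ARCH_MIPS64",
    "SCMP_ARCH_MIPS64N32", "SCMP_ARCH_MIPSEL", "SCMP_ARCH_MIPSEL64",
    "SCMP_ARCH_MIPSEL64N32", "SCMP_ARCH_PPC", "SCMP_ARCH_PPC64",
    "SCMP_ARCH_PPC64LE", "SCMP_ARCH_S390", "SCMP_ARCH_S390X",
}
ACTIONS = {
    "SCMP_ACT_KILL", "SCMP_ACT_TRAP", "SCMP_ACT_ERRNO",
    "SCMP_ACT_TRACE", "SCMP_ACT_ALLOW",
}


def check_seccomp(seccomp):
    """Return a list of strings naming constants that are not in the lists above."""
    problems = []
    if seccomp.get("defaultAction") not in ACTIONS:
        problems.append("unknown defaultAction: %r" % (seccomp.get("defaultAction"),))
    for arch in seccomp.get("architectures", []):
        if arch not in ARCHITECTURES:
            problems.append("unknown architecture: %r" % (arch,))
    for syscall in seccomp.get("syscalls", []):
        if syscall.get("action") not in ACTIONS:
            problems.append("unknown action for %r: %r"
                            % (syscall.get("name"), syscall.get("action")))
    return problems


example = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "architectures": ["SCMP_ARCH_X86"],
    "syscalls": [{"name": "getcwd", "action": "SCMP_ACT_ERRNO"}],
}
problems = check_seccomp(example)
```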
rootfsPropagation
One of slave, private, or shared.
The kernel doc has more information about mount propagation.
{ "linux": { "rootfsPropagation": "slave" }, … }
maskedPaths
{ "linux": { "maskedPaths": [ "/proc/kcore" ] }, … }
readonlyPaths
{ "linux": { "readonlyPaths": [ "/proc/sys" ] }, … }
Solaris application containers can be configured using the following properties. All of them, except milestone, have mappings to properties specified in the zonecfg(8) man page.
The SMF (Service Management Facility) FMRI that should reach the "online" state before the desired process is started within the container.
milestone
{ "solaris": { "milestone": "svc:/milestone/container:default" }, … }
The maximum set of privileges any process in this container can obtain.
The property should consist of a comma-separated privilege set specification as described in the priv_str_to_set(3C) man page for the respective release of Solaris.
limitpriv
{ "solaris": { "limitpriv": "default" }, … }
The maximum amount of shared memory allowed for this application container. A scale (K, M, G, T) can be applied to the value for each of these numbers (for example, 1M is one megabyte). Mapped to max-shm-memory in zonecfg(8) man page.¶
maxShmMemory
{ "solaris": { "maxShmMemory": "512m" }, … }
Sets a limit on the amount of CPU time that can be used by a container. The unit used translates to the percentage of a single CPU that can be used by all user threads in a container, expressed as a fraction (for example, .75) or a mixed number (whole number and fraction, for example, 1.25). An ncpus value of 1 means 100% of a CPU, a value of 1.25 means 125%, .75 means 75%, and so forth. When projects within a capped container have their own caps, the minimum value takes precedence. cappedCPU is mapped to capped-cpu in the zonecfg(8) man page.
ncpus
{ "solaris": { "cappedCPU": { "ncpus": "8" } }, … }
The physical and swap caps on the memory that can be used by this application container. A scale (K, M, G, T) can be applied to the value for each of these numbers (for example, 1M is one megabyte). cappedMemory is mapped to capped-memory in zonecfg(8) man page.¶
physical
swap
{ "solaris": { "cappedMemory": { "physical": "512m", "swap": "512m" } }, … }
anet is specified as an array that is used to set up networking for Solaris application containers. The anet resource represents the automatic creation of a network resource for an application container. The zones administration daemon, zoneadmd, is the primary process for managing the container’s virtual platform. One of the daemon’s responsibilities is the creation and teardown of the networks for the container. For more information on the daemon, check the zoneadmd(1M) man page. When such a container is started, a temporary VNIC (Virtual NIC) is automatically created for the container. The VNIC is deleted when the container is torn down. The following properties can be used to set up automatic networks. For additional information on properties, check the zonecfg(8) man page for the respective release of Solaris.
linkname
lowerLink
allowedAddress
configureAllowedAddress
defrouter
macAddress
linkProtection
{ "solaris": { "anet": [ { "allowedAddress": "172.17.0.2/16", "configureAllowedAddress": "true", "defrouter": "172.17.0.1/16", "linkProtection": "mac-nospoof, ip-nospoof", "linkname": "net0", "lowerLink": "net2", "macAddress": "02:42:f8:52:c7:16" } ] }, … }
The Windows container specification uses APIs provided by the Windows Host Compute Service (HCS) to fulfill the spec.¶
You can configure a container’s resource limits via the OPTIONAL resources field of the Windows configuration.
memory is an OPTIONAL configuration for the container’s memory usage.
The following parameters can be specified:¶
limit
reservation
{ "windows": { "resources": { "memory": { "limit": 2097152, "reservation": 524288 } } }, … }
cpu is an OPTIONAL configuration for the container’s CPU usage.
The following parameters can be specified:¶
count
shares
percent
{ "windows": { "resources": { "cpu": { "percent": 50 } } }, … }
storage is an OPTIONAL configuration for the container’s storage usage.
The following parameters can be specified:¶
iops
bps
sandboxSize
{ "windows": { "resources": { "storage": { "iops": 50 } } }, … }
network is an OPTIONAL configuration for the container’s network usage.
The following parameters can be specified:¶
egressBandwidth
{ "windows": { "resources": { "network": { "egressBandwidth": 1048577 } } }, … }
hooks
Prestart
, Poststart
and Poststop
.
The following properties can be specified:¶
Hooks allow one to run code before/after various lifecycle events of the container. Hooks MUST be called in the listed order. The state of the container is passed to the hooks over stdin, so the hooks could get the information they need to do their work.¶
Hook paths are absolute and are executed from the host’s filesystem in the runtime namespace.¶
The pre-start hooks are called after the container process is spawned, but before the user-supplied command is executed. They are called after the container namespaces are created on Linux, so they provide an opportunity to customize the container. On Linux, for example, the network namespace could be configured in this hook.
If a hook returns a non-zero exit code, then an error including the exit code and the stderr is returned to the caller and the container is torn down.
The post-start hooks are called after the user process is started. For example, this hook can notify the user that the real process has been spawned.
If a hook returns a non-zero exit code, then an error is logged and the remaining hooks are executed.¶
The post-stop hooks are called after the container process is stopped. Cleanup or debugging could be performed in such a hook. If a hook returns a non-zero exit code, then an error is logged and the remaining hooks are executed.¶
{ "hooks": { "prestart": [ { "path": "/usr/bin/fix-mounts", "args": ["fix-mounts", "arg1", "arg2"], "env": [ "key1=value1"] }, { "path": "/usr/bin/setup-network" } ], "poststart": [ { "path": "/usr/bin/notify-start", "timeout": 5 } ], "poststop": [ { "path": "/usr/sbin/cleanup.sh", "args": ["cleanup.sh", "-f"] } ] }, … }
path is REQUIRED for a hook.
args and env are OPTIONAL.
timeout is the number of seconds before aborting the hook.
The semantics are the same as Path, Args and Env in golang Cmd.
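The hook calling convention described above (listed order, container state on stdin, absolute path, optional args, env and timeout) could be driven roughly as follows. This is a sketch under stated assumptions rather than a runtime's actual implementation: `run_hooks` is a hypothetical helper, it stops at the first failing hook (the pre-start behaviour), and it approximates the golang Cmd semantics by defaulting args to the path and inheriting the caller's environment when env is absent. The demonstration hook simply reuses the current Python interpreter.

```python
import json
import subprocess
import sys


def run_hooks(hooks, state):
    """Run hooks in the listed order with the container state on stdin.

    Returns None when every hook exits with 0, otherwise the pair
    (index, CompletedProcess) for the first hook that failed.
    """
    payload = json.dumps(state).encode("utf-8")
    for index, hook in enumerate(hooks):
        env = None  # None makes the child inherit the caller's environment
        if "env" in hook:
            env = dict(item.split("=", 1) for item in hook["env"])
        result = subprocess.run(
            hook.get("args", [hook["path"]]),  # args[0] is the program name
            executable=hook["path"],           # the absolute path that is executed
            env=env,
            input=payload,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=hook.get("timeout"),       # raises TimeoutExpired when exceeded
        )
        if result.returncode != 0:
            return index, result
    return None


# Demonstration hook: exits 0 only if the state it reads says "created".
SCRIPT = ("import json, sys; state = json.load(sys.stdin); "
          "sys.exit(0 if state['status'] == 'created' else 3)")
demo_hook = {"path": sys.executable, "args": [sys.executable, "-c", SCRIPT], "timeout": 30}
state = {"ociVersion": "1.0.0-rc2", "id": "oci-container1", "status": "created",
         "pid": 4422, "bundlePath": "/containers/redis"}
failure = run_hooks([demo_hook], state)
```

For post-start and post-stop hooks, a caller would log the failure and continue with the remaining hooks instead of stopping.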
annotations
com.example.myKey
.
Keys using the org.opencontainers namespace are reserved and MUST NOT be used by subsequent specifications.
If there are no annotations then this property MAY either be absent or an empty map.
Implementations that are reading/processing this configuration file MUST NOT generate an error if they encounter an unknown annotation key.
{ "annotations": { "com.example.gpu-cores": "2" }, … }
Implementations that are reading/processing this configuration file MUST NOT generate an error if they encounter an unknown property. Instead they MUST ignore unknown properties.¶
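One way to satisfy the rule on unknown properties and unknown annotation keys is sketched below in Go. It relies on the default behavior of `encoding/json`, which silently skips object keys that have no matching struct field. The `Config` type is a deliberately tiny, hypothetical subset of the configuration, and `loadConfig` is an illustrative helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a small subset of config.json, enough to show the behavior.
type Config struct {
	OCIVersion  string            `json:"ociVersion"`
	Hostname    string            `json:"hostname,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// loadConfig parses config.json data. Unknown properties are ignored
// because json.Unmarshal skips keys without a matching field. Enabling
// Decoder.DisallowUnknownFields would turn them into errors instead.
func loadConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return Config{}, fmt.Errorf("parsing config.json: %w", err)
	}
	return c, nil
}
```

Because annotations are decoded into a plain string map, every key is accepted regardless of whether the implementation recognizes it.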
Here is a full example config.json for reference.¶
{
    "ociVersion": "0.5.0-dev",
    "platform": {
        "os": "linux",
        "arch": "amd64"
    },
    "process": {
        "terminal": true,
        "user": {
            "uid": 1,
            "gid": 1,
            "additionalGids": [5, 6]
        },
        "args": ["sh"],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": [
            "CAP_AUDIT_WRITE",
            "CAP_KILL",
            "CAP_NET_BIND_SERVICE"
        ],
        "rlimits": [
            { "type": "RLIMIT_CORE", "hard": 1024, "soft": 1024 },
            { "type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024 }
        ],
        "apparmorProfile": "acme_secure_profile",
        "selinuxLabel": "system_u:system_r:svirt_lxc_net_t:s0:c124,c675",
        "noNewPrivileges": true
    },
    "root": {
        "path": "rootfs",
        "readonly": true
    },
    "hostname": "slartibartfast",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs",
            "options": ["nosuid", "strictatime", "mode=755", "size=65536k"]
        },
        {
            "destination": "/dev/pts",
            "type": "devpts",
            "source": "devpts",
            "options": ["nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"]
        },
        {
            "destination": "/dev/shm",
            "type": "tmpfs",
            "source": "shm",
            "options": ["nosuid", "noexec", "nodev", "mode=1777", "size=65536k"]
        },
        {
            "destination": "/dev/mqueue",
            "type": "mqueue",
            "source": "mqueue",
            "options": ["nosuid", "noexec", "nodev"]
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": ["nosuid", "noexec", "nodev"]
        },
        {
            "destination": "/sys/fs/cgroup",
            "type": "cgroup",
            "source": "cgroup",
            "options": ["nosuid", "noexec", "nodev", "relatime", "ro"]
        }
    ],
    "hooks": {
        "prestart": [
            {
                "path": "/usr/bin/fix-mounts",
                "args": ["fix-mounts", "arg1", "arg2"],
                "env": ["key1=value1"]
            },
            {
                "path": "/usr/bin/setup-network"
            }
        ],
        "poststart": [
            {
                "path": "/usr/bin/notify-start",
                "timeout": 5
            }
        ],
        "poststop": [
            {
                "path": "/usr/sbin/cleanup.sh",
                "args": ["cleanup.sh", "-f"]
            }
        ]
    },
    "linux": {
        "devices": [
            { "path": "/dev/fuse", "type": "c", "major": 10, "minor": 229, "fileMode": 438, "uid": 0, "gid": 0 },
            { "path": "/dev/sda", "type": "b", "major": 8, "minor": 0, "fileMode": 432, "uid": 0, "gid": 0 }
        ],
        "uidMappings": [
            { "hostID": 1000, "containerID": 0, "size": 32000 }
        ],
        "gidMappings": [
            { "hostID": 1000, "containerID": 0, "size": 32000 }
        ],
        "sysctl": {
            "net.ipv4.ip_forward": "1",
            "net.core.somaxconn": "256"
        },
        "cgroupsPath": "/myRuntime/myContainer",
        "resources": {
            "network": {
                "classID": 1048577,
                "priorities": [
                    { "name": "eth0", "priority": 500 },
                    { "name": "eth1", "priority": 1000 }
                ]
            },
            "pids": {
                "limit": 32771
            },
            "hugepageLimits": [
                { "pageSize": "2MB", "limit": 9223372036854772000 }
            ],
            "oomScoreAdj": 100,
            "memory": {
                "limit": 536870912,
                "reservation": 536870912,
                "swap": 536870912,
                "kernel": 0,
                "kernelTCP": 0,
                "swappiness": 0
            },
            "cpu": {
                "shares": 1024,
                "quota": 1000000,
                "period": 500000,
                "realtimeRuntime": 950000,
                "realtimePeriod": 1000000,
                "cpus": "2-3",
                "mems": "0-7"
            },
            "disableOOMKiller": false,
            "devices": [
                { "allow": false, "access": "rwm" },
                { "allow": true, "type": "c", "major": 10, "minor": 229, "access": "rw" },
                { "allow": true, "type": "b", "major": 8, "minor": 0, "access": "r" }
            ],
            "blockIO": {
                "blkioWeight": 10,
                "blkioLeafWeight": 10,
                "blkioWeightDevice": [
                    { "major": 8, "minor": 0, "weight": 500, "leafWeight": 300 },
                    { "major": 8, "minor": 16, "weight": 500 }
                ],
                "blkioThrottleReadBpsDevice": [
                    { "major": 8, "minor": 0, "rate": 600 }
                ],
                "blkioThrottleWriteIOPSDevice": [
                    { "major": 8, "minor": 16, "rate": 300 }
                ]
            }
        },
        "rootfsPropagation": "slave",
        "seccomp": {
            "defaultAction": "SCMP_ACT_ALLOW",
            "architectures": ["SCMP_ARCH_X86"],
            "syscalls": [
                { "name": "getcwd", "action": "SCMP_ACT_ERRNO" }
            ]
        },
        "namespaces": [
            { "type": "pid" },
            { "type": "network" },
            { "type": "ipc" },
            { "type": "uts" },
            { "type": "mount" },
            { "type": "user" },
            { "type": "cgroup" }
        ],
        "maskedPaths": [
            "/proc/kcore",
            "/proc/latency_stats",
            "/proc/timer_stats",
            "/proc/sched_debug"
        ],
        "readonlyPaths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
        ],
        "mountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c715,c811"
    },
    "annotations": {
        "com.example.key1": "value1",
        "com.example.key2": "value2"
    }
}