How Executables Run Inside Linux Containers: A High-Level Mental Model
This note gives a high-level but concrete explanation of how executables run inside Linux containers. It walks through the execution path from binary loading and linking, to libc and system calls, and finally to how the host kernel manages resources. Along the way it clarifies the roles of dynamic and static linking, explains why containers can use different distributions on the same host, and briefly shows how isolation is enforced using namespaces and cgroups rather than separate kernels.
A Linux container does not run its own operating system in the same way a virtual machine does. Instead, it runs a complete user space on top of the host’s Linux kernel. To understand how executables run inside containers, it is essential to separate the kernel, user space, and the binary interfaces that connect them.
At the bottom of the stack is the Linux kernel. The kernel is the only component allowed to directly manage hardware and global resources such as CPU time, memory, storage, networking, and devices. This authority is enforced by hardware privilege levels: user programs run in user mode, while the kernel runs in kernel mode. User programs cannot touch hardware or shared system state directly. Instead, they interact with the kernel through system calls. These system calls form a stable kernel ABI (Application Binary Interface) that defines, per CPU architecture, the syscall numbers, argument-passing conventions, and return values. The kernel does not know what language a program is written in or which Linux distribution provided it, and it has no built-in notion of a container either. It only sees processes invoking syscalls, each carrying its own namespace, cgroup, and credential state.
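As a rough illustration of the syscall ABI, the Python sketch below skips libc’s getpid() wrapper and invokes the syscall by number through libc’s generic syscall() function. The name raw_getpid is made up for this note, and the number table covers only x86-64 and AArch64; other architectures use different numbers.

```python
import ctypes
import os
import platform

# Syscall numbers are defined per architecture. These two entries come from
# the Linux syscall tables; this sketch does not cover other architectures.
SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Invoke getpid by syscall number instead of through libc's getpid()."""
    nr = SYS_GETPID[platform.machine()]  # KeyError on unlisted architectures
    libc = ctypes.CDLL(None, use_errno=True)
    # syscall(2) is libc's generic entry point: it places the number and
    # arguments in the registers the kernel ABI expects and traps into it.
    return libc.syscall(nr)


if __name__ == "__main__":
    print("raw syscall:", raw_getpid(), "os.getpid():", os.getpid())
```

Both values should agree, whether the script runs on the host or inside a container, because the kernel answers the same syscall in either case.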
Everything outside the kernel is user space. User space includes executables, shared libraries, language runtimes, background services, and utilities. A Linux distribution is essentially a particular user-space layout paired with the Linux kernel. Containers replace only the user space and reuse the host kernel. This is why a Debian container can run on a Fedora host: the Debian user space communicates with the Fedora kernel using the same Linux syscall ABI.
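One way to see this split from inside a running system is to compare what the user space says about itself with what the kernel reports. The sketch below assumes the conventional /etc/os-release file is present; parse_os_release and describe_system are illustrative names, and the quote handling is deliberately simplified.

```python
import os


def parse_os_release(text):
    """Parse KEY=value lines in the style of /etc/os-release (simplified quoting)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def describe_system(os_release_path="/etc/os-release"):
    """Report the user-space distribution next to the running kernel release."""
    with open(os_release_path) as f:
        distro = parse_os_release(f.read())
    return {
        # Comes from a file in the (container's) root filesystem.
        "userspace": distro.get("PRETTY_NAME", distro.get("ID", "unknown")),
        # Comes from the uname syscall, answered by the host kernel.
        "kernel": os.uname().release,
    }


if __name__ == "__main__":
    print(describe_system())
```

Run inside a Debian container on a Fedora host, the userspace field would be expected to describe Debian while the kernel field still shows the host’s kernel release.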
Linux executables are stored in ELF (Executable and Linkable Format) files. An ELF binary contains not only machine code, but also metadata describing how it should be loaded. For dynamically linked binaries, this metadata includes a PT_INTERP entry naming a dynamic linker, such as /lib64/ld-linux-x86-64.so.2 for glibc on x86-64. When the program is executed, the kernel maps the executable into memory, resolves the PT_INTERP path using the filesystem view of the process, maps the dynamic linker as well, and transfers control to the linker rather than to the program’s own entry point. In a container, that filesystem view comes from the container’s root filesystem, so both the dynamic linker and the shared libraries it loads come from the container image, not the host.
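The interpreter reference can be read straight out of the file. The following is a minimal sketch that handles only 64-bit ELF files and does little validation; elf_interpreter is a name chosen for this note, not a standard API.

```python
import struct
import sys

PT_INTERP = 3  # program header type that names the dynamic linker


def elf_interpreter(path):
    """Return the PT_INTERP path of a 64-bit ELF file, or None if it has none."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        if ident[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
            raise ValueError("only ELF64 is handled in this sketch")
        endian = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 means little-endian
        # Remaining 48 bytes of the ELF64 header after e_ident.
        hdr = struct.unpack(endian + "HHIQQQIHHHHHH", f.read(48))
        e_phoff, e_phentsize, e_phnum = hdr[4], hdr[8], hdr[9]
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            # First 40 bytes of an ELF64 program header.
            p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack(
                endian + "IIQQQQ", f.read(40)
            )
            if p_type == PT_INTERP:
                f.seek(p_offset)
                return f.read(p_filesz).rstrip(b"\0").decode()
    return None


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    print(target, "->", elf_interpreter(target))
```

A statically linked, non-PIE binary has no PT_INTERP entry, so the function returns None for it. For a dynamically linked binary, the returned path is resolved inside whatever root filesystem the process sees, which in a container is the image’s.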
The dynamic linker then loads shared libraries like libc.so.6, resolves symbols, and applies relocations. Calls from the program to libc functions follow the platform’s calling convention (for example, the System V AMD64 ABI on x86-64 Linux): arguments are placed in specific registers or stack locations, return values are retrieved from defined registers, and stack alignment rules are obeyed. libc is compiled to expect exactly this ABI, so the program and libc interoperate mechanically. libc then translates higher-level operations into raw system calls. When a syscall instruction is executed, the CPU switches into kernel mode, the kernel validates the request, enforces permissions and limits, performs the operation, and returns the result. The kernel has no awareness of libc as a concept; it sees only register values and syscalls.
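The program-to-libc half of this path can be exercised from Python with ctypes, which marshals arguments according to the C calling convention before jumping into libc. This is only a sketch: write_via_libc is a hypothetical helper, and it calls libc’s write() wrapper, which in turn issues the write syscall.

```python
import ctypes
import os

# Load the libc that the dynamic linker already mapped into this process.
libc = ctypes.CDLL(None, use_errno=True)

# Declare the C signature so ctypes passes each argument with the right type:
#   ssize_t write(int fd, const void *buf, size_t count);
libc.write.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_size_t]
libc.write.restype = ctypes.c_ssize_t


def write_via_libc(fd, data):
    """Write bytes through libc's write() wrapper and return the byte count."""
    n = libc.write(fd, data, len(data))
    if n < 0:
        # libc reports kernel errors by returning -1 and setting errno.
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n


if __name__ == "__main__":
    write_via_libc(1, b"hello from libc write()\n")
```

Inside a container, the libc loaded here is the one shipped in the image, while the syscall it issues is served by the host kernel.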
Static linking works differently. In a statically linked executable, libc and other required libraries are copied into the binary at link time. There is no dynamic linker involved at runtime, and no shared libraries need to be loaded. The kernel loads the executable and jumps directly to its entry point. All libc code is already present inside the binary. Despite this difference, the interaction with the kernel is identical: when the statically linked program needs kernel services, it still issues syscalls using the same kernel ABI. Static binaries are larger, but they are self-contained and common in minimal container images and embedded environments.
Each container brings its own user-space components: its own libc, dynamic linker (if its binaries are dynamically linked), and user-space tools. These components invoke syscalls against the host kernel. As long as the CPU architecture matches and the host kernel is recent enough to provide the syscalls and features the container’s user space expects, this works transparently. The kernel handles syscalls from a container and from the host’s user space through the same entry path; only the per-process state it consults, such as namespaces, cgroups, and credentials, differs. The container’s filesystem layout provides these components under standard paths like /lib, /bin, or /usr, so execution proceeds as if the container were a full system.
The container’s filesystem is commonly assembled using a union filesystem such as OverlayFS, which merges read-only image layers with a writable layer unique to each running container. The merged view is computed on demand: each path lookup consults the writable layer first and falls back through the lower image layers. When a program modifies a file that so far exists only in an image layer, the file is first copied up into the writable layer and the change is applied to that copy, leaving the underlying image unchanged. This allows many containers to run from the same image without affecting each other. Executables, libraries, configuration files, and other user-space resources are all resolved through this combined view, so programs inside containers experience what looks like a normal, private filesystem.
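The lookup and copy-up behaviour can be modelled in a few lines. The class below is a toy in-memory model written for this note, not how OverlayFS is implemented in the kernel: layers are plain dictionaries, and ToyOverlay and its methods are hypothetical names.

```python
class ToyOverlay:
    """Toy model of a merged view: one writable upper layer over read-only lowers."""

    _WHITEOUT = object()  # marker for a path deleted in the upper layer

    def __init__(self, lowers):
        self.lowers = lowers  # list of {path: bytes}, topmost image layer first
        self.upper = {}       # writable layer private to this container

    def read(self, path):
        # Lookup order: writable layer first, then each image layer in turn.
        for layer in (self.upper, *self.lowers):
            if path in layer:
                if layer[path] is self._WHITEOUT:
                    break
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-up: resolve the current content through the layers, then store
        # the modified copy in the upper layer only.
        self.upper[path] = self.read(path) + data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path is not visible
        self.upper[path] = self._WHITEOUT  # hide it without touching the image


if __name__ == "__main__":
    image = [{"/etc/motd": b"hi\n"}, {"/bin/sh": b"ELF"}]
    a, b = ToyOverlay(image), ToyOverlay(image)
    a.append("/etc/motd", b"edited in a\n")
    print(a.read("/etc/motd"), b.read("/etc/motd"))
```

Two ToyOverlay instances built over the same image list stand in for two containers started from one image: a change made through one is not visible through the other, and the image dictionaries are never mutated.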
Container isolation is enforced by the host kernel using namespaces and control groups (cgroups), rather than by running separate kernels. Namespaces isolate a container’s view of global resources. Among others, the mount namespace provides a separate filesystem hierarchy, the PID namespace gives an independent process ID space, the network namespace isolates network interfaces and routing, and the user namespace remaps user and group IDs. Cgroups limit and account for resource usage such as CPU, memory, and I/O. Device access is also controlled per cgroup, by filtering which major and minor device numbers a container’s processes can use: cgroup v1 has a dedicated devices controller, while cgroup v2 relies on eBPF programs attached to the cgroup.
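Both mechanisms are visible through /proc, which makes them easy to inspect. The sketch below assumes a Linux system with /proc mounted; the function names are invented for this note. Each entry under /proc/<pid>/ns is a symlink whose target names the namespace type and an inode number identifying the namespace instance, and /proc/<pid>/cgroup lists the cgroup a process belongs to in each hierarchy.

```python
import os
import re


def parse_ns_link(target):
    """Split a namespace link target such as 'mnt:[4026531840]' into (type, inode)."""
    m = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def namespaces_of(pid="self"):
    """Map each entry in /proc/<pid>/ns to its (type, inode) pair."""
    ns_dir = f"/proc/{pid}/ns"
    return {
        name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        for name in sorted(os.listdir(ns_dir))
    }


def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup lines of the form 'hierarchy-ID:controllers:path'."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy),
            # cgroup v2 uses a single line with an empty controller field.
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


if __name__ == "__main__":
    for name, (kind, inode) in namespaces_of().items():
        print(f"{name:20} {kind}:{inode}")
    with open("/proc/self/cgroup") as f:
        print(parse_cgroup_file(f.read()))
```

Two processes that share a namespace report the same inode number for it, so comparing the output for a host process and a containerized process shows which views differ.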
Containers access hardware devices by interacting with the host’s kernel drivers through special device files like /dev/nvidia0 or /dev/ttyUSB0. A device file is made visible in the container either by creating the device node in the container’s /dev or by bind-mounting it from the host. Visibility alone is not sufficient: the container’s cgroup device rules must also allow that device’s major and minor numbers for the operations in question, such as reading, writing, or creating nodes. User code or user-space drivers then use standard syscalls such as open, read, write, and ioctl to communicate with the device through these files, while the kernel driver performs the actual hardware interaction.
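The major and minor numbers that such rules refer to are stored in the device file’s inode and can be read with stat. A small sketch, with device_numbers as an illustrative helper name:

```python
import os
import stat
import sys


def device_numbers(path):
    """Return (kind, major, minor) for a character or block device file."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "char"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        raise ValueError(f"{path} is not a device file")
    # st_rdev encodes the driver (major) and the instance (minor).
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/dev/null"
    print(path, device_numbers(path))
```

The same pair of numbers identifies the same kernel driver and device instance regardless of which mount namespace the file is opened from, which is why access control is expressed in terms of these numbers rather than paths.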
Language runtimes such as the JVM, the Python interpreter, or JavaScript engines are themselves native executables running in user space. They are compiled ahead of time for a specific CPU architecture and linked like any other program. User code is loaded as data and executed either by interpretation or by just-in-time (JIT) compilation. JIT compilation generates machine code at runtime by allocating executable memory, emitting instructions for the current architecture, and jumping to that code. Wherever this generated code crosses into code it did not generate, it must obey the same ABI rules as ahead-of-time compiled code, because it may call runtime helpers or libc functions, or issue syscalls.
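The three JIT steps can be reproduced by hand. The sketch below is deliberately narrow: it only works on x86-64 Linux, the six bytes of machine code are hand-assembled for this note, and it maps memory as writable and executable at once for brevity, whereas production JITs commonly switch a page from writable to executable instead. Some hardened systems refuse such mappings.

```python
import ctypes
import mmap
import platform

# x86-64 machine code for: int add_one(int x) { return x + 1; }
# System V AMD64 ABI: the first integer argument arrives in edi and the
# result is returned in eax.
CODE = bytes([
    0x89, 0xF8,        # mov eax, edi
    0x83, 0xC0, 0x01,  # add eax, 1
    0xC3,              # ret
])


def make_add_one():
    """Emit CODE into executable memory and return a Python callable for it."""
    if platform.machine() not in ("x86_64", "AMD64"):
        raise RuntimeError("this sketch only emits x86-64 code")
    # 1. Allocate a page of anonymous memory that is readable, writable,
    #    and executable.
    buf = mmap.mmap(
        -1,
        mmap.PAGESIZE,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
    )
    # 2. Emit instructions for the current architecture.
    buf.write(CODE)
    # 3. Treat the buffer's address as a C function int(int) and call it.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    native = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)(addr)

    def add_one(x):
        return native(x)

    add_one._keepalive = buf  # keep the mapping alive as long as the callable
    return add_one


if __name__ == "__main__":
    print(make_add_one()(41))
```

The call only works because the emitted bytes follow the same calling convention that ctypes uses when it invokes the function pointer, which is the ABI requirement described above in miniature.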
At a high level, a Linux container is best understood as a packaged user space running under kernel-enforced isolation. Executables inside containers are normal Linux programs. They are loaded by the kernel, optionally linked by a dynamic linker, call into their own libc or embedded static libraries, and invoke the host kernel via syscalls. Namespaces determine what those programs can see, cgroups determine how much of each resource they may consume, and the kernel enforces all boundaries. The container boundary alters visibility and limits, not the fundamental mechanics of program execution.