OS Fundamentals
An operating system is the layer between hardware and the applications you use. It manages resources, enforces security, and provides the environment where everything else runs.
What Is an Operating System?
The OS handles process scheduling, memory allocation, file system management, and hardware communication so that applications don't have to. Whether you're working with Linux servers in the cloud or managing containers on Kubernetes, the OS is always there underneath. Think of the OS as a manager that sits between the raw hardware -- CPU, RAM, disk, network cards -- and every application you run. Without it, each program would need to know how to talk directly to every piece of hardware, which would be impractical and dangerous.
Why It Matters
Understanding OS fundamentals helps you diagnose why a process is consuming too much memory, why a file can't be accessed, or why a container behaves differently from a VM. These are daily realities in cloud and infrastructure roles.
When you deploy a web server, troubleshoot a failing container, or configure permissions on a shared file system, you are interacting with the operating system whether you realize it or not. The concepts in this section will come up again and again as you move through Linux, Networking, and eventually Containers.
What You'll Learn
- What an operating system does and why it exists
- Processes and process management
- Memory management and virtual memory
- File systems and directory structures
- Users, groups, and permission models
- How the kernel interacts with hardware
- Init systems and service management
The Role of an Operating System
At the highest level, an operating system provides abstraction. It hides the complexity of hardware and presents a consistent interface that applications can use. A program does not need to know whether it is writing to an SSD or a spinning hard drive -- it simply asks the OS to write data, and the OS handles the rest.
The OS also provides isolation. One program cannot accidentally (or intentionally) read another program's memory. One user cannot delete another user's files without permission. This isolation is the foundation of system security.
Finally, the OS provides resource management. When dozens of processes all want CPU time, the OS decides who runs and for how long. When memory is running low, the OS decides what stays in RAM and what gets moved to disk. These decisions happen thousands of times per second, invisibly.
A typical operating system is organized as a stack of layers, from applications at the top down to hardware at the bottom.
Each layer only communicates with the layer directly above or below it. Applications talk to system libraries. System libraries make system calls into the kernel. The kernel talks to hardware through device drivers. This layered design keeps things organized and secure.
The Kernel
The kernel is the core of the operating system. It is the first program loaded when the computer starts, and it runs for the entire time the machine is powered on. Everything else -- every application, every service, every shell command -- depends on the kernel.
Kernel Space vs User Space
Modern operating systems divide memory into two regions:
- Kernel space: Where the kernel and its modules run. Code here has unrestricted access to hardware and all of memory. A bug in kernel space can crash the entire system.
- User space: Where all applications run. Code here cannot access hardware directly or read another process's memory. It must ask the kernel for help through system calls.
This separation is enforced by the CPU itself. The processor has different privilege levels (often called "rings"), and it will refuse to execute privileged instructions when running in user mode.
System Calls
A system call (or "syscall") is the mechanism by which a user-space program requests a service from the kernel. When a program needs to read a file, allocate memory, or send data over the network, it makes a system call.
Common system calls include:
| System Call | Purpose |
|---|---|
| open() | Open a file |
| read() | Read data from a file or device |
| write() | Write data to a file or device |
| fork() | Create a new process |
| exec() | Replace a process with a new program |
| exit() | Terminate a process |
| mmap() | Map files or memory into a process's address space |
You rarely call these directly. Instead, you use higher-level commands and libraries that wrap them. When you run cat /etc/hostname, the cat program internally calls open(), read(), and write() to open the file, read its contents, and print them to your terminal.
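The same wrapping happens in language runtimes: Python's os module maps almost one-to-one onto open(), read(), and close(). A minimal sketch of what cat does internally (reading a scratch file rather than /etc/hostname so it is self-contained):

```python
import os
import tempfile

# Create a scratch file to read back, so the example works anywhere.
path = os.path.join(tempfile.mkdtemp(), "hostname")
with open(path, "w") as f:
    f.write("web-01\n")

# os.open/os.read/os.close are thin wrappers over the corresponding syscalls.
fd = os.open(path, os.O_RDONLY)   # open() syscall -> integer file descriptor
data = os.read(fd, 4096)          # read() syscall -> raw bytes
os.close(fd)                      # close() syscall

print(data.decode(), end="")      # write() happens inside print, to stdout
```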
Monolithic vs Microkernel
There are two main kernel design philosophies:
- Monolithic kernels (like Linux) include most OS services -- file systems, device drivers, networking -- directly in the kernel. This is fast because there are fewer context switches, but a bug in any driver can crash the entire kernel.
- Microkernels (like Minix or QNX) keep the kernel minimal and run most services in user space. This is more stable and secure but can be slower due to the extra communication overhead.
Linux, the kernel you will work with most in cloud computing, is monolithic. However, it supports loadable kernel modules that can be inserted and removed at runtime, giving it some of the flexibility of a microkernel design.
Processes
A process is a running instance of a program. When you type a command in the terminal, the OS creates a process to execute it. When you start a web server, that is a process. Your shell itself is a process. A single program can spawn many processes.
Process Identifiers
Every process is assigned a unique Process ID (PID). The very first process started by the kernel is assigned PID 1 -- this is the init process (on modern Linux systems, this is systemd). Every other process descends from PID 1.
Each process also has a Parent Process ID (PPID) that identifies which process created it. This creates a tree structure:
systemd (PID 1)
├── sshd (PID 512)
│ └── bash (PID 1340)
│ └── vim (PID 1587)
├── nginx (PID 780)
│ ├── nginx worker (PID 781)
│ └── nginx worker (PID 782)
└── cron (PID 445)
Process States
A process is always in one of several states:
| State | Symbol | Description |
|---|---|---|
| Running | R | Actively executing on a CPU or waiting in the run queue |
| Sleeping | S | Waiting for an event (user input, network data, disk I/O) |
| Uninterruptible Sleep | D | Waiting for I/O that cannot be interrupted (often disk) |
| Stopped | T | Paused by a signal (e.g., Ctrl+Z) |
| Zombie | Z | Finished executing but still has an entry in the process table because its parent has not yet read its exit status |
Zombie processes are a common interview topic. They are not consuming CPU or memory, but they occupy a slot in the process table. A few zombies are harmless. Thousands of them indicate a parent process that is not properly cleaning up after its children.
Viewing Processes
The ps command shows a snapshot of current processes. The ps aux form is the most commonly used:
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 169316 13092 ? Ss Jan15 2:34 /usr/lib/systemd/systemd
root 512 0.0 0.0 15432 5648 ? Ss Jan15 0:02 /usr/sbin/sshd -D
www-data 780 0.0 0.2 55876 18204 ? S Jan15 1:12 nginx: worker process
clouduser 1340 0.0 0.0 22812 5364 pts/0 Ss 10:05 0:00 -bash
clouduser 1587 0.2 0.1 52432 10280 pts/0 S+ 10:32 0:01 vim config.yaml
Here is what each column means:
| Column | Meaning |
|---|---|
| USER | The user who owns the process |
| PID | Process ID |
| %CPU | Percentage of CPU time the process is using |
| %MEM | Percentage of physical memory the process is using |
| VSZ | Virtual memory size in kilobytes |
| RSS | Resident Set Size -- actual physical memory used, in kilobytes |
| TTY | Terminal associated with the process (? means no terminal) |
| STAT | Process state (see the state table above) |
| START | When the process started |
| TIME | Cumulative CPU time consumed |
| COMMAND | The command that launched the process |
The top command provides a real-time, continuously updating view of processes sorted by resource usage:
$ top
top - 10:45:03 up 23 days, 4:12, 2 users, load average: 0.15, 0.10, 0.08
Tasks: 142 total, 1 running, 140 sleeping, 0 stopped, 1 zombie
%Cpu(s): 2.3 us, 0.8 sy, 0.0 ni, 96.5 id, 0.3 wa, 0.0 hi, 0.1 si
MiB Mem : 7963.2 total, 3241.5 free, 2104.8 used, 2616.9 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 5412.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
780 www-data 20 0 55876 18204 12840 S 1.3 0.2 1:12.45 nginx
1587 clouduser 20 0 52432 10280 7648 S 0.7 0.1 0:01.23 vim
1 root 20 0 169316 13092 8456 S 0.0 0.2 2:34.56 systemd
The top header shows system-wide statistics: uptime, total tasks, CPU usage breakdown, and memory usage. Use q to exit top.
Try It: Open a terminal and run ps aux. Find your shell process in the list. Note its PID, then run ps aux | grep <PID>, replacing <PID> with the number you found. You should see your shell and the grep command itself.
Parent and Child Processes
When a process needs to launch another process, it uses fork() and exec():
- fork() creates an exact copy of the current process. The copy is called the child process and the original is the parent process. The child gets a new PID but inherits everything else -- open files, environment variables, current directory.
- exec() replaces the child's program with a new program. The PID stays the same, but the code running under that PID changes entirely.
This fork-and-exec model is how every command you type in a shell gets executed. Your shell forks itself, and the child exec's the command you typed.
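The fork-and-exec sequence can be sketched in Python, whose os module exposes these calls directly. This is a simplified model of what a shell does for every command (error handling omitted):

```python
import os
import sys

pid = os.fork()                     # fork(): duplicate this process
if pid == 0:
    # Child: replace ourselves with a new program. The PID stays the same,
    # but the code running under it changes entirely.
    os.execvp("echo", ["echo", "hello from the child"])
    sys.exit(1)                     # only reached if exec fails
else:
    # Parent: wait for the child and collect its exit status (reaping).
    _, status = os.waitpid(pid, 0)
    print("child exited with", os.waitstatus_to_exitcode(status))
```

The parent's waitpid() call is what prevents the finished child from lingering as a zombie.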
Try It: Run echo $$ to see your shell's PID. Then run bash to start a child shell, and run echo $$ again. Notice the PID changed. Run echo $PPID to see the parent PID -- it should match the first PID. Type exit to return to your original shell.
Identifying Zombie Processes
A zombie process has finished executing but its entry remains in the process table because its parent has not called wait() to read its exit status. Zombies show a Z in the STAT column of ps:
# Find zombie processes (state Z in the STAT column)
$ ps aux | awk '$8 ~ /^Z/'
clouduser 2045 0.0 0.0 0 0 pts/0 Z 11:23 0:00 [defunct]
A few zombies are normal and harmless — they consume no CPU or memory, just a process table entry. However, if thousands of zombies accumulate, it means a parent process is not properly reaping its children. The fix is to address the parent process (fix the code, restart the service), not the zombies themselves. Killing a zombie with kill has no effect because the process is already dead.
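You can create a short-lived zombie deliberately to watch this happen. A Python sketch for Linux: the child exits immediately, and until the parent calls wait(), the kernel keeps the child's process-table entry in state Z:

```python
import os
import time

pid = os.fork()
if pid == 0:
    os._exit(0)            # child exits immediately

time.sleep(0.2)            # child is dead, but we have NOT wait()ed yet

# On Linux, the third field of /proc/<pid>/stat is the state letter.
with open(f"/proc/{pid}/stat") as f:
    state = f.read().split()[2]
print("child state:", state)   # Z -- a zombie

os.waitpid(pid, 0)         # reaping the child removes the zombie entry
```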
Process Priorities
Not all processes are equally important. Linux uses nice values to influence scheduling priority. Nice values range from -20 (highest priority) to +19 (lowest priority). Regular users can only set nice values of 0 to +19. Only root can set negative values.
# Start a process with lower priority (nice value 10)
$ nice -n 10 ./long-running-backup.sh
# Change the nice value of a running process
$ renice -n 5 -p 1587
# View nice values in top (NI column)
$ top
Nice values matter when a system is under load. A CPU-intensive backup script with a high nice value will yield CPU time to more important processes like your web server.
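A process can also lower its own priority programmatically. Python's os.nice() wraps the underlying nice() call; the sketch below adds 5 to the current nice value (raising priority back up would require root):

```python
import os

before = os.nice(0)    # nice(0) returns the current nice value unchanged
after = os.nice(5)     # add 5: politely yield CPU to other processes
print(f"nice value: {before} -> {after}")
```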
Signals
Processes communicate through signals — simple notifications sent from one process to another (or from the kernel to a process). Signals are how you stop, pause, resume, and terminate processes.
| Signal | Number | Description | Default Action |
|---|---|---|---|
| SIGTERM | 15 | Graceful termination request | Terminate (can be caught) |
| SIGKILL | 9 | Forced termination | Terminate (cannot be caught) |
| SIGINT | 2 | Interrupt from keyboard (Ctrl+C) | Terminate (can be caught) |
| SIGHUP | 1 | Hangup -- terminal closed or config reload | Terminate (can be caught) |
| SIGSTOP | 19 | Pause process | Stop (cannot be caught) |
| SIGCONT | 18 | Resume paused process | Continue |
| SIGCHLD | 17 | Child process stopped or terminated | Ignored |
The distinction between SIGTERM and SIGKILL is critical:
- kill <PID> sends SIGTERM (signal 15) by default. This asks the process to shut down gracefully -- it can save state, close connections, and clean up before exiting.
- kill -9 <PID> sends SIGKILL (signal 9). This instantly terminates the process with no chance to clean up. Use this only when SIGTERM fails.
Always try SIGTERM first. Jumping straight to kill -9 can leave temporary files, corrupt data, or leave network connections in a broken state.
# List all available signals
$ kill -l
# Send SIGTERM (graceful shutdown)
$ kill 1587
# Send SIGKILL (forced termination) — last resort
$ kill -9 1587
# Send SIGHUP (often used to reload configuration)
$ kill -HUP 780
Many services (like nginx and Apache) use SIGHUP as a signal to reload their configuration without a full restart. This is a convention, not a rule — each program decides how to handle each signal.
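Catching a signal looks like this in Python: you register a handler function, and it runs whenever the signal arrives. A minimal sketch of the SIGHUP-means-reload convention (the handler here just counts; a real daemon would re-read its config file):

```python
import os
import signal

reloads = 0

def on_sighup(signum, frame):
    # A real daemon would re-read its configuration here.
    global reloads
    reloads += 1
    print("SIGHUP received: reloading configuration")

signal.signal(signal.SIGHUP, on_sighup)   # install the handler
os.kill(os.getpid(), signal.SIGHUP)       # send ourselves the signal
```

SIGKILL and SIGSTOP cannot be caught this way -- the kernel refuses to let a process install handlers for them.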
Try It: Run kill -l to see all the signals your system supports. Start a sleep 300 process in the background with sleep 300 &, note the PID, and try sending SIGTERM to it with kill <PID>. Then start another sleep 300 & and send SIGINT with kill -2 <PID>.
File Descriptors and Resource Limits
Every open file, network connection, and pipe in a process is represented by a file descriptor (FD) — a small integer. Three file descriptors are opened automatically for every process:
| FD | Name | Description |
|---|---|---|
| 0 | stdin | Standard input (keyboard by default) |
| 1 | stdout | Standard output (terminal by default) |
| 2 | stderr | Standard error (terminal by default) |
When a process opens additional files or network connections, they get FD numbers starting from 3, 4, 5, and so on.
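You can watch the kernel hand out descriptor numbers. A Python sketch (os.open returns the raw integer FD, and POSIX guarantees the lowest unused number is always chosen):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hi")

# 0, 1, 2 are stdin/stdout/stderr; new descriptors start at the
# lowest unused number, normally 3.
fd_a = os.open(path, os.O_RDONLY)
fd_b = os.open(path, os.O_RDONLY)
print(fd_a, fd_b)                 # typically 3 and 4

os.close(fd_a)
fd_c = os.open(path, os.O_RDONLY) # the freed slot is reused
print(fd_c)                       # same number as fd_a
os.close(fd_b)
os.close(fd_c)
```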
The operating system limits how many file descriptors a single process can have open. You can check and modify this limit:
# View current FD limit for your shell
$ ulimit -n
1024
# View all resource limits
$ ulimit -a
# See how many FDs a specific process is using
$ ls /proc/<PID>/fd | wc -l
Running out of file descriptors is a common cause of "too many open files" errors in busy servers. If a web server or database cannot open new connections, it is often because it has reached its FD limit. The fix is to increase the limit in /etc/security/limits.conf or the service's systemd unit file.
Memory Management
Every running process needs memory. The OS is responsible for allocating memory to processes, keeping them isolated from each other, and reclaiming memory when processes finish. Without memory management, a single misbehaving program could overwrite another program's data or crash the entire system.
Physical Memory vs Virtual Memory
Physical memory (RAM) is the actual hardware installed in the machine. It is fast but limited -- a typical cloud server might have 4 GB, 16 GB, or 64 GB of RAM.
Virtual memory is an abstraction. Each process thinks it has access to a large, contiguous block of memory starting at address zero. In reality, the OS maps these virtual addresses to scattered physical locations in RAM. This provides three key benefits:
- Isolation: Process A's virtual address 1000 maps to a completely different physical address than Process B's virtual address 1000. They cannot interfere with each other.
- Simplicity: Programs do not need to worry about where in physical RAM they are loaded. They always see the same virtual layout.
- Overcommitment: The OS can promise more virtual memory than physical RAM exists, using disk space (swap) as overflow.
The Page Table
The OS divides memory into fixed-size chunks called pages (typically 4 KB each). A data structure called the page table maps each virtual page to a physical page frame. When a process accesses a virtual address, the CPU's Memory Management Unit (MMU) translates it to a physical address using the page table.
If a process accesses a page that is not currently in physical RAM (perhaps it was moved to swap), the CPU triggers a page fault. The OS then loads the page from disk back into RAM. Frequent page faults slow the system dramatically -- this is called thrashing.
Stack vs Heap
Within a process's virtual memory, there are two primary regions for storing data:
| Feature | Stack | Heap |
|---|---|---|
| Purpose | Local variables, function call data | Dynamically allocated data |
| Allocation | Automatic (managed by the compiler) | Manual (managed by the programmer) |
| Speed | Very fast | Slower |
| Size | Small and fixed (typically 1-8 MB) | Large and flexible |
| Growth direction | Grows downward (toward lower addresses) | Grows upward (toward higher addresses) |
| Lifecycle | Freed automatically when function returns | Must be explicitly freed or garbage collected |
| Common errors | Stack overflow (too deep recursion) | Memory leaks (forgetting to free) |
When you call a function, its local variables go on the stack. When you allocate memory for a data structure whose size is not known until runtime, it goes on the heap. Understanding this distinction helps when debugging memory issues like stack overflows or memory leaks.
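Stack exhaustion is easy to trigger with unbounded recursion. Python converts the would-be stack overflow into a catchable RecursionError, which makes it a safe demonstration:

```python
import sys

def recurse(depth=0):
    return recurse(depth + 1)   # every call pushes a new stack frame

sys.setrecursionlimit(1000)     # keep the demo small and fast

overflowed = False
try:
    recurse()
except RecursionError as e:
    overflowed = True
    print("stack limit hit:", e)
```

In C, the same mistake would smash past the stack's fixed region and typically crash with a segmentation fault instead of a tidy exception.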
Swap Space
Swap space is a dedicated area on disk that acts as overflow for RAM. When physical memory fills up, the OS moves inactive pages from RAM to swap (this is called swapping out or paging out). If those pages are needed again, they are swapped back in.
Swap is much slower than RAM -- hundreds or thousands of times slower. It exists as a safety net, not a replacement for adequate RAM. If a server is heavily using swap, it is a strong indicator that it needs more physical memory.
Viewing Memory Usage
The free command shows memory usage at a glance:
$ free -h
total used free shared buff/cache available
Mem: 7.8Gi 2.1Gi 3.2Gi 256Mi 2.5Gi 5.3Gi
Swap: 2.0Gi 0B 2.0Gi
Here is what each column means:
| Column | Meaning |
|---|---|
total | Total installed memory |
used | Memory actively used by processes |
free | Memory not being used at all |
shared | Memory used by tmpfs and shared between processes |
buff/cache | Memory used by the kernel for disk caching (can be reclaimed if needed) |
available | Estimated memory available for starting new applications (includes reclaimable cache) |
The available column is the most useful one. It tells you how much memory the system can actually use for new work. The free column is misleading because Linux intentionally uses "free" memory for disk caching -- this is a good thing, not a problem.
Try It: Run free -h on a Linux system. Compare the free and available columns. Available will almost always be larger than free because it includes reclaimable buffer and cache memory.
The OOM Killer
When a Linux system runs completely out of memory and swap, the kernel invokes the OOM (Out-Of-Memory) Killer. This is a last-resort mechanism that selects a process to forcefully terminate in order to free memory and keep the system alive.
The OOM killer chooses its victim based on a scoring system — processes using the most memory and with the least importance get killed first. You can see OOM events in the system log:
# Check for OOM kills in the system journal
$ journalctl -k | grep -i "out of memory"
# Check a specific process's OOM score (higher = more likely to be killed)
$ cat /proc/<PID>/oom_score
OOM kills are a sign that the system needs more memory or that a process has a memory leak. In cloud environments, containers often hit OOM kills when their memory limit is set too low. You will encounter this again in Containers.
Detailed Memory Information
For more detailed memory information than free provides, examine /proc/meminfo:
$ head -20 /proc/meminfo
MemTotal: 8134656 kB
MemFree: 3318784 kB
MemAvailable: 5542144 kB
Buffers: 262144 kB
Cached: 2359296 kB
SwapTotal: 2097152 kB
SwapFree: 2097152 kB
...
This virtual file is updated in real time by the kernel and provides the raw data that tools like free and top use for their output.
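Because /proc/meminfo is plain text, tools simply parse it. A sketch that reproduces the heart of free (Linux only; values in the file are reported in kB):

```python
def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict of {field: kilobytes}."""
    info = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])   # first token is the value
    return info

mem = read_meminfo()
print(f"total: {mem['MemTotal'] // 1024} MiB, "
      f"available: {mem['MemAvailable'] // 1024} MiB")
```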
File Systems
A file system is the method an OS uses to organize and store data on disk. It defines how files are named, where they are stored physically, how permissions are tracked, and how free space is managed. Without a file system, a disk would just be a sea of raw bytes with no structure.
What a File System Does
A file system provides:
- Naming: Files have human-readable names organized in directories.
- Metadata: Each file has associated metadata -- owner, permissions, timestamps, size.
- Space management: The file system tracks which disk blocks are in use and which are free.
- Integrity: Modern file systems include journaling or copy-on-write mechanisms to prevent data corruption during crashes.
Common File System Types
| File System | Used By | Key Features |
|---|---|---|
| ext4 | Linux | Default for most Linux distributions. Journaling, up to 1 EB volume size. The file system you will encounter most in cloud computing. |
| XFS | Linux (RHEL/CentOS default) | High performance with large files. Common in enterprise environments and used by default in Red Hat-based systems. |
| Btrfs | Linux | Copy-on-write, snapshots, built-in RAID. Growing adoption for advanced use cases. |
| NTFS | Windows | Journaling, access control lists, encryption. The standard Windows file system. |
| APFS | macOS | Copy-on-write, encryption, snapshots. Default on modern Apple devices. |
| tmpfs | Linux (RAM-based) | Exists entirely in memory. Fast but volatile -- data is lost on reboot. Used for /tmp on many systems. |
For cloud and infrastructure work, you will primarily encounter ext4 and XFS on Linux servers.
The Linux Directory Hierarchy
Linux organizes all files into a single tree starting at the root directory (/). Even separate disks and network shares are mounted as branches of this tree. Here are the most important directories:
| Directory | Purpose |
|---|---|
| / | The root of the entire file system tree |
| /home | Home directories for regular users (e.g., /home/clouduser) |
| /root | Home directory for the root (admin) user |
| /etc | System-wide configuration files (e.g., /etc/ssh/sshd_config) |
| /var | Variable data -- logs (/var/log), mail, spool files |
| /tmp | Temporary files, often cleared on reboot |
| /usr | User programs and supporting files (/usr/bin, /usr/lib) |
| /bin | Essential command binaries (ls, cp, cat) |
| /sbin | Essential system binaries (mount, shutdown, iptables) |
| /opt | Optional third-party software |
| /dev | Device files representing hardware (/dev/sda, /dev/null) |
| /proc | Virtual file system exposing kernel and process information |
| /sys | Virtual file system exposing hardware and driver information |
| /mnt | Temporary mount point for manually mounted file systems |
| /boot | Kernel and bootloader files |
You do not need to memorize all of these now. As you work through the Linux section, you will interact with these directories regularly and they will become familiar.
Inodes
Every file on a Linux file system has an inode (index node). The inode stores all of the file's metadata -- owner, permissions, timestamps, size, and the list of disk blocks where the file's data lives. Notably, the inode does not store the file's name.
The file name is stored in the directory entry, which is simply a mapping from a name to an inode number. This is why:
- A file can have multiple names (hard links) pointing to the same inode.
- Renaming a file is nearly instant -- it only changes the directory entry, not the file's data.
- Deleting a file removes the directory entry. The data is only freed when no more names (links) point to the inode.
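The name-vs-inode split is easy to observe: create a hard link and compare inode numbers with os.stat (a Python sketch; the file names are made up for the demo):

```python
import os
import tempfile

d = tempfile.mkdtemp()
original = os.path.join(d, "report.txt")
alias = os.path.join(d, "report-link.txt")

with open(original, "w") as f:
    f.write("data")

os.link(original, alias)          # hard link: a second name, same inode

st_a, st_b = os.stat(original), os.stat(alias)
print(st_a.st_ino == st_b.st_ino) # same inode number
print(st_a.st_nlink)              # link count is now 2

os.remove(original)               # removes one name; the data survives
with open(alias) as f:
    print(f.read())               # still readable via the other name
```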
Disk Usage Commands
The df command shows how much space is available on each mounted file system:
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 18G 30G 38% /
/dev/sda2 200G 95G 95G 50% /home
tmpfs 3.9G 256M 3.7G 7% /tmp
The du command shows how much space a directory and its contents consume:
$ du -sh /var/log/*
1.2G /var/log/journal
45M /var/log/nginx
12M /var/log/syslog
4.0K /var/log/lastlog
Try It: Run df -h to see your mounted file systems and their usage. Then pick a directory and run du -sh /path/to/directory to see how much space it uses. Try /var/log if you are on a Linux system.
Users, Groups, and Permissions
Operating systems are multi-user by design. Even a server that only one person ever logs into still has multiple user accounts -- the human administrator, the web server, the database, the logging daemon. Each runs under its own user account with its own permissions. This limits the damage any single compromised or buggy process can cause.
Users and the Root Account
Every user has:
- A username (e.g., clouduser)
- A User ID (UID) -- a number that the kernel actually uses internally
- A home directory (usually /home/username)
- A default shell (e.g., /bin/bash)
The root user (UID 0) is the superuser. Root can read any file, kill any process, and modify any setting. Because of this power, you should rarely log in as root directly. Instead, use sudo to execute individual commands with root privileges. This creates an audit trail and reduces the risk of accidental damage.
Groups
A group is a collection of users. Groups simplify permission management. Instead of granting file access to each user individually, you add users to a group and grant access to the group.
Every user has a primary group (often the same name as the username) and can belong to additional supplementary groups. For example, a user might be in the docker group to run Docker commands, or the sudo group to use sudo.
File Permissions
Linux uses a permission model with three categories and three permission types:
Categories:
| Category | Applies To |
|---|---|
| Owner (u) | The user who owns the file |
| Group (g) | Members of the file's group |
| Other (o) | Everyone else on the system |
Permission types:
| Permission | Symbol | On a File | On a Directory |
|---|---|---|---|
| Read | r | View file contents | List directory contents |
| Write | w | Modify file contents | Create, delete, or rename files in the directory |
| Execute | x | Run the file as a program | Enter the directory (cd into it) |
Reading ls -l Output
The ls -l command shows detailed file information including permissions:
$ ls -l /etc/nginx/
total 24
drwxr-xr-x 2 root root 4096 Jan 15 09:30 conf.d
-rw-r--r-- 1 root root 1482 Jan 15 09:30 nginx.conf
-rw-r----- 1 root www-data 512 Jan 15 09:30 ssl-passwords.txt
-rwxr-xr-x 1 root root 8832 Jan 10 14:22 custom-script.sh
Let us decode the first entry, drwxr-xr-x:
d rwx r-x r-x
│ │ │ │
│ │ │ └── Other: read + execute (can list and enter)
│ │ └──────── Group: read + execute (can list and enter)
│ └─────────────── Owner: read + write + execute (full control)
└──────────────────── File type: d = directory
For the third entry, -rw-r-----:
- rw- r-- ---
│ │ │ │
│ │ │ └── Other: no permissions (cannot access at all)
│ │ └──────── Group (www-data): read only
│ └─────────────── Owner (root): read + write
└──────────────────── File type: - = regular file
This means only root can modify ssl-passwords.txt, members of the www-data group (the web server) can read it, and no one else can access it at all. This is a good security practice for sensitive files.
Numeric (Octal) Permissions
Permissions can also be expressed as numbers. Each permission has a value:
| Permission | Value |
|---|---|
| Read (r) | 4 |
| Write (w) | 2 |
| Execute (x) | 1 |
| None (-) | 0 |
You add the values together for each category. For example:
| Symbolic | Calculation | Numeric |
|---|---|---|
| rwx | 4 + 2 + 1 | 7 |
| rw- | 4 + 2 + 0 | 6 |
| r-x | 4 + 0 + 1 | 5 |
| r-- | 4 + 0 + 0 | 4 |
| --- | 0 + 0 + 0 | 0 |
So rwxr-xr-x becomes 755, and rw-r----- becomes 640. You will use these numeric values constantly when setting permissions with chmod:
$ chmod 755 custom-script.sh # rwxr-xr-x
$ chmod 640 ssl-passwords.txt # rw-r-----
$ chmod 600 private-key.pem # rw-------
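The octal digits map directly onto the mode bits the kernel stores. In Python, os.chmod takes the same octal number chmod does, and stat reads it back (a sketch; the key-file name is invented for the demo):

```python
import os
import stat
import tempfile

path = os.path.join(tempfile.mkdtemp(), "deploy-key.pem")
with open(path, "w") as f:
    f.write("secret")

os.chmod(path, 0o600)                        # rw------- : owner read/write only
mode = stat.S_IMODE(os.stat(path).st_mode)   # strip the file-type bits
print(oct(mode))                             # 0o600

# The digits are just the r/w/x sums: 6 = 4 (r) + 2 (w), 0 = no access.
```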
Try It: Create a file with touch testfile.txt, then run ls -l testfile.txt to see its default permissions. Use chmod 700 testfile.txt to give only yourself full access, and run ls -l again to see the change.
File Ownership
Every file has an owner (a user) and a group. You can change ownership with chown:
$ chown clouduser:developers project-plan.txt # Set owner and group
$ chown clouduser project-plan.txt # Set owner only
$ chown :developers project-plan.txt # Set group only
In cloud environments, getting ownership and permissions right is critical. A misconfigured permission can prevent a web server from reading its configuration, or worse, allow unauthorized users to read sensitive data like API keys or database credentials.
Init Systems and Services
When a Linux system boots, the kernel starts a single process -- the init process (PID 1). This process is responsible for starting everything else: network services, logging daemons, login prompts, and all the other programs that make the system functional.
What Init Does
The init system:
- Starts essential system services in the correct order (e.g., networking before web servers)
- Manages service dependencies (e.g., a database must be running before the application that uses it)
- Restarts services that crash
- Handles system shutdown, cleanly stopping services in reverse order
systemd
On nearly all modern Linux distributions, the init system is systemd. It replaced older init systems (SysVinit, Upstart) and has become the standard. Whether you are running Ubuntu, Fedora, Debian, CentOS, or Amazon Linux 2, you will interact with systemd.
The primary command for managing services with systemd is systemctl:
# Check the status of a service
$ systemctl status nginx
● nginx.service - A high performance web server
Loaded: loaded (/lib/systemd/system/nginx.service; enabled)
Active: active (running) since Wed 2025-01-15 09:30:00 UTC; 3 weeks ago
Main PID: 780 (nginx)
Tasks: 3 (limit: 4915)
Memory: 12.4M
CGroup: /system.slice/nginx.service
├─780 nginx: master process /usr/sbin/nginx
├─781 nginx: worker process
└─782 nginx: worker process
# Start a service
$ sudo systemctl start nginx
# Stop a service
$ sudo systemctl stop nginx
# Restart a service (stop then start)
$ sudo systemctl restart nginx
# Reload configuration without full restart (if supported)
$ sudo systemctl reload nginx
# Enable a service to start automatically at boot
$ sudo systemctl enable nginx
# Disable automatic start at boot
$ sudo systemctl disable nginx
The distinction between start and enable is important:
- start runs the service right now but does not affect future boots
- enable configures the service to start automatically when the system boots but does not start it right now
- To do both at once, combine them: sudo systemctl enable --now nginx
Try It: Run systemctl list-units --type=service --state=running to see all services currently running on your system. Pick one and run systemctl status <service-name> to see its details.
Viewing Service Logs
systemd includes a logging system called journald. You can view logs for any service using journalctl:
# View logs for a specific service
$ journalctl -u nginx
# View only the most recent logs
$ journalctl -u nginx --since "1 hour ago"
# Follow logs in real time (like tail -f)
$ journalctl -u nginx -f
Understanding services and logs is essential for troubleshooting. When a cloud application is not responding, the first steps are almost always: check if the service is running (systemctl status), and check the logs (journalctl -u).
Try It: Run journalctl -u sshd --since "today" to see today's SSH login attempts on a Linux system. Each line represents an event -- you may see successful logins, failed attempts, or session disconnects.
Putting It All Together
Every concept in this section connects to the others. Here is how they fit together in a real scenario:
When you SSH into a cloud server and start a web server, the following happens at the OS level:
- Process: Your SSH session creates a shell process. When you run sudo systemctl start nginx, the init system (systemd) forks a new process for nginx.
- Memory: The kernel allocates virtual memory for the nginx process. The nginx binary is loaded into memory, and the process requests additional heap memory for its data structures.
- File System: nginx reads its configuration from /etc/nginx/nginx.conf. It reads static web files from /var/www/html. It writes logs to /var/log/nginx/.
- Permissions: nginx starts as root (to bind to port 80), then drops privileges to the www-data user. The configuration file is owned by root so that only administrators can change it. The log directory is writable by www-data so nginx can write logs.
- Kernel: Every file read, network connection, and memory allocation happens through system calls. The kernel mediates all of it.
This layered, interconnected design is what makes operating systems both powerful and complex. The more you understand these fundamentals, the faster you can diagnose and fix problems in any computing environment.
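You can see these layers on any live Linux system by inspecting a process through /proc. A minimal sketch, using the current shell ($$) as the subject -- substitute the PID of nginx (for example from pgrep nginx) to inspect a real web server instead:

```shell
# Inspect one process's footprint through /proc (here, the current shell).
cat /proc/$$/comm             # Process: name of the running program
grep VmRSS /proc/$$/status    # Memory: resident set size currently in RAM
ls -l /proc/$$/cwd            # File system: where the process is working
id                            # Permissions: the user and groups it runs as
```

Everything /proc shows you is produced on the fly by the kernel -- it is the same information tools like ps and top read.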
Control Groups (cgroups)
Control Groups (cgroups) are a Linux kernel feature that lets you limit, account for, and isolate the resource usage of a collection of processes. They answer the question: "How do I prevent one process from using all the CPU or memory on a machine?"
cgroups can control:
- CPU — limit how much CPU time a group of processes can use
- Memory — set a maximum memory limit; processes exceeding it get OOM-killed
- I/O — throttle disk read/write bandwidth
- Network — limit network bandwidth (via associated tools)
- PIDs — limit the number of processes a group can create
Every process on a Linux system belongs to a cgroup. You can see your current cgroup:
$ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-1.scope
cgroups are the foundation of container resource isolation. When you run a Docker container with --memory=512m --cpus=1, Docker creates a cgroup with those limits. The kernel enforces the limits transparently — the process inside the container does not even know it is restricted.
# View cgroup resource controllers available
$ cat /proc/cgroups
# View memory limit for a cgroup (cgroups v2)
$ cat /sys/fs/cgroup/user.slice/memory.max
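Putting the two commands above together, you can walk from your own process to its memory limit. A small sketch, assuming a cgroups v2 system (the unified hierarchy mounted at /sys/fs/cgroup); on cgroups v1 the /proc/self/cgroup format differs:

```shell
# On cgroups v2, /proc/self/cgroup contains a single line: "0::<path>".
path=$(sed -n 's/^0:://p' /proc/self/cgroup)
echo "this shell's cgroup: $path"
# The matching control files live under /sys/fs/cgroup<path>.
# "max" means no limit is set at this level of the hierarchy.
cat "/sys/fs/cgroup${path}/memory.max" 2>/dev/null \
  || echo "no memory.max at this level (or cgroups v1)"
```

To set a limit, root creates a directory under /sys/fs/cgroup and writes to its memory.max -- this is essentially what Docker does when you pass --memory.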
Understanding cgroups is essential for understanding how Containers and Container Orchestration work under the hood. Kubernetes resource requests and limits map directly to cgroup settings.
Try It: Run cat /proc/self/cgroup to see which cgroup your shell belongs to. Then explore /sys/fs/cgroup/ to see the cgroup filesystem hierarchy.
Namespaces
While cgroups control how much of a resource a process can use, namespaces control what a process can see. Namespaces provide isolation by giving a process its own private view of system resources.
Linux provides several types of namespaces:
| Namespace | Isolates | Effect |
|---|---|---|
| PID | Process IDs | Process thinks it's PID 1; can't see processes outside its namespace |
| Network | Network stack | Process gets its own IP addresses, routing table, and ports |
| Mount | File system mounts | Process sees a different set of mounted file systems |
| User | User and group IDs | Process can be root (UID 0) inside namespace but a regular user outside |
| UTS | Hostname and domain | Process can have a different hostname |
| IPC | Inter-process communication | Process has its own shared memory and semaphores |
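The namespace types in the table are visible directly in /proc, where every process exposes its namespace membership as symlinks. A quick sketch (the inode numbers will differ on your machine):

```shell
# Each entry is a symlink like "net -> net:[4026531840]"; the number in
# brackets identifies the namespace. Two processes that share a namespace
# show the same number.
ls -l /proc/self/ns
# Comparing against another process (e.g. PID 1) may require root.
readlink /proc/self/ns/net /proc/1/ns/net 2>/dev/null
```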
You can list active namespaces on a system:
$ lsns
NS TYPE NPROCS PID USER COMMAND
4026531835 cgroup 142 1 root /sbin/init
4026531836 pid 142 1 root /sbin/init
4026531837 user 142 1 root /sbin/init
4026531838 uts 142 1 root /sbin/init
4026531839 ipc 142 1 root /sbin/init
4026531840 net 142 1 root /sbin/init
4026531841 mnt 142 1 root /sbin/init
Namespaces are the other half of container isolation (alongside cgroups). When you run a Docker container, Docker creates a new set of namespaces so the container has its own PID tree, its own network stack, its own filesystem view, and its own hostname — making it feel like a separate machine even though it shares the host kernel.
This is the fundamental difference between containers and virtual machines:
- VMs run a complete separate kernel on virtualized hardware
- Containers share the host kernel but use namespaces and cgroups for isolation
You will explore this distinction in depth in Containers.
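You can also experiment with namespace isolation by hand using unshare from util-linux. A sketch, assuming a kernel that allows unprivileged user namespaces (some hardened distributions and container runtimes disable this, in which case the command needs root):

```shell
# Create a child in new user + UTS namespaces, rename "its" host,
# and confirm the real hostname is untouched.
unshare --user --map-root-user --uts sh -c 'hostname ns-demo; hostname' \
  2>/dev/null || echo "unshare not permitted in this environment"
hostname   # unchanged outside the namespace
```

The child believes it is root on a machine called ns-demo; from the outside, nothing has changed. That is namespace isolation in one line.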
Try It: Run lsns (may require sudo) to see all namespaces on your system. If Docker is installed, start a container and run lsns again — you will see new namespaces created for the container.
Key Takeaways
- The operating system is an abstraction layer between hardware and applications. It manages resources, enforces isolation, and provides a consistent interface through system calls.
- The kernel runs in privileged kernel space. Applications run in restricted user space and must use system calls to request OS services.
- A process is a running program identified by a PID. Processes have states (Running, Sleeping, Stopped, Zombie) and are organized in a parent-child tree.
- Virtual memory gives each process the illusion of its own private address space. The page table maps virtual addresses to physical RAM. Swap provides overflow when RAM is full, but at a severe performance cost.
- File systems organize data on disk. Linux uses a single directory tree rooted at /. Every file has an inode storing its metadata and permissions.
- Permissions control who can read, write, and execute files. The model has three categories (owner, group, other) and three permission types (read, write, execute). Numeric notation (e.g., 755, 640) is used with chmod.
- systemd is the init system on modern Linux. Use systemctl to start, stop, enable, and check the status of services. Use journalctl to read logs.
- These concepts are interconnected -- a web server involves processes, memory, file systems, permissions, and services all working together.
- Signals are how the kernel and other processes notify a process of events. Always prefer SIGTERM over SIGKILL to allow graceful shutdown.
- cgroups limit how much CPU, memory, and I/O a group of processes can use. They are the resource-limiting foundation of containers.
- Namespaces control what a process can see — its own PID tree, network stack, filesystem, and hostname. They are the isolation foundation of containers.