08 Overview of the Kernel Source

In this lesson, you'll get a general idea of where the various parts of the kernel are located in the source tree, what order they execute in, and how to go looking for a particular piece of code.

Where is all the code?

Let's start with the top-level directory of the Linux source tree, which is usually but not always in /usr/src/linux-<version>. We won't get too detailed, because the Linux source changes constantly, but we'll try to give you enough information to figure out where a certain driver or function is.

`Makefile`

This file is the top-level Makefile for the whole source tree. It defines a lot of useful variables and rules, such as the default gcc compilation flags.

`Documentation/`

This directory contains a lot of useful (but often out of date) information about configuring the kernel, running with a ramdisk, and similar things. The help entries corresponding to different configuration options are not found here, though - they're found in Kconfig files in each source directory.
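As a rough sketch of how you might track down the Kconfig file that declares a given option, the helper below searches every Kconfig file under a tree. The function name kconfig_where is made up for illustration, and the sketch assumes entries are written as "config SYMBOL" or "menuconfig SYMBOL" at the start of a line.

```shell
# kconfig_where SYMBOL [TREE]
# Print "file:line:text" for each Kconfig entry that declares SYMBOL.
# kconfig_where is a hypothetical helper; TREE defaults to the current directory.
kconfig_where() {
    # /dev/null is passed as an extra file so grep always prints the
    # file name, even when find hands it only a single Kconfig file.
    find "${2:-.}" -type f -name 'Kconfig*' \
        -exec grep -n -E "^(menu)?config[[:space:]]+${1}[[:space:]]*\$" /dev/null {} +
}
```

For instance, running kconfig_where USB from the top of the tree should point you at the file and line where the USB option is declared; the help text sits a few lines below that entry.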

`arch/`

All the architecture-specific code is in this directory and in the include/asm-<arch> directories. Each architecture has its own subdirectory here. For example, the code for a PowerPC-based computer would be found under arch/ppc. You will find low-level memory management, interrupt handling, early initialization, assembly routines, and much more in these directories.

`crypto/`

This is a cryptographic API for use by the kernel itself.

`drivers/`

As a general rule, code to run peripheral devices is found in subdirectories of this directory. This includes video drivers, network card drivers, low-level SCSI drivers, and other similar things. For example, most network card drivers are found in drivers/net. Some higher level code to glue all the drivers of one type together may or may not be included in the same directory as the low-level drivers themselves.

`fs/`

Both the generic filesystem code (known as the VFS, or Virtual File System) and the code for each different filesystem are found in this directory. Your root filesystem is probably an ext2 filesystem; the code to read the ext2 format is found in fs/ext2. Not all of the filesystems compile or run, and the more obscure filesystems are always good candidates for someone looking for a kernel project.

`include/`

Most of the header files included at the beginning of a .c file are found in this directory. Architecture-specific include files are in asm-<arch>. Part of the kernel build process creates the symbolic link from asm to asm-<arch>, so that #include <asm/file.h> will get the proper file for that architecture without having to hard code it into the .c file. The other directories contain non-architecture-specific header files. If a structure, constant, or variable is used in more than one .c file, it should probably be in one of these header files.
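If you want to find which header defines a particular structure, something like the following sketch may help. The helper name struct_where is invented for this example, and it assumes the definition is written as "struct NAME {" on a single line, which is common but not guaranteed.

```shell
# struct_where NAME [TREE]
# Print "file:line:text" for lines under TREE/include that open a
# definition of "struct NAME". struct_where is a hypothetical helper;
# TREE defaults to the current directory.
struct_where() {
    grep -rn -E "struct[[:space:]]+${1}[[:space:]]*\{" "${2:-.}/include"
}
```

Run from the top of the tree as, for example, struct_where task_struct. Declarations and uses that have no opening brace on the same line are deliberately not matched.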

`init/`

This directory contains the files main.c, version.c, and code for creating "early userspace". version.c defines the Linux version string. main.c can be thought of as the kernel "glue". We'll talk more about main.c in the next section. Early userspace provides functionality that needs to be available while a Linux kernel is coming up, but that doesn't need to be run inside the kernel itself.

`ipc/`

"IPC" stands for "Inter-Process Communication". It contains the code for shared memory, semaphores, and other forms of IPC.

`kernel/`

Generic kernel level code that doesn't fit anywhere else goes in here. The upper level system call code is here, along with the printk() code, the scheduler, signal handling code, and much more. The files have informative names, so you can type ls kernel/ and guess fairly accurately at what each file does.

`lib/`

Routines of generic usefulness to all kernel code are put in here. Common string operations, debugging routines, and command line parsing code are all in here.

`mm/`

High level memory management code is in this directory. Virtual memory (VM) is implemented through these routines, in conjunction with the low-level architecture specific routines usually found in arch/<arch>/mm/. Early boot memory management (needed before the memory subsystem is fully set up) is done here, as well as memory mapping of files, management of page caches, memory allocation, and swap out of pages in RAM (along with many other things).

`net/`

The high-level networking code is here. The low-level network drivers pass received packets up to this level and get packets to send from it. Depending on the packet, this level may pass the data to a user-level application, discard it, or use it in-kernel. The net/core directory contains code useful to most of the different network protocols, as do some of the files in the net/ directory itself. Specific network protocols are implemented in subdirectories of net/. For example, IP (version 4) code is found in the directory net/ipv4.

`scripts/`

This directory contains scripts that are useful in building the kernel, but does not include any code that is incorporated into the kernel itself. The various configuration tools keep their files in here, for example.

`security/`

Code for different Linux security models can be found here, such as NSA Security-Enhanced Linux and socket and network security hooks.

`sound/`

Drivers for sound cards and other sound-related code are placed here.

`usr/`

This directory contains code that builds a cpio-format archive containing a root filesystem image, which will be used for early userspace.

Where does it all come together?

The central connecting point of the whole Linux kernel is the file init/main.c. Each architecture executes some low-level setup functions and then executes the function called start_kernel, which is found in init/main.c.

The order of execution of code looks something like this:

   Architecture-specific setup code (in arch/<arch>/*)
    |
    v
   The function start_kernel() (in init/main.c)
    |
    v
   The function init() (in init/main.c)
    |
    v
   The user level "init" program

In more detail, this is what happens:

The architecture-specific setup code does the following:
- Unzip and move the kernel code itself, if necessary
- Initialize the hardware
  - This may include setting up low-level memory management
- Transfer control to the function start_kernel()

start_kernel() does, among other things:
- Print out the kernel version and command line
- Start output to the console
- Enable interrupts
- Calibrate the delay loop
- Call rest_init(), which does:
  - Start a kernel thread to run the init() function
  - Enter the idle loop

init() does:
- Start the other processors (on SMP machines)
- Start the device subsystems
- Mount the root filesystem
- Free up unused kernel memory
- Run /sbin/init (or /etc/init, or...)

At this point, the user-level init program is running, which will do things like start networking services and run getty (which presents the login prompt) on your console(s).

You can figure out when a subsystem is initialized relative to start_kernel() or init() by adding your own printk() calls and seeing where that subsystem's messages appear relative to yours. For example, to find out when the ALSA sound system is initialized, put a printk() at the beginning of start_kernel() and another at the beginning of init(), then look for where "Advanced Linux Sound Architecture [...]" is printed relative to your messages. (See 06 Your First printk for help with using the printk() function.)
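Once you have booted with your own markers in place, comparing positions in the boot log by eye gets tedious. The sketch below does the comparison on a saved log, for example one written with dmesg > boot.log. The helper names first_match_line and marker_order and the marker text are made up for illustration.

```shell
# first_match_line TEXT FILE
# Print the line number of the first line in FILE containing TEXT
# (matched as a fixed string, not a regular expression).
first_match_line() {
    grep -n -F -- "$1" "$2" | head -n 1 | cut -d: -f1
}

# marker_order MARKER SUBSYSTEM_TEXT LOGFILE
# Report whether your own marker or the subsystem's message comes first.
# Both helper names are hypothetical.
marker_order() {
    marker_line=$(first_match_line "$1" "$3")
    subsys_line=$(first_match_line "$2" "$3")
    if [ -z "$marker_line" ] || [ -z "$subsys_line" ]; then
        echo "not found"
        return 1
    fi
    if [ "$marker_line" -lt "$subsys_line" ]; then
        echo "marker first"
    else
        echo "subsystem first"
    fi
}
```

If the marker you printed at the top of init() comes first, the subsystem was most likely initialized from init() or later rather than from start_kernel().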

Finding things in the kernel source tree

So, you want to start working on, say, the USB driver. Where do you start looking for the USB code?

First, you can try a find command from the top-level kernel directory:

   $ find . -name \*usb\*

This command prints every file and directory under the current directory whose name contains the string "usb".
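The full list can be long, and often the directories are what you are really after. A possible refinement, sketched here with an invented helper name (dirs_named), lists only matching directories and ignores case. It assumes your find supports -iname, which GNU and BSD find do.

```shell
# dirs_named PATTERN [TREE]
# List directories under TREE whose name contains PATTERN, ignoring case,
# as sorted paths relative to TREE. dirs_named is a hypothetical helper;
# TREE defaults to the current directory.
dirs_named() {
    ( cd "${2:-.}" && find . -type d -iname "*${1}*" | LC_ALL=C sort )
}
```

Calling dirs_named usb from the top of the tree then gives a short list of candidate directories to start reading in.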

Another thing you might try is looking for a unique string. This unique string can be the output of a printk(), the name of a file in /proc, or any other unique string that might be found in the source code for that driver. For example, USB prints out the message:

   usb-ohci.c: USB OHCI at membase 0xcd030000, IRQ 27

So you might try a recursive grep for the part of that message that is fixed text rather than the output of a conversion specifier like %d:

   $ grep -r "USB OHCI at" .
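When all you want is the name of the file, the matching lines themselves are noise. A small variation, sketched below under the hypothetical name files_containing, prints only the file names and treats the search text as a fixed string, so characters such as "." or "[" in the message are not interpreted as a regular expression.

```shell
# files_containing TEXT [TREE]
# Print the names of files under TREE that contain TEXT.
# -l lists file names only; -F matches TEXT literally.
# files_containing is a hypothetical helper; TREE defaults to ".".
files_containing() {
    grep -r -l -F -- "$1" "${2:-.}"
}
```

For the message above, files_containing "USB OHCI at" would be the equivalent call.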

Another way you might try to find the USB source code is by looking in /proc. If you type find /proc -name usb, you might find that there is a directory named /proc/bus/usb. Reading the entries in that directory may turn up a unique string to grep for.
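Searching all of /proc also walks every per-process directory and tends to print permission errors along the way. One way to keep the search manageable is sketched below. The helper name proc_entries_named and the default depth of 3 are arbitrary choices for illustration, and -maxdepth is an extension found in GNU and BSD find rather than a POSIX option.

```shell
# proc_entries_named NAME [ROOT] [DEPTH]
# Find entries called NAME at most DEPTH levels below ROOT, hiding
# error messages. proc_entries_named is a hypothetical helper;
# ROOT defaults to /proc and DEPTH to 3.
proc_entries_named() {
    find "${2:-/proc}" -maxdepth "${3:-3}" -name "$1" 2>/dev/null
}
```

On a system that has such a directory, proc_entries_named usb would print its path; on others it prints nothing.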

If all else fails, try descending into individual directories and listing the files, or looking at the output of ls -lR. You may see a filename that looks related. But this should really be a last resort, and something to be tried only after you have run many different find and grep commands.

Once you've found the source code you are interested in, you can start reading it. Reading and understanding Linux kernel code is another lesson in itself. Just remember that the more you read kernel code, the easier it gets. Have fun exploring the kernel!