02 Introduction to filesystem structure

Introduction

This is lesson 1 in Meredydd's UNIX filesystem course for Linuxchix.org. This lesson will cover the conceptual grounding of the UNIX filesystem, and what's so good about it.

The UNIX filesystem is - I hesitate to use the word, but it's pretty accurate - a design philosophy. There are a couple of major principles of design which need to be properly grasped to understand not just how it works, but what its designers were thinking.

It's all one system

This is one of the biggest surprises that greets migrating Windows users. Under DOS, Windows, or VMS, a file's path (that's its name, plus all the directories it's in - for example, /bin/ls, or C:\WINDOWS\NOTEPAD.EXE) is a very precise description of its location. If a file is called C:\MYFILE.TXT, it means something very specific. If you look at the contents of the first partition of the first disk on the first bus, there will be a file called MYFILE.TXT in the top-level directory. This, you might think, is a sensible way of constructing filenames. And you'd be right. For DOS especially, a teeny weeny little operating system which couldn't really be bothered with abstractions, it was a very good idea. The problem is that this becomes hideously inflexible when you start getting much more advanced than that.

What's a partition?
All the data on a hard disk does not need to be grouped together. It is possible to slice up disks into separate partitions. Windows or DOS will see each partition as a separate disk, and most of the time UNIX treats them as separate devices too.

For one thing, tying things so strongly to the hardware doesn't do you any favours when ambiguity comes along. For example - C: generally means "the first hard disk on the first bus". But what does that mean? If you have both IDE and SCSI busses, which one is "first"? And what if you have more than two floppy disks (A: and B:)? The answer, as anyone who's tried it will tell you, is that the operating system gets confused, inconsistent, and arbitrary. Not good qualities for computers, whose greatest advantage is consistency and determinism.

OK, so how does UNIX do it? The answer is a radically different approach. There is only one directory hierarchy, located under /, the root directory. Everything else, whatever it is, wherever the actual files are stored, is a subdirectory of /.
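To make the contrast concrete, here is a small Python sketch using the standard pathlib module. It only parses the two example paths from earlier in this lesson as strings, so it should run on any operating system; it is an illustration of the naming difference, not of how either system works internally.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The two example paths used earlier in this lesson.
unix_path = PurePosixPath("/bin/ls")
dos_path = PureWindowsPath(r"C:\WINDOWS\NOTEPAD.EXE")

# A UNIX path names no device: its drive part is empty and its root is "/".
print("UNIX drive:", repr(unix_path.drive), "root:", unix_path.root)

# A DOS/Windows path begins with a drive letter that ties it to one volume.
print("DOS drive:", repr(dos_path.drive), "root:", dos_path.root)

# Walking upwards from any absolute UNIX path ends at the root directory.
ancestors = list(unix_path.parents)
print("Ancestors of /bin/ls:", [str(p) for p in ancestors])
```

The point to notice is that the UNIX path carries no information about which disk the file lives on, while the DOS path cannot be written without it.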

This is fairly simple to envision when applied to a single disk. This is what the filesystem on my laptop looks like. When I was installing Linux on it, I created one partition for all my files. If you look at that partition, and the files and directories stored there, it looks like this:

What's a volume?
A volume is something which can contain files and directories. A hard drive partition is a volume. So is a floppy disk, a CD-ROM, or anything else which stores a set of files.

Fairly simple, right? When the machine starts up, it maps the node I've labelled "Volume contents" to the / directory. So, subdirectories of that node, such as bin/, are visible under / (for example, as /bin/).

But, but, but...this approach hasn't actually solved any more problems than DOS's rather simple-minded approach to things. We have successfully represented a single volume as a single, consistent tree from the computer's point of view. So far, there's nothing to choose between C:\FILES\WORK.TXT and /files/work.txt, apart from which way the slashes lean.

The real magic comes when you add another volume. The DOS approach, depending on what type of volume this is, chooses a letter, and places the contents of the volume underneath it. So, separate volumes have entirely separate directory trees, the names of which depend on the hardware concerned.

UNIX, on the other hand, takes a far more elegant route. It merges the new volume into the existing filesystem tree, grafting its contents onto an existing directory. Take the example of a floppy disk: this diagram shows what my / filesystem tree looks like when I've added the new volume to it (this is called mounting the volume).

Worth noting
You will notice here that my floppy disk's contents are grafted onto (mounted under) the /floppy/ directory. Those of you with a bit of experience with floppies and Linux are likely to notice that on your system, this probably isn't the case. Most Linux distributions mount removable media under /mnt/floppy/, /mnt/cdrom/, etc. Those of you who use Solaris or BSD, however, will find nothing out of the ordinary - by default, they mount removable media in top-level directories such as /floppy/. This, of course, is the beauty of the UNIX approach. Because you can mount a volume under any directory you like, you can choose what path you use to access your floppy drive, and I can choose mine.
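If it helps to see the grafting idea as code, here is a toy model in Python. It is a sketch only: the mount table, the volume names, and the file letter.txt are made up for illustration, and a real kernel does this quite differently. The idea it shows is that a path is matched against the deepest mount point above it, and whatever is left over is the path inside that volume.

```python
from pathlib import PurePosixPath

# Hypothetical mount table: mount point -> volume name (illustrative only).
MOUNTS = {
    "/": "hard disk partition",
    "/floppy": "floppy disk",
}

def resolve(path, mounts=MOUNTS):
    """Return (volume name, path inside that volume) for an absolute path."""
    target = PurePosixPath(path)
    # Try the longest mount point first, so /floppy wins over / when both match.
    for mount_point in sorted(mounts, key=len, reverse=True):
        mp = PurePosixPath(mount_point)
        # Match whole path components, so /floppy2 is not mistaken for /floppy.
        if target == mp or mp in target.parents:
            inner = str(target.relative_to(mp))
            inner_path = "/" if inner == "." else "/" + inner
            return mounts[mount_point], inner_path
    raise ValueError("expected an absolute path, got %r" % path)

print(resolve("/bin/ls"))
print(resolve("/floppy/letter.txt"))
```

Changing the "/floppy" key to "/mnt/floppy" is all it takes to move the floppy's contents to a different place in the tree, which is exactly the freedom described above.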

The best thing about this? Those dotted lines on the diagram, which show which files are from the new volume, are for illustration only. As far as a program running on the computer is concerned, there is no boundary. Yes, accessing files and directories under the /floppy directory is a little slower than the rest of the system, because the actual hardware being used is a floppy disk rather than a hard drive, but that's no concern of the software's. Therein lies the real power of the UNIX filesystem. This is the take-home point - understand this, really feel the design principles and the power they give you, and the rest is window-dressing.
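A program only sees the boundary if it goes looking for it. The Python sketch below, which assumes a UNIX-like system, compares device IDs using the standard os module. The /floppy path is this lesson's example mount point and may well not exist on your machine, so the loop skips anything that is missing; same_volume is just a helper name I've chosen for the illustration.

```python
import os

def same_volume(path_a, path_b):
    """Return True if both paths are stored on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

# Opening and reading files works identically on every volume. Only an
# explicit check like this one reveals where one volume ends and another begins.
for path in ("/", "/floppy"):  # /floppy is the lesson's example mount point
    if os.path.exists(path):
        print(path,
              "| mount point:", os.path.ismount(path),
              "| same volume as /:", same_volume("/", path))
```

Ordinary software never needs to make this check, which is the whole point.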

Another quick illustration

It's not unusual, in corporate environments especially, to have your home directory on a different volume (specifically, one somewhere else on the network, a topic we will be coming to later), and to have it mounted automatically when you log in. With Windows systems, you add a whole new drive letter for the network volume, and any references to files on that drive (for example, "recently used documents" lists in a word processor) are inextricably linked to the drive letter. That is why one has to create even more inconsistency by abandoning the letter-allocation scheme and forcing the volume to occupy a constant letter such as H:. With UNIX systems, a piece of software called an automounter (another topic we'll come to later) automatically mounts your home directory from the remote machine, and /home/your_username (a constant, standardised location for a user's home directory) starts seamlessly referring to files on a different computer to the one you're using.

Pretty neat, huh?

Exercises

Because of this lesson's conceptual nature, there are no exercises or demonstrations. However, I'd recommend that you explore your root directory. Find out what sort of things lurk in which directories. If you find something interesting that you're not sure about, ask me what it is!
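If you'd like a starting point for that exploration, here is a short Python sketch, assuming a UNIX-like system. It lists the directories directly under / and marks the ones that are mount points, which is to say the places where another volume has been grafted on. The function name list_top_level is my own choice for the example; an ordinary ls / will show you the same directories without the mount information.

```python
import os

def list_top_level(root="/"):
    """Return sorted (name, is_mount_point) pairs for directories in root."""
    entries = []
    with os.scandir(root) as it:
        for entry in it:
            # Skip plain files and symbolic links; keep real directories only.
            if entry.is_dir(follow_symlinks=False):
                entries.append((entry.name, os.path.ismount(entry.path)))
    return sorted(entries)

for name, mounted in list_top_level():
    print("/" + name, "(mount point)" if mounted else "")
```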