05 An introduction to version control 1: what problem does it solve?

This post is the fifth 'lesson' in the Participating in Free Software
LinuxChix course. You can find previous lessons at
http://www.linuxchix.org/content/courses/tools/ . Questions and
discussion are welcome; please make sure the string "[Tools]" is in the
subject of your mail.

This is where we get to the second half of the course, and the "meat"
of it: version control. In this lesson, I talk about what it is, and the
reasons why software projects, in particular, use it. Homework will be
back in a couple of lessons from now when we start actually working with
version control systems.

Consider a group of ten people, working on a common project. If you're
familiar with programming, you can imagine that they're working on some
code. Otherwise, imagine that they're writing a large document, perhaps
this course.

At any given time, most likely two or three of those people are working
on the same fairly narrow bit of the project: the bit that's newest and
which is under most active development. (The smaller the project and the
tighter the deadline, the more likely this is to be true: larger
projects will have people spread out on different tasks. Good project
managers try and encourage this a bit, but I digress.) Other people are
working on other things. One might be fixing a bug or doing some copy
editing in one part of the project, another in another part.

There are several problems to do with file sharing when working on
projects with a team. You generally want to work on the latest version
of the team's work, and you want to regularly share your changes with
them. A naive solution would be to have some kind of central place
where you all copy your files to. But a problem crops up pretty quickly
when you do this. Say you do a day's work on a file, having copied it
to your workstation that morning. Someone else copies that file the
same morning and does their own work on it. You put your copy in the
central place (in version control jargon, the central place would be
called "the repository"). The other person then copies their own work
there, but because their work doesn't incorporate yours, this
effectively removes your work. So either one of you notices and has to
"merge" your changes (this is also version control jargon, and it means
what you'd think: combining two or more sets of changes into a single
change), or you don't notice until sometime later, when someone
complains that your work hasn't been done and you look, and they're
right, it's not in the file.
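If you're comfortable reading a little code, the overwriting problem
can be sketched in a few lines of Python. This is only a toy: the
"central place" is a plain dictionary, and the names "Alice" and "Bob"
and the file name are made up for illustration.

```python
# A toy "central place": a dict mapping file names to file contents.
central = {"chapter1.txt": "Intro\n"}

# Morning: both people copy the same file to their own workstation.
alice_copy = central["chapter1.txt"]
bob_copy = central["chapter1.txt"]

# Each does a day's work on their own copy.
alice_copy += "Alice's new section\n"
bob_copy += "Bob's new section\n"

# Evening: both copy their file back. Whoever copies last wins.
central["chapter1.txt"] = alice_copy
central["chapter1.txt"] = bob_copy

# Alice's section is gone, and nothing warned either of them.
print(central["chapter1.txt"])
```

Nothing in the copying step knows that Bob's file was based on an older
version than the one he is replacing, which is exactly the gap a
version control system fills.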

So one thing version control systems do is provide a protected way to
store files in a repository (not always a single one either, we'll get
to that in a few weeks). Rather than manually copying files in and out
of the repository, you run a command to put them in the repository
(known as "committing"), just like you run a command to update your own
copy of the files (known as a "working copy" or "sandbox" in version
control jargon). These commands keep track of things like whether or not
someone else has committed in the meantime and will attempt to merge
your changes with existing changes in various ways, rather than
completely obliterating the other person's work.
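Here is a rough sketch, in Python, of the "keeping track" part. It is
not how any real version control system is implemented: the class
ToyRepository, its methods and the error name are all invented for this
example, and instead of attempting a merge it simply refuses a commit
that was based on an out-of-date copy.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an old revision."""


class ToyRepository:
    def __init__(self, initial_text):
        # Every committed version is kept; the list index is the
        # revision number.
        self.revisions = [initial_text]

    def checkout(self):
        """Return (revision number, text) for a fresh working copy."""
        latest = len(self.revisions) - 1
        return latest, self.revisions[latest]

    def commit(self, base_revision, new_text):
        """Store new_text only if it was based on the latest revision."""
        latest = len(self.revisions) - 1
        if base_revision != latest:
            raise OutOfDateError(
                f"working copy is at revision {base_revision}, "
                f"repository is at {latest}: update and merge first"
            )
        self.revisions.append(new_text)
        return latest + 1


repo = ToyRepository("Intro\n")
alice_rev, alice_text = repo.checkout()
bob_rev, bob_text = repo.checkout()

# Alice commits first, so the repository moves on to revision 1.
repo.commit(alice_rev, alice_text + "Alice's new section\n")

# Bob's working copy is still based on revision 0, so he is stopped
# before he can overwrite Alice's work.
try:
    repo.commit(bob_rev, bob_text + "Bob's new section\n")
    bob_rejected = False
except OutOfDateError as err:
    bob_rejected = True
    print("Commit refused:", err)
```

The design choice worth noticing is that the commit command is told
which revision the working copy came from. That one extra piece of
information is what lets it notice that someone else has committed in
the meantime.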

Now, one thing is important to note here. The version control systems we
will be talking about in this course do this *for text files*, like
code, HTML, almost all UNIX config files, and some types of
documentation (LaTeX, Docbook, RST and other text markup formats). They
do not handle merging for "binary" formats, for example, images and a
lot of word processing files. They usually have a way to store them
because it's useful to be able to put, say, a whole website in version
control and not have to worry about storing the images separately. But
they don't try and merge them for you, they just store great big lumps
of data. They will warn you about not having an up-to-date copy and so
on, so they're still more useful than the alternative, but they won't
merge. There are some similar systems that are designed to work with
particular binary formats. I'm told, for example, that there are some
for Microsoft Word documents. But I won't be covering them in this
course.

The other most obvious benefit version control systems have is implied
in the synonym "revision control". A version control system will store
*every* version of a file that has been committed to it so that you can
access any particular version later. This not only saves you from silly
mistakes like accidentally committing an empty file over the top of a 1000
line file, but it helps coders in particular because if it turns out
that they or someone else has introduced a problem into the code, they
can both study the differences between the files and "revert" (change
back) the code if need be.
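To make "study the differences" and "revert" a bit more concrete, here
is a small Python sketch. It assumes the stored history is just a list
of file contents, which is a simplification of my own, and it uses
Python's standard difflib module to show the differences. The greet
function being versioned is made up.

```python
import difflib

# The stored history: one entry per committed version of the file.
history = [
    "def greet(name):\n    return 'Hello, ' + name\n",
    "def greet(name):\n    return 'Hello, ' + nmae\n",  # a typo sneaks in
]

old_lines = history[0].splitlines(keepends=True)
new_lines = history[1].splitlines(keepends=True)

# Study the differences between the two versions.
diff = list(
    difflib.unified_diff(
        old_lines, new_lines, fromfile="revision 0", tofile="revision 1"
    )
)
print("".join(diff))

# "Revert" in this sketch: commit the old contents again as a new
# version, so the broken version still stays in the history.
history.append(history[0])
```

In the printed output, lines starting with "-" are from the older
version and lines starting with "+" are from the newer one, so the
typo shows up as a removed line paired with an added line.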

Much of this course will cover using an existing version control system:
getting a copy of the files, changing them, committing them, dealing
with simple merge errors and getting at old versions of a file.

Another benefit that the systems we'll look at provide is the ability to
"branch", that is, to mark your code (it's usually code that you branch)
as not ready for merging yet. You want to store it so that it is version
controlled, and let other people look at it and perhaps work on it, but
you don't want to merge it with everyone else's work just now. This
course will look at branching, but in a fairly simple way.
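As a very rough picture of what a branch is, here is a Python sketch in
which each branch is a named list of versions. The branch names "main"
and "experiment" and the helper functions are invented for the example,
and real systems store branches far more cleverly than by copying the
whole history.

```python
# Each branch is a named list of committed versions.
branches = {"main": ["Intro\n"]}


def create_branch(new_name, from_branch):
    # A new branch starts with the same history as the one it came from.
    branches[new_name] = list(branches[from_branch])


def commit(branch, text):
    # A commit only ever touches the branch it is made on.
    branches[branch].append(text)


create_branch("experiment", "main")
commit("experiment", "Intro\nHalf-finished idea\n")
commit("main", "Intro\nBug fix\n")

# The unfinished work is stored and shareable, but "main" never sees it.
print(branches["main"][-1])
print(branches["experiment"][-1])
```

Both branches share the same first version, and after that each one
collects its own commits until someone decides to merge them.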

Now, one of the big reasons you need to either know or be willing to
learn to use version control systems for working on Free Software is
that almost all projects use them, and use them for almost all of their
work. Code, documentation, translations and images are all stored in
version control. Free Software projects usually have work
distributed between people on different continents and in different
timezones. Further, these people may be working on a number of
experimental medium-term features in the code. Coordinating all of this
via emailing files to each other or dumping them in a central place
would be an unmitigated nightmare compared to using version control.
Hence, they all use version control. The first thing you'll ever do if
you want to submit a patch against a project will usually be to get a
copy of their latest files from version control and make your
changes to them.

Next lesson we'll review these concepts in more detail, and then move
on to the meat of using a version control system.