06 More SQL

This week we’ll talk a bit more about managing databases and tables, talk about normalization of databases, and getting data in and out of a database.

Databases and Tables

So, you know how to create databases and tables. Let’s delve into both of those in more detail this week.

First, a bit about PostgreSQL’s commands. A lot of commands that you might want to do inside psql, you can also do outside. PG has a createdb command:

 $ createdb [options] database_name [description]

is equivalent to

 => CREATE DATABASE database_name with ... ;

See: http://www.postgresql.org/docs/8.2/static/manage-ag-createdb.html

Options include things like template. Templates are one of the most useful things in Postgresql. There are two databases that exist no matter what, when you initialize pg. ‘template0’ and ‘template1’. template0 is a database that can never be modified. template1 is a copy of template0, and is used as the default template when creating a new database. It’s basically empty. However, you can modify template1, and you can also create databases using a template of any other database. Thus cloning a database (full of data) is trivially easy.

Although one caviat is that no-one can be connected to a database in order for it to be used as a template. So you have to basically take the database offline in order to do this. If you can’t, there are other methods of cloning a db, just more steps.

To destroy a database, use:

 $ dropdb [options] database_name or => DROP DATABASE database_name

Be very careful when using this command - because everything is gone ... forever!

To destroy tables, use DROP TABLE table_name; inside of psql. Again, be careful!

Normalization of Databases

This relates specifically to relational databases. What are relational databases? Well, once upon a time, databases were all just individual “flat” tables. In order, for instance, to put information about individual items someone bought, you’d have to put all of that info into one big table. Now, we have tables that can relate to one another, so that data can be logically arranged.

But when you’ve got this kind of database, it’s really important that you follow some rules. And those rules are called database normalization. You don’t have to follow these rules, but nine times out of ten, if you don’t, you’ll be very, very sorry.

Each rule is called a “normal form” (don’t ask me, I have no idea how they came up with this.)

The “first normal form” is just that the table does not allow null rows, or duplicate rows. This is generally taken care of by including a primary key. This field (generally one) will contain a unique key (usually an integer), will not allow null values in this field, and will (generally) be automatically generated.

Here’s one way to do this in postrgres:

 => CREATE TABLE people (

 -> people_id serial NOT NULL PRIMARY KEY,

 ...

This creates a field with serial data type (autonumbered integer), that does not allow null values, and is a primary key.

You can then have, for instance, a second table, which relates people to things:

 => CREATE TABLE things (

 -> thing_id serial NOT NULL PRIMARY KEY,

 -> FOREIGN KEY people_id REFERENCES people,

 ...

Check out http://www.postgresql.org/docs/8.2/static/sql-createtable.html for details on this. I don’t want to go too deeply here - I don’t want to lose people.

For the rest of the normal forms, check out the wikipedia article: http://en.wikipedia.org/wiki/Database_normalization It’s almost impossible to translate that into english - look carefully at the examples, they will make sense.

Basically, the purpose of normalization is to make it as easy as possible to find data when you need to, as simply as possible. And to make reduce possibilities of error on data entry or updates.

OK, so recreate testtable1, except use the following as the first line:

 test=> create table testtable2 (

 test(> userid serial NOT NULL PRIMARY KEY,

 test(> firstname varchar(20),

 ....

 test(> code char(2) );
 NOTICE: CREATE TABLE will create implicit sequence "testtable2_userid_seq" for serial column "testtable2.userid"

 NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "testtable2_pkey" for table "testtable2"

 CREATE TABLE

Aha! What happened? First off, serial is not really a data type (I told you.) It actually is a super sekret code for “create this thing called a sequence”. Adding the primary key made an index.

Getting data into your tables

OK, so you’ve created at least one table. Let’s get some data into it.

 => INSERT INTO testtable1 (firstname,lastname,... , code) VALUES ('michelle','murrain',...,'qw');

Now, if you are lazy, you can just do this:

 => INSERT INTO testable1 VALUES ('michelle','murrain',...,'qw');

But actually, that’s not really being lazy, because you have to make sure that you include the fields in the right order. Also, if you want to insert partial data, using the first command is better - just specify the fields you want to use.

So, how to get data out?

 => SELECT ALL FROM testtable1;

That will give you all of the fields, and all of the rows of a table. That’s useful if there aren’t many rows, or aren’t many fields. That’s not likely to be the case with working databases.

 => SELECT userid,lastname,city FROM testtable1;

That will return only those three fields, but will return all of the rows.

 => SELECT userid,lastname,address FROM testtable1 WHERE city='Atlanta';

Much more useful, yes? Check out http://www.postgresql.org/docs/8.2/static/sql-select.html for the full skinny on SELECT. It’s rich!