Keys and Constraints

What are Keys and Constraints?

A database must adhere to certain properties to maintain the integrity and quality of the data that it is storing. Keys and Constraints are rules that define what data values are allowed in certain data columns. They are an important database concept and are part of a database's schema definition. Defining Keys and constraints is part of the database design process and ensures that the data within a database is reliable and maintains its integrity. Constraints can apply to a specific column, an entire table, more than one table, or an entire schema. A reliable database system ensures that constraints hold at all times. Keys are a special type of constraint used to establish relationships and uniqueness.

Primary Key

A necessary part of establishing relationships between two entities or two pieces of data is identifying the data correctly. In SQL, uniquely identifying data is critical. A Primary Key is a unique identifier for a row of data. In the spreadsheet analogy we've used so far, there's always been an id column. While any column can serve as the primary key, using a column named id is easy for mnemonic reasons and is a popular convention. In an RDBMS, each table can have only one Primary Key.

A primary key is a guaranteed unique identifier for a row in a table. Because of this, we can use a row's primary key value to reference that row from other tables. In the previous chapter, we established relationships between multiple tables based on that, but relational databases need a way to relate to each other, and the primary key column is a way to do this efficiently. In PostgreSQL, primary keys are also given performance benefits as it's known that they'll be frequently used to look up data, especially in multi-table lookups.

Be aware that not all columns named "id" are primary keys; this is just a convention. In addition, primary keys can have any name.

We've already seen the syntax to designate a column as a primary key column in previous chapters:

CREATE TABLE users (
  id serial,
  username VARCHAR(25) NOT NULL,
  enabled BOOLEAN DEFAULT TRUE,
  last_login timestamp NOT NULL DEFAULT NOW(),
  PRIMARY KEY (id)
);

Let's take a look at the above command that creates a table called users, from the perspective of understanding Primary Keys.

  1. id column is the Primary Key of this table. We can also say that the Primary Key constraint has been applied to the id column.
  2. id can only hold numerical values (int).
  3. For each new row added to the table, the id field will be automatically incremented by 1. With many RDBMS, the AUTO_INCREMENT keyword is used. With PostgreSQL, we use serial to auto increment our ids and set their type as integer. It is not required to manually specify the primary key value each time you insert data, PostgreSQL will do it for you if you specify serial as the type for id.

Foreign Key

Foreign key columns are used to reference another row of data, perhaps in another table. In order to reference another row, the database needs a unique identifier for that row. Therefore, foreign key columns contain the value of the referenced row's primary key. For instance, a table might have a column called user_id as a foreign key column, which corresponds to the id column of the users table. Foreign keys are how RDBMS sets up relationships between rows of data, either in the same table or across tables.

CREATE TABLE reviews (
  id serial,
  book_id int NOT NULL,
  user_id int NOT NULL,
  review_content VARCHAR(255),
  rating int,
  published_date timestamp DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

You can see from the above SQL statement that the line to establish this relationship is

FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE

In the above example id is the primary key of the books table, which is used to establish a relationship with the reviews table. The iddata is stored in the reviews table in the book_id field to map the books with the reviews. The ON DELETE CASCADE clause indicates that if a book is deleted all reviews associated with that book are also deleted.

Composite Key as Primary Key

In the previous chapter we created a many-to-many relationship, in which our primary key was made of two columns. The user_id and the book_id both together formed the primary key for the cross reference table as can be seen in this line of the SQL statement.

PRIMARY KEY (user_id, book_id)

CREATE TABLE users_books (
  user_id int NOT NULL ,
  book_id int NOT NULL,
  checkout_date timestamp,
  return_date timestamp,
  PRIMARY KEY (user_id, book_id),
  FOREIGN KEY (user_id) REFERENCES users(id) ON UPDATE CASCADE,
  FOREIGN KEY (book_id) REFERENCES books(id) ON UPDATE CASCADE
);

When a primary_key consists of a unique pair PRIMARY KEY (user_id, book_id), it is known as a composite key. It usually occurs in many-to-many relationships, and we need to add an extra table to store this relationship data. The composite key ensures that the data in the table will be unique for the relation between the user and book. Example:

All three records below are valid and the table has primary keys (1,1), (1,2) and (2,1). Therefore a new record (1,1) will be rejected by the database.

|User_id | Book_id |
|  1     |   1     |
|  1     |   2     |
|  2     |   1     |

UNIQUE Constraints

Besides the Primary Key, a database also allows applying a Unique constraint on any column in the database. For example, in the books table, we may choose to not only have the book_id as a unique primary key, but also need to store the book's ISBN number. The ISBN, International Standard Book Number is a 10 digit unique number for a book. We can choose to design our database to make this column unique. Even though this is not a primary key, duplicate ISBNs will not be allowed in the books table, and the data quality will be maintained.

CREATE TABLE books (
  id serial,
  title VARCHAR(100) NOT NULL,
  author VARCHAR(100) NOT NULL,
  published_date timestamp NOT NULL,
  isbn int,
  PRIMARY KEY (id),
  UNIQUE (isbn)
);
library=# \d books
                                       Table "public.books"
     Column     |            Type             |                     Modifiers
----------------+-----------------------------+----------------------------------------------------
 id             | integer                     | not null default nextval('books_id_seq'::regclass)
 title          | character varying(100)      | not null
 author         | character varying(100)      | not null
 published_date | timestamp without time zone | not null
 isbn           | integer                     |
Indexes:
    "books_pkey" PRIMARY KEY, btree (id)
    "books_isbn_key" UNIQUE CONSTRAINT, btree (isbn)

library=# INSERT INTO books( title , author, published_date, isbn) values ('My first rails book', 'Mereda', now(),1234567890);
INSERT 0 1
library=# INSERT INTO books( title , author, published_date, isbn) values ('My first rails book', 'Mereda', now(),1234567890);
ERROR:  duplicate key value violates unique constraint "books_isbn_key"
DETAIL:  Key (isbn)=(1234567890) already exists.

The Unique constraint disallows duplicate entry into the books table for the column value of isbn.

NOT NULL Constraints

We have seen "NOT NULL" in many of our SQL statements. The NOT NULL constraint prevents a column from allowing a null value (i.e. this column must contain data). This is important for key fields. For example: If you want to add a user to a database, you must provide a username. Without this constraint the database would get filled with null values and become less useful.

How does SQL handle these relationships?

PostgreSQL allows relationships between both the schema and data of different tables through the use of FOREIGN KEYS and an operation called a JOIN. As the name suggests, the purpose is to join tables. Through this mechanism we can, for instance, get the full details of the author of the reviews table shown above, since it has a field user_id which relates to the user table. Here's a quick example:

library=# SELECT reviews.*, users.* FROM reviews LEFT JOIN users ON (users.id = reviews.user_id);
id  | user_id |    review_content     | rating | username   | enabled
----+---------+-----------------------+--------+------------+--------
  1 | 1       | Lots of Winter Scenes | 10     | John Smith | t
  2 | 1       | Would watch again     | 9      | John Smith | t
(2 rows)

Now that we understand how to create multiple tables and the role keys play in establishing relationships, we will take a deeper look at the various joins and how SQL queries are built using these joins to retrieve the data required from multiple tables.