A database must adhere to certain properties to maintain the integrity and quality of the data that it is storing. Keys and Constraints are rules that define what data values are allowed in certain data columns. They are an important database concept and are part of a database's schema definition. Defining Keys and constraints is part of the database design process and ensures that the data within a database is reliable and maintains its integrity. Constraints can apply to a specific column, an entire table, more than one table, or an entire schema. A reliable database system ensures that constraints hold at all times. Keys are a special type of constraint used to establish relationships and uniqueness.
A necessary part of establishing relationships between two entities or two pieces of data is identifying the data correctly. In SQL, uniquely identifying data is critical. A Primary Key is a unique identifier for a row of data. In the spreadsheet analogy we've used so far, there's always been an
id column. While any column can serve as the primary key, using a column named
id is easy for mnemonic reasons and is a popular convention. In an RDBMS, each table can have only one Primary Key.
A primary key is a guaranteed unique identifier for a row in a table. Because of this, we can use a row's primary key value to reference that row from other tables. In the previous chapter, we established relationships between multiple tables based on that, but relational databases need a way to relate to each other, and the primary key column is a way to do this efficiently. In PostgreSQL, primary keys are also given performance benefits as it's known that they'll be frequently used to look up data, especially in multi-table lookups.
Be aware that not all columns named "id" are primary keys; this is just a convention. In addition, primary keys can have any name.
We've already seen the syntax to designate a column as a primary key column in previous chapters:
CREATE TABLE users ( id serial, username VARCHAR(25) NOT NULL, enabled BOOLEAN DEFAULT TRUE, last_login timestamp NOT NULL DEFAULT NOW(); PRIMARY KEY (id) );
Let's take a look at the above command that creates a table called
users, from the perspective of understanding Primary Keys.
idcolumn is the Primary Key of this table. We can also say that the Primary Key constraint has been applied to the
idcan only hold numerical values (
idfield will be automatically incremented by 1. With many RDBMS, the
AUTO_INCREMENTkeyword is used. With PostgreSQL, we use serial to auto increment our ids and set their type as integer. It is not required to manually specify the primary key value each time you insert data, PostgreSQL will do it for you if you specify serial as the type for id.
Foreign key columns are used to reference another row of data, perhaps in another table. In order to reference another row, the database needs a unique identifier for that row. Therefore, foreign key columns contain the value of the referenced row's primary key. For instance, a table might have a column called
user_id as a foreign key column, which corresponds to the
id column of the
users table. Foreign keys are how RDBMS sets up relationships between rows of data, either in the same table or across tables.
CREATE TABLE reviews ( id serial, book_id int NOT NULL, user_id int NOT NULL, review_content VARCHAR(255), rating int, published_date timestamp DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id), FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE, FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE );
You can see from the above SQL statement that the line to establish this relationship is
FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE
In the above example
id is the primary key of the
books table, which is used to establish a relationship with the reviews table. The
iddata is stored in the reviews table in the
book_id field to map the books with the reviews. The
ON DELETE CASCADE clause indicates that if a book is deleted all reviews associated with that book are also deleted.
In the previous chapter we created a many-to-many relationship, in which our primary key was made of two columns. The
user_id and the
book_id both together formed the primary key for the cross reference table as can be seen in this line of the SQL statement.
PRIMARY KEY (user_id, book_id)
CREATE TABLE users_books ( user_id int NOT NULL , book_id int NOT NULL, checkout_date timestamp, return_date timestamp, PRIMARY KEY (user_id, book_id), FOREIGN KEY (user_id) REFERENCES users(id) ON UPDATE CASCADE, FOREIGN KEY (book_id) REFERENCES books(id) ON UPDATE CASCADE );
When a primary_key consists of a unique pair
PRIMARY KEY (user_id, book_id), it is known as a composite key. It usually occurs in many-to-many relationships, and we need to add an extra table to store this relationship data. The composite key ensures that the data in the table will be unique for the relation between the user and book. Example:
All three records below are valid and the table has primary keys (1,1), (1,2) and (2,1). Therefore a new record (1,1) will be rejected by the database.
|User_id | Book_id | | 1 | 1 | | 1 | 2 | | 2 | 1 |
Besides the Primary Key, a database also allows applying a Unique constraint on any column in the database. For example, in the books table, we may choose to not only have the
book_id as a unique primary key, but also need to store the book's ISBN number. The ISBN, International Standard Book Number is a 10 digit unique number for a book. We can choose to design our database to make this column unique. Even though this is not a primary key, duplicate ISBNs will not be allowed in the books table, and the data quality will be maintained.
CREATE TABLE books ( id serial, title VARCHAR(100) NOT NULL, author VARCHAR(100) NOT NULL, published_date timestamp NOT NULL, isbn int, PRIMARY KEY (id), UNIQUE (isbn) );
library=# \d books Table "public.books" Column | Type | Modifiers ----------------+-----------------------------+---------------------------------------------------- id | integer | not null default nextval('books_id_seq'::regclass) title | character varying(100) | not null author | character varying(100) | not null published_date | timestamp without time zone | not null isbn | integer | Indexes: "books_pkey" PRIMARY KEY, btree (id) "books_isbn_key" UNIQUE CONSTRAINT, btree (isbn) library=# INSERT INTO books( title , author, published_date, isbn) values ('My first rails book', 'Mereda', now(),1234567890); INSERT 0 1 library=# INSERT INTO books( title , author, published_date, isbn) values ('My first rails book', 'Mereda', now(),1234567890); ERROR: duplicate key value violates unique constraint "books_isbn_key" DETAIL: Key (isbn)=(1234567890) already exists.
The Unique constraint disallows duplicate entry into the books table for the column value of isbn.
We have seen "NOT NULL" in many of our SQL statements. The
NOT NULL constraint prevents a column from allowing a null value (i.e this column must contain data). This is important for key fields. For example: If you want to add a user to a database, you must provide a username. Without this constraint the database would get filled with null values and become less useful.
PostgreSQL allows relationships between both the schema and data of different tables through the use of FOREIGN KEYS and an operation called a
JOIN. As the name suggests, the purpose is to join tables. Through this mechanism we can, for instance, get the full details of the author of the reviews table shown above, since it has a field
user_id which relates to the
user table. Here's a quick example:
library=# SELECT reviews.*, users.* FROM reviews LEFT JOIN users ON (users.id = reviews.user_id); id | user_id | review_content | rating | username | enabled ----+---------+-----------------------+--------+------------+-------- 1 | 1 | Lots of Winter Scenes | 10 | John Smith | t 2 | 1 | Would watch again | 9 | John Smith | t (2 rows)
Now that we understand how to create multiple tables and the role keys play in establishing relationships, we will take a deeper look at the various joins and how SQL queries are built using these joins to retrieve the data required from multiple tables.