SQL Joins

If you don't already have a database named "library", then make sure to run createdb library from the terminal before starting this chapter. The library database is needed to work through this chapter's content. For this chapter, we will only need to start with the schema of the various tables we've created in past chapters. You can create those tables manually or restore the database from a backup file with the following command:

psql -d library < joins.sql

What is a SQL Join?

SQL handles relationships through the use of JOINs. JOINs are clauses in SQL statements that link two tables based on one or more fields. With JOINs, relational databases can reduce redundancy. A table doesn't have to contain all the fields related to its rows, and table data can be used in multiple places. There are several types of JOINs. To describe these different types, we need to add some data to our database so we can show the characteristics of each type.

Set Up Data

To set up the data, we will create 3 users, 3 books and 2 reviews in our library database by running the SQL commands shown below. This mix of data lets us explore the various types of JOIN, and observe the results of queries. The table data can also be set up by running the following command in addition to the command at the top of the page. First download this file.

psql -d library < joins_data.sql

Let's take a look at how these statements work. If you haven't run the statements, you can follow along and run them now as we explain the data and statements used to create them.

  • User: John Smith
  • Address: 1 Market Street, San Francisco, CA
  • Book: My First SQL Book by Mary Parker with a review My First Review
  • Book: My Second SQL Book by John Mayer with a review My Second Review

The first set of statements set up the data for the first user, John Smith. John Smith has an address at 1 Market Street, San Francisco, CA, and has checked out 2 books: My First SQL Book and My Second SQL Book. John Smith also has 2 reviews for each of the books.

-- A user
INSERT INTO users (id, username) VALUES (1, 'John Smith');

-- An address
INSERT INTO addresses (user_id, street, city, state)
VALUES (1, '1 Market Street', 'San Francisco', 'CA');

-- A book
INSERT INTO books (id, title, author, published_date)
VALUES(1, 'My First SQL book', 'Mary Parker', NOW());

INSERT INTO users_books (user_id, book_id, checkout_date)
VALUES(1, 1, NOW());

-- A review
INSERT INTO reviews (id, book_id, user_id, review_content)
VALUES (1, 1, 1, 'My first review');

-- A second book
INSERT INTO books (id, title, author, published_date)
VALUES(2, 'My Second SQL book','John Mayer', NOW());

INSERT INTO users_books (user_id, book_id, checkout_date)
VALUES (1, 2 ,NOW());

-- A second review
INSERT INTO reviews (id, book_id, user_id, review_content)
VALUES (2, 2, 1, 'My second review');
  • User: Jane Smiley
  • Address: 2 Elm Street, San Francisco, CA
  • Book: My Second SQL Book by John Mayer with a review Jane Smiley's review

The second set of statements set up the data for the second user, Jane Smiley. She has an address at 2 Elm Street, San Francisco, CA, and has 1 book My Second SQL Book with a review by Jane Smiley.

-- A second User
INSERT INTO users (id, username) VALUES (2, 'Jane Smiley');

-- A second address
INSERT INTO addresses (user_id, street, city, state)
VALUES (2, '2 Elm Street', 'San Francisco', 'CA');

INSERT INTO users_books (user_id, book_id, checkout_date)
VALUES(2, 2 , NOW());

-- A third review
INSERT INTO reviews (id, book_id, user_id, review_content )
VALUES (3, 2, 2, 'review_content');
  • User: Alice Munro
  • Book: My Third SQL Book by Cary Flint

The third user, Alice Munro, has no address, no books and no reviews. We do also add one extra book, but it isn't checked out by any users.

-- A Third User
INSERT INTO users (id, username) VALUES (3, 'Alice Munro');

-- A Third book
INSERT INTO books (id, title, author, published_date)
VALUES(3, 'My Third SQL book','Cary Flint', NOW());

We have intentionally setup the data in the following ways:

  1. Not all users have books.
  2. Not all users have an address.
  3. Not all users have reviews.
  4. Not all books are connected to users.
  5. Not all books have reviews.

INNER JOIN

To demonstrate how INNER JOINs work, we will join the users and addresses tables. In the previous chapter, we designed the user and address table to have a one-to-one relationship. In the data we set up, we have 3 users, but only 2 users have addresses. Alice Munro does not have an address.

Let's take a look at the data in the users table. Run the queries below in your PostgreSQL console and follow along to view results in the users and addresses tables.

library=# SELECT * FROM users;
 id |  username   | enabled |         last_login
----+-------------+---------+----------------------------
  1 | John Smith  | t       | 2016-04-03 11:53:18.730872
  2 | Jane Smiley | t       | 2016-04-03 11:53:18.74642
  3 | Alice Munro | t       | 2016-04-03 11:53:18.752364
(3 rows)
library=# SELECT * FROM addresses;
 user_id |     street      |     city      | state
---------+-----------------+---------------+-------
       1 | 1 Market Street | San Francisco | CA
       2 | 2 Elm Street    | San Francisco | CA
(2 rows)

Now we will create an INNER JOIN between these two tables. An INNER JOIN returns a result set that contains the common elements of the tables, i.e the intersection where they match on the joined column. INNER JOINs are the most frequently used JOINs. In the query below, the line INNER JOIN (addresses) ON (users.id = addresses.user_id) creates the intersection between the two tables. Only 2 records are returned for the query; the result does not contain the user Alice Munro who does not have an address.

library=# SELECT users.*, addresses.*
FROM users
INNER JOIN addresses
ON users.id = addresses.user_id;
 id |  username   | enabled |         last_login         | user_id |     street      |     city      | state
----+-------------+---------+----------------------------+---------+-----------------+---------------+-------
  1 | John Smith  | t       | 2016-04-03 11:53:18.730872 |       1 | 1 Market Street | San Francisco | CA
  2 | Jane Smiley | t       | 2016-04-03 11:53:18.74642  |       2 | 2 Elm Street    | San Francisco | CA
(2 rows)

INNER JOINs can be described by the Venn diagram's intersectionality which states that only the things that are in both of the sets are returned.

See how the query above has "TABLEA A" and "TABLEB B" . The second "A" and second "B" are aliases. Specifying those values after the table name allows us to refer to those tables with just "A" and "B". Aliasing can be very useful when writing long SQL statements. We'll talk a bit more about aliasing later on.

LEFT JOIN

A LEFT JOIN or a LEFT OUTER JOIN takes all the rows from one table, defined as the LEFT table, and JOINs it with a second table. The JOIN is based on the conditions supplied in the parentheses. A LEFT JOIN will always include the rows from the LEFT table, even if there are no matching rows in the table it is JOINed with.

library=# SELECT users.*, addresses.*
library-# FROM users
library-# LEFT JOIN addresses
library-# ON (users.id = addresses.user_id);
 id |  username   | enabled |         last_login         | user_id |     street      |     city      | state
----+-------------+---------+----------------------------+---------+-----------------+---------------+-------
  1 | John Smith  | t       | 2016-04-03 11:53:18.730872 |       1 | 1 Market Street | San Francisco | CA
  2 | Jane Smiley | t       | 2016-04-03 11:53:18.74642  |       2 | 2 Elm Street    | San Francisco | CA
  3 | Alice Munro | t       | 2016-04-03 11:53:18.752364 |         |                 |               |
(3 rows)

What's being said in SQL is "give me all the matching rows from the left table along with any matching data from the RIGHT table based on the ON clause". In the above query, Alice Munro does not have an address. However, the query returns 3 records, and the address fields return NULL values for Alice Munro. In the INNER JOIN, Alice Munro's record was not returned at all.

LEFT JOINs can be described by the Venn diagram below.

CROSS JOIN

A CROSS JOIN, also known as a Cartesian JOIN, returns all rows from one table crossed with every row from the second table. This JOIN does not have an ON clause.

This SQL query has the similar syntax to other JOINs, but without the ON clause

library=# SELECT *
FROM users
CROSS JOIN addresses;
 id |  username   | enabled |         last_login         | user_id |     street      |     city      | state
----+-------------+---------+----------------------------+---------+-----------------+---------------+-------
  1 | John Smith  | t       | 2016-04-03 11:53:18.730872 |       1 | 1 Market Street | San Francisco | CA
  1 | John Smith  | t       | 2016-04-03 11:53:18.730872 |       2 | 2 Elm Street    | San Francisco | CA
  2 | Jane Smiley | t       | 2016-04-03 11:53:18.74642  |       1 | 1 Market Street | San Francisco | CA
  2 | Jane Smiley | t       | 2016-04-03 11:53:18.74642  |       2 | 2 Elm Street    | San Francisco | CA
  3 | Alice Munro | t       | 2016-04-03 11:53:18.752364 |       1 | 1 Market Street | San Francisco | CA
  3 | Alice Munro | t       | 2016-04-03 11:53:18.752364 |       2 | 2 Elm Street    | San Francisco | CA
(6 rows)

The query above returns the addresses and users tables, cross joined. The result set consists of every record in users mapped to every record in addresses. For 3 users and 2 addresses, we get a total of 3x2=6 records. In mathematical terms, this is the cross product of a set.

In an application, usually we will not use a CROSS JOIN. More often than not, it's more important to match rows together through a JOIN, and then use the information returned. Therefore, using something like a CROSS JOIN to query for the cross product of all rows will most likely not return a meaningful result. It's still important to know of CROSS JOINs, since they do have their uses here and there.
CROSS JOINs can be described by the Venn diagram below.

RIGHT JOIN

A RIGHT JOIN is similar to a LEFT JOIN except that all the data on the second table is included. When we set up our data earlier in this chapter, we created 3 books, but not all books are checked out by a user. The book My Third SQL Book does not have any user associated with it. When we run the RIGHT JOIN below, all books are returned in the result. Notice that though all books have been included, the username is returned as NULL for the book that is not associated with any user.

library=# SELECT users.username, books.title
library-# FROM users
library-# RIGHT JOIN users_books ON (users_books.user_id = users.id)
library-# RIGHT JOIN books ON (books.id = users_books.book_id);
  username   |       title
-------------+--------------------
 John Smith  | My First SQL Book
 John Smith  | My Second SQL Book
 Jane Smiley | My Second SQL Book
             | My Third SQL Book
(4 rows)

Now, notice that for the same query, if a LEFT JOIN is used instead of a RIGHT JOIN, all users are returned, even those with no books associated with them, but NOT all books are returned.

library=# SELECT users.username, books.title
library-# FROM users
library-# LEFT JOIN users_books ON (users_books.user_id = users.id)
library-# LEFT JOIN books ON (books.id = users_books.book_id);
  username   |       title
-------------+--------------------
 John Smith  | My First SQL Book
 John Smith  | My Second SQL Book
 Jane Smiley | My Second SQL Book
 Alice Munro |
(4 rows)

RIGHT JOINs can be described by the Venn diagram below.

In the next chapters, we will study advanced queries, functions, and operators in SQL queries. These topics require a solid understanding of relationships, joins and keys so make sure to revisit this chapter to learn them thoroughly.