SQL Joins

When our data was all in a single table, we could easily retrieve a particular row from that table in order to get all the pertinent data we needed for a particular query. For example, looking at our unnormalized table below, if we wanted information on the books that 'John Smith' has checked out, we could use a simple SELECT query such as SELECT * WHERE full_name = 'John Smith'. This would return us the first two rows from that table, containing information such as the book title and author, and checkout and return dates.

Unormalized Users Table

Now that this data is split across three tables, users, books, and checkouts, we first have to join those tables together before we can select the data that we need.

What is a SQL Join?

SQL handles queries across more than one table through the use of JOINs. JOINs are clauses in SQL statements that link two tables together, usually based on the keys that define the relationship between those two tables. There are several types of JOINs: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS; they all do slightly different things, but the basic theory behind them all is the same. We'll take a look at each type of JOIN in turn, but first lets go over the general syntax that JOIN statements share.

Join Syntax

The general syntax of a JOIN statement is as follows:

SELECT [table_name.column_name1, table_name.column_name2,..] FROM table_name1
join_type JOIN table_name2 ON (join_condition);

The first part of this:

SELECT [table_name.column_name1, table_name.column_name2,..] FROM

is essentially the SELECT column_list FROM form that you've alredy seen in previous SELECT queries, with the slight difference that column names are prepended by table names in the column list.

Let's first focus on the second part of the statement, the part that joins the tables:

table_name1 join_type JOIN table_name2 ON (join_condition)

To join one table to another, PostgreSQL needs to know several pieces of information:

  • The name of the first table to join
  • The type of join to use
  • The name of the second table to join
  • The join condition.

These pieces of information are combined together using the JOIN and ON keywords. The part of the statement that comes after the ON keyword is the join condition; this defines the logic by which a row in one table is joined to a row in another table. In most cases this join condition is created using the primary key of one table and the foreign key of the table we want to join it with.

Remember our colors and shapes example from the previous chapter? The color_id column of the shapes table is a Foreign Key which references the id column of the colors table.

Shapes and Colors, separate tables

If we wanted a list of shapes and their colors, we could a query like this:

SELECT colors.color, shapes.shape FROM
  colors JOIN shapes ON
  color.id = shapes.color_id;

Within the second part of this query, colors JOIN shapes ON color.id = shapes.color_id, the join condition condition will look at each id value in the colors table and attempt to match it with a color_id value in the shapes table. If there is a match then those two rows are joined together to form a new row in a virtual table known as a join table. Since the id 1 for the color Red appears twice in the color_id column of our shapes table, this row of the colors table appears twice in our virtual join table, joined to both Square and Star. Since the id 3 for the color Orange does not appear at all in the color_id column of our shapes table, this row of the colors table is omitted completely from our virtual join table.

Shapes and Colors, virtual join table

With this virtual join table created, the SELECT column_list FROM part of our statement can then be executed to select columns from this virtual table. Those columns could originally be from the first table or the second table; to avoid confusion, we therefore need to specify both the table name and column name in our column list, in the form table_name.column_name1. Looking at our example, selecting columns from our virtual join table is effectively the same as saying:

SELECT colors.color, shapes.shape FROM shapes_colors_join_table;

The resulting data would look like this:

Shapes and Colors, resulting data

Now that we understand the basic principle behind how joins work, let's take a look at some specific examples of different types of join.

Types of Joins

As mentioned earlier, a JOIN statement can come in various forms. To specify which type of join to use, you can add either INNER, LEFT, RIGHT, FULL or CROSS just before the keyword JOIN. We'll look at an example of each of those types of join using the tables in our sql_book database.

INNER JOIN

An INNER JOIN returns a result set that contains the common elements of the tables, i.e the intersection where they match on the joined condition. INNER JOINs are the most frequently used JOINs; in fact if you don't specify a join type and simply use the JOIN keyword, then PostgreSQL will assume you want an inner join. Our shapes and colors example from earlier used an INNER JOIN in this way.

In the query below, the line INNER JOIN (addresses) ON (users.id = addresses.user_id) creates the intersection between the two tables, which means that the join table contains only rows where there is a definite match between the values in the two columns used in the condition.

SELECT users.*, addresses.*
FROM users
INNER JOIN addresses
ON (users.id = addresses.user_id);

The data in our unjoined tables looks like this:

Users and Addresses

The result of our SELECT query using an INNER JOIN would look like this:

Inner Join

The value in the id column of the users table for the user Jane Smith is 5; since this value does not appear in the user_id column of the addresses table, she is omitted entirely from the join table and so only 3 records are returned by the query.

If we did want to include Jane Smith in our results despite her not having an address, we would have to use a different type of join, an outer join.

LEFT JOIN

A LEFT JOIN or a LEFT OUTER JOIN takes all the rows from one table, defined as the LEFT table, and joins it with a second table. The JOIN is based on the conditions supplied in the parentheses. A LEFT JOIN will always include the rows from the LEFT table, even if there are no matching rows in the table it is JOINed with.

Let's try and use the same JOIN query as before, but this time we'll use a left join:

SELECT users.*, addresses.*
FROM users
LEFT JOIN addresses
ON (users.id = addresses.user_id);

Here, the Jane Smith row from the users table is included in the join table, since she doesn't have any matching rows in the addresses table, the join table has NULL values in her row for the columns of that table.

Left Join

Note that the using either LEFT JOIN or LEFT OUTER JOIN does exactly the same thing, and the OUTER part is often omitted. Even so, it is still common to refer to this type of join as an 'outer' join in order to differentiate it from an 'inner' join. Another type of outer join is a RIGHT JOIN, which we'll look at next.

RIGHT JOIN

A RIGHT JOIN is similar to a LEFT JOIN except that the roles between the two tables are reversed, and all the rows on the second table is included along with any matching rows from the first table. In the last chapter we mentioned that in our sql_book database we have books, and also reviews for those books. Not all of our books have reviews, however. Lets make a RIGT JOIN or RIGHT OUTER JOIN that displays all reviews and their associated books, along with any books that don't have a review.

SELECT reviews.book_id, reviews.content,
       reviews.rating, reviews.published_date,
       books.id, books.title, books.author
FROM reviews RIGHT JOIN books ON (reviews.book_id = books.id);

The data in our unjoined tables looks like this:

Reviews and Books

The result of our SELECT query using an RIGHT JOIN would look like this:

Right Join

As you can see, My Third SQL Book doesn't yet have a review, and so all the columns from the review table have NULL values for that row in the join table.

FULL JOIN

A FULL JOIN or FULL OUTER JOIN is essentially a combination of LEFT JOIN and RIGHT JOIN. This type of join contains all of the rows from both of the tables. Where the join condition is met, the rows of the two tables are joined, just as in the previous examples we've seen. For any rows on either side of the join where the join condition is not met, the columns for the other table have NULL values for that row.

A FULL JOIN is a little less common than the other ones we've looked at so far and so we won't show an example for this.

Another uncommon type of join is a CROSS JOIN; let's take a look.

CROSS JOIN

A CROSS JOIN, also known as a Cartesian JOIN, returns all rows from one table crossed with every row from the second table. In other words, the join table of a cross join contains every possible combination of rows from the tables that have been joined. Since it returns all combinations, a CROSS JOIN does not need to match rows using a join condition, therefore it does not have an ON clause.

The way this join works is sometimes a little difficult to envisage, so it's worth looking at an example in this case. This SQL query has the similar syntax to other JOINs, but without the ON clause:

sql_book=# SELECT * FROM users CROSS JOIN addresses;

The query above returns the addresses and users tables, cross joined. The result set consists of every record in users mapped to every record in addresses. For 4 users and 3 addresses, we get a total of 4x3=12 records. In mathematical terms, this is the cross product of a set.

CROSS Join

In an application, it's very unlikely that you would use a CROSS JOIN. Most of the time, it's more important to match rows together through a join condition in order to return a meaningful result. It's still important to be aware of CROSS JOINs however, since you may occasionally encounter them.

Now that we've covered the basics of joins, let's explore a couple of other useful techniques for working with multiple tables.

Multiple Joins

It is possible, and indeed common, to join more than just two tables together. This is done by adding additional JOIN clauses to your SELECT statement. To join multiple tables in this way, there must be a logical relationship between the tables involved. One example would be joining our users, checkouts, and books tables.

SELECT users.full_name, books.title, checkouts.checkout_date
FROM users
INNER JOIN checkouts ON (users.id = checkouts.user_id)
INNER JOIN books AS b ON (books.id = checkouts.book_id);

Here we are using two INNER JOINs. One between users and checkouts and one between books and checkouts. In both cases the JOIN is implemented by using the Primary Key of one table (either users or books) and the Foreign Key for that table in the chekouts table.

Aliasing

You may have noticed that some of the queries we list above can get a bit long. We can cut back on the length of these queries by using aliasing. Aliasing allows us to specify another name for a column and then use that name in later parts of a query to allow for more concise syntax. Let's use our three table join from above as an example. Using aliasing, the query would look like this:

SELECT u.full_name, b.title, c.checkout_date
FROM users AS u
INNER JOIN checkouts AS c ON (u.id = c.user_id)
INNER JOIN books AS b ON (b.id = c.book_id);

Here we specify single letter aliases for our tables, and use those aliases instead of our table names in order to prepend the columns from those tables in the column list and in the join conditions. This is commonly referred to as 'table aliasing'.

We can even use a shorthand for aliasing by leaving out the AS keyword entirely. FROM users u and FROM users AS u are equivalent SQL clauses.

Column Aliasing

Aliasing isn't just useful for shortening SQL queries. We can also use it to display more meaningful information in our result table. For instance, if we want to display the number of checkouts from the library we could write something like:

sql_book=# SELECT count(id) AS "Number of Books Checked Out"
sql_book-# FROM checkouts;
Number of Books Checked Out
-----------------------------
                          4
(1 row)

If we hadn't used aliasing above then we lose context about what was counted.

sql_book=# SELECT count(id) FROM checkouts;
 count
-------
     4
(1 row)

If you're a user just trying to access information, then most likely you wouldn't know about the exact tables being queried; being explicit about what information we're displaying can be important in a case like that.

Subqueries

So far in this chapter, we've looked at using JOINs to work with more than one table. Although joining tables together is probably the most common way of working with multiple tables, you can often achieve the same results throguh use of a subquery. Before we compare subqueries and joins, let's examine what a subquery is.

Imagine executing a SELECT query, and then using the results of that SELECT query as a condition in another SELECT query. This is called nesting, and the query that is nested is referred to as a subquery.

For example, suppose we need to select users that have no books checked out. We could do this by finding users whose user_id is not in the checkouts table. If no relation is found, that would mean that the user has not checked out any books.

sql_book=# SELECT u.full_name FROM users u
          WHERE u.id NOT IN (SELECT c.user_id FROM checkouts c);
  full_name
-------------
 Harry Potter
(1 row)

In the code above, the NOT IN clause compares the current user_id to all of the rows in the result of the subquery. If that id number isn't part of the subquery results, then the full_name for current row is added to the result set.

This might seem a bit confusing, so let's break it down. Our initial checkouts table looks like this:

Checkouts table

The nested query SELECT c.user_id FROM checkouts c returns the following results:

Nested Query Results

This virtual table can then effectively be by our NOT IN clause as a list of values against which to check the values in the id column of our users table.

NOT IN example

The only value in that column that is not in the results of the nested query is the id for 'Harry Potter': 3.

Subquery Expressions

PostgreSQL provides a number of expressions that can be used specifically with sub-queries, such as IN, NOT IN, ANY, SOME, and ALL. These all work slightly differently, but essentially they all compare values to the results of a sub-query.

We won't go into too much detail about subqueries here as you'll get to work with them some more later in course 180. One thing it is useful to know though is that in some situations a subquery can be used as an alternative to a join.

Subqueries vs Joins

As you write more queries, you may find that there is more than one way to write a query and achieve the same results. The most common choices are between subqueries and JOINs.

For instance, we can get the same result table as in our previous example by using a JOIN clause instead of a subquery.

sql_book=# SELECT u.full_name FROM users u
sql_book-# LEFT JOIN checkouts c ON (u.id = c.user_id)
sql_book-# WHERE c.user_id IS NULL;
  full_name   
--------------
 Harry Potter
(1 row)

When creating queries that return the same result, a differentiator between them may be their performance when compared to each other. As a general rule, JOINs are faster to run than subqueries. This may be something to bear in mind if working with large datasets.

Summary

We've covered a lot of content in this chapter, from exploring how joins work at a conceptual level, through working with different types of joins, and finally to useful techniques such as aliasing and subqueries.

One of the most important things to remember about how joins work is that we set a condition that compares a value from the first table (usually a primary key), with one from the second table (usually a foreign key). If the condition that uses these two values evaluates to true, then the row that holds the first value is joined with the row that holds the second value.

Let's quickly recap on some of the different types of join we can use:

Join Type Notes
INNER Combines rows from two tables whenever the join condition is met.
LEFT Same as an inner join, except rows from the first table are added to the join table, regardless of the evaluation of the join condition.
RIGHT Same as an inner join, except rows from the second table are added to the join table, regardless of the evaluation of the join condition.
FULL A combination of left join and right join.
CROSS Doesn't use a join condition. The join table is the result of matching every row from the first table with the second table, the cross product of all rows across both tables.

We using joins, sometimes our queries can get unwieldy, especially when we're dealing with 2 or more JOINs. To better manage this we can alias table and column names to shorten our query. We can also use aliasing to give more context about the query results.

Finally, the result from a join query can sometimes be obtained using different methods. Subqueries offer another method for us to query the database and retrieve the same results of similar results, as if we had used a JOIN clause.

We're almost finished with our Introduction to SQL. In the next and final chapter we'll sum up everything we've learned so far and point you towards some next steps and resources to continue your learning!