When our data was all in a single table, we could easily retrieve a particular row from that table in order to get all the pertinent data we needed for a particular query. For example, looking at our unnormalized table below, if we wanted information on the books that 'John Smith' has checked out, we could use a simple
SELECT query such as
SELECT * WHERE full_name = 'John Smith'. This would return us the first two rows from that table, containing information such as the book title and author, and checkout and return dates.
Now that this data is split across three tables,
checkouts, we first have to join those tables together before we can select the data that we need.
SQL handles queries across more than one table through the use of JOINs. JOINs are clauses in SQL statements that link two tables together, usually based on the keys that define the relationship between those two tables. There are several types of JOINs: INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS; they all do slightly different things, but the basic theory behind them all is the same. We'll take a look at each type of
JOIN in turn, but first lets go over the general syntax that JOIN statements share.
The general syntax of a
JOIN statement is as follows:
SELECT [table_name.column_name1, table_name.column_name2,..] FROM table_name1 join_type JOIN table_name2 ON (join_condition);
The first part of this:
SELECT [table_name.column_name1, table_name.column_name2,..] FROM
is essentially the
SELECT column_list FROM form that you've already seen in previous
SELECT queries, with the slight difference that column names are prepended by table names in the column list.
Let's first focus on the second part of the statement, the part that joins the tables:
table_name1 join_type JOIN table_name2 ON (join_condition)
To join one table to another, PostgreSQL needs to know several pieces of information:
These pieces of information are combined together using the
ON keywords. The part of the statement that comes after the
ON keyword is the join condition; this defines the logic by which a row in one table is joined to a row in another table. In most cases this join condition is created using the primary key of one table and the foreign key of the table we want to join it with.
shapes example from the previous chapter? The
color_id column of the
shapes table is a Foreign Key which references the
id column of the
If we wanted a list of shapes and their colors, we could use a query like this:
SELECT colors.color, shapes.shape FROM colors JOIN shapes ON colors.id = shapes.color_id;
Within the second part of this query,
colors JOIN shapes ON color.id = shapes.color_id, the join condition condition will look at each
id value in the
colors table and attempt to match it with a
color_id value in the
shapes table. If there is a match then those two rows are joined together to form a new row in a virtual table known as a join table. Since the
1 for the color
Red appears twice in the
color_id column of our
shapes table, this row of the
colors table appears twice in our virtual join table, joined to both
Star. Since the
3 for the color
Orange does not appear at all in the
color_id column of our
shapes table, this row of the
colors table is omitted completely from our virtual join table.
With this virtual join table created, the
SELECT column_list FROM part of our statement can then be executed to select columns from this virtual table. Those columns could originally be from the first table or the second table; to avoid confusion, we therefore need to specify both the table name and column name in our column list, in the form
table_name.column_name1. Looking at our example, selecting columns from our virtual join table is effectively the same as saying:
SELECT colors.color, shapes.shape FROM shapes_colors_join_table;
The resulting data would look like this:
Now that we understand the basic principle behind how joins work, let's take a look at some specific examples of different types of join.
As mentioned earlier, a
JOIN statement can come in various forms. To specify which type of join to use, you can add either
CROSS just before the keyword
JOIN. We'll look at an example of each of those types of join using the tables in our
INNER JOIN returns a result set that contains the common elements of the tables, i.e the intersection where they match on the joined condition. INNER JOINs are the most frequently used JOINs; in fact if you don't specify a join type and simply use the
JOIN keyword, then PostgreSQL will assume you want an inner join. Our
colors example from earlier used an
INNER JOIN in this way.
In the query below, the line
INNER JOIN (addresses) ON (users.id = addresses.user_id) creates the intersection between the two tables, which means that the join table contains only rows where there is a definite match between the values in the two columns used in the condition.
SELECT users.*, addresses.* FROM users INNER JOIN addresses ON (users.id = addresses.user_id);
The data in our unjoined tables looks like this:
The result of our
SELECT query using an
INNER JOIN would look like this:
The value in the
id column of the
users table for the user
Jane Smith is
5; since this value does not appear in the
user_id column of the addresses table, she is omitted entirely from the join table and so only 3 records are returned by the query.
If we did want to include Jane Smith in our results despite her not having an address, we would have to use a different type of join, an outer join.
A LEFT JOIN or a LEFT OUTER JOIN takes all the rows from one table, defined as the
LEFT table, and joins it with a second table. The
JOIN is based on the conditions supplied in the parentheses. A
LEFT JOIN will always include the rows from the
LEFT table, even if there are no matching rows in the table it is JOINed with.
Let's try and use the same JOIN query as before, but this time we'll use a left join:
SELECT users.*, addresses.* FROM users LEFT JOIN addresses ON (users.id = addresses.user_id);
Jane Smith row from the
users table is included in the join table, since she doesn't have any matching rows in the
addresses table, the join table has
NULL values in her row for the columns of that table.
Note that the using either
LEFT JOIN or
LEFT OUTER JOIN does exactly the same thing, and the
OUTER part is often omitted. Even so, it is still common to refer to this type of join as an 'outer' join in order to differentiate it from an 'inner' join. Another type of outer join is a
RIGHT JOIN, which we'll look at next.
RIGHT JOIN is similar to a
LEFT JOIN except that the roles between the two tables are reversed, and all the rows on the second table is included along with any matching rows from the first table. In the last chapter we mentioned that in our
sql_book database we have books, and also reviews for those books. Not all of our books have reviews, however. Lets make a
RIGHT JOIN or
RIGHT OUTER JOIN that displays all reviews and their associated books, along with any books that don't have a review.
SELECT reviews.book_id, reviews.content, reviews.rating, reviews.published_date, books.id, books.title, books.author FROM reviews RIGHT JOIN books ON (reviews.book_id = books.id);
The data in our unjoined tables looks like this:
The result of our
SELECT query using an
RIGHT JOIN would look like this:
As you can see,
My Third SQL Book doesn't yet have a review, and so all the columns from the review table have
NULL values for that row in the join table.
FULL JOIN or
FULL OUTER JOIN is essentially a combination of
LEFT JOIN and
RIGHT JOIN. This type of join contains all of the rows from both of the tables. Where the join condition is met, the rows of the two tables are joined, just as in the previous examples we've seen. For any rows on either side of the join where the join condition is not met, the columns for the other table have
NULL values for that row.
FULL JOIN is a little less common than the other ones we've looked at so far and so we won't show an example for this.
Another uncommon type of join is a
CROSS JOIN; let's take a look.
CROSS JOIN, also known as a Cartesian JOIN, returns all rows from one table crossed with every row from the second table. In other words, the join table of a cross join contains every possible combination of rows from the tables that have been joined. Since it returns all combinations, a
CROSS JOIN does not need to match rows using a join condition, therefore it does not have an
The way this join works is sometimes a little difficult to envisage, so it's worth looking at an example in this case. This SQL query has the similar syntax to other
JOINs, but without the
sql_book=# SELECT * FROM users CROSS JOIN addresses;
The query above returns the
users tables, cross joined. The result set consists of every record in
users mapped to every record in
addresses. For 4 users and 3 addresses, we get a total of
4x3=12 records. In mathematical terms, this is the cross product of a set.
In an application, it's very unlikely that you would use a
CROSS JOIN. Most of the time, it's more important to match rows together through a join condition in order to return a meaningful result. It's still important to be aware of
CROSS JOINs however, since you may occasionally encounter them.
Now that we've covered the basics of joins, let's explore a couple of other useful techniques for working with multiple tables.
It is possible, and indeed common, to join more than just two tables together. This is done by adding additional
JOIN clauses to your
SELECT statement. To join multiple tables in this way, there must be a logical relationship between the tables involved. One example would be joining our
SELECT users.full_name, books.title, checkouts.checkout_date FROM users INNER JOIN checkouts ON (users.id = checkouts.user_id) INNER JOIN books ON (books.id = checkouts.book_id);
Here we are using two
INNER JOINs. One between
checkouts and one between
books and checkouts. In both cases the
JOIN is implemented by using the Primary Key of one table (either
books) and the Foreign Key for that table in the
You may have noticed that some of the queries we list above can get a bit long. We can cut back on the length of these queries by using aliasing. Aliasing allows us to specify another name for a column and then use that name in later parts of a query to allow for more concise syntax. Let's use our three table join from above as an example. Using aliasing, the query would look like this:
SELECT u.full_name, b.title, c.checkout_date FROM users AS u INNER JOIN checkouts AS c ON (u.id = c.user_id) INNER JOIN books AS b ON (b.id = c.book_id);
Here we specify single letter aliases for our tables, and use those aliases instead of our table names in order to prepend the columns from those tables in the column list and in the join conditions. This is commonly referred to as 'table aliasing'.
We can even use a shorthand for aliasing by leaving out the
AS keyword entirely.
FROM users u and
FROM users AS u are equivalent SQL clauses.
Aliasing isn't just useful for shortening SQL queries. We can also use it to display more meaningful information in our result table. For instance, if we want to display the number of checkouts from the library we could write something like:
sql_book=# SELECT count(id) AS "Number of Books Checked Out" sql_book-# FROM checkouts; Number of Books Checked Out ----------------------------- 4 (1 row)
If we hadn't used aliasing above then we lose context about what was counted.
sql_book=# SELECT count(id) FROM checkouts; count ------- 4 (1 row)
If you're a user just trying to access information, then most likely you wouldn't know about the exact tables being queried; being explicit about what information we're displaying can be important in a case like that.
So far in this chapter, we've looked at using
JOINs to work with more than one table. Although joining tables together is probably the most common way of working with multiple tables, you can often achieve the same results through use of a subquery. Before we compare subqueries and joins, let's examine what a subquery is.
Imagine executing a
SELECT query, and then using the results of that SELECT query as a condition in another
SELECT query. This is called nesting, and the query that is nested is referred to as a subquery.
For example, suppose we need to select users that have no books checked out. We could do this by finding
user_id is not in the
checkouts table. If no relation is found, that would mean that the user has not checked out any books.
sql_book=# SELECT u.full_name FROM users u WHERE u.id NOT IN (SELECT c.user_id FROM checkouts c); full_name ------------- Harry Potter (1 row)
In the code above, the
NOT IN clause compares the current
user_id to all of the rows in the result of the subquery. If that
id number isn't part of the subquery results, then the
full_name for current row is added to the result set.
This might seem a bit confusing, so let's break it down. Our initial
checkouts table looks like this:
The nested query
SELECT c.user_id FROM checkouts c returns the following results:
This virtual table can then effectively be used by our
NOT IN clause as a list of values against which to check the values in the
id column of our
The only value in that column that is not in the results of the nested query is the
id for 'Harry Potter':
PostgreSQL provides a number of expressions that can be used specifically with sub-queries, such as
ALL. These all work slightly differently, but essentially they all compare values to the results of a sub-query.
We won't go into too much detail about subqueries here as you'll get to work with them some more later in course 180. One thing it is useful to know though is that in some situations a subquery can be used as an alternative to a join.
As you write more queries, you may find that there is more than one way to write a query and achieve the same results. The most common choices are between subqueries and JOINs.
For instance, we can get the same result table as in our previous example by using a
JOIN clause instead of a subquery.
sql_book=# SELECT u.full_name FROM users u sql_book-# LEFT JOIN checkouts c ON (u.id = c.user_id) sql_book-# WHERE c.user_id IS NULL; full_name -------------- Harry Potter (1 row)
When creating queries that return the same result, a differentiator between them may be their performance when compared to each other. As a general rule, JOINs are faster to run than subqueries. This may be something to bear in mind if working with large datasets.
We've covered a lot of content in this chapter, from exploring how joins work at a conceptual level, through working with different types of joins, and finally to useful techniques such as aliasing and subqueries.
One of the most important things to remember about how joins work is that we set a condition that compares a value from the first table (usually a primary key), with one from the second table (usually a foreign key). If the condition that uses these two values evaluates to true, then the row that holds the first value is joined with the row that holds the second value.
Let's quickly recap on some of the different types of join we can use:
|INNER||Combines rows from two tables whenever the join condition is met.|
|LEFT||Same as an inner join, except rows from the first table are added to the join table, regardless of the evaluation of the join condition.|
|RIGHT||Same as an inner join, except rows from the second table are added to the join table, regardless of the evaluation of the join condition.|
|FULL||A combination of left join and right join.|
|CROSS||Doesn't use a join condition. The join table is the result of matching every row from the first table with the second table, the cross product of all rows across both tables.|
When using joins, sometimes our queries can get unwieldy, especially when we're dealing with 2 or more
JOINs. To better manage this we can alias table and column names to shorten our query. We can also use aliasing to give more context about the query results.
Finally, the result from a join query can sometimes be obtained using different methods. Subqueries offer another method for us to query the database and retrieve the same results of similar results, as if we had used a JOIN clause.
We're almost finished with our Introduction to SQL. In the next and final chapter we'll sum up everything we've learned so far and point you towards some next steps and resources to continue your learning!
Connect to the
encyclopedia database. Write a query to return all of the country names along with their appropriate continent names.
encyclopedia=# SELECT countries.name, continents.continent_name encyclopedia-# FROM countries JOIN continents encyclopedia-# ON countries.continent_id = continents.id; name | continent_name ---------+---------------- Brazil | South America Egypt | Africa France | Europe Germany | Europe Japan | Asia USA | North America (6 rows)
Write a query to return all of the names and capitals of the European countries.
WHERE clause as the final clause of the query.
encyclopedia=# SELECT countries.name, countries.capital encyclopedia-# FROM countries JOIN continents encyclopedia-# ON countries.continent_id = continents.id encyclopedia-# WHERE continents.continent_name = 'Europe'; name | capital ---------+--------- France | Paris Germany | Berlin (2 rows)
Write a query to return the first name of any singer who had an album released under the Warner Bros label.
encyclopedia=# SELECT DISTINCT singers.first_name encyclopedia-# FROM singers JOIN albums encyclopedia-# ON singers.id = albums.singer_id encyclopedia-# WHERE albums.label LIKE '%Warner Bros%'; first_name ------------ Prince Madonna (2 rows)
The use of
DISTINCT here ensures that the names aren't repeated in our results.
Write a query to return the first name and last name of any singer who released an album in the 80s and who is still living, along with the names of the album that was released and the release date. Order the results by the singer's age (youngest first).
Here we need to group our
WHERE conditions together using parentheses.
encyclopedia=# SELECT singers.first_name, singers.last_name, albums.album_name, albums.released encyclopedia-# FROM singers JOIN albums encyclopedia-# ON singers.id = albums.singer_id encyclopedia-# WHERE (albums.released > '1980-01-01' encyclopedia(# AND albums.released < '1990-01-01') encyclopedia-# AND singers.deceased = false encyclopedia-# ORDER BY singers.date_of_birth DESC; first_name | last_name | album_name | released ------------+-------------+-----------------+------------ Madonna | | Madonna | 1983-07-27 Madonna | | True Blue | 1986-06-30 Bruce | Springsteen | Born in the USA | 1984-06-04 (3 rows)
Write a query to return the first name and last name of any singer without an associated album entry.
encyclopedia=# SELECT singers.first_name, singers.last_name encyclopedia-# FROM singers LEFT JOIN albums encyclopedia-# ON singers.id = albums.singer_id encyclopedia-# WHERE albums.id IS NULL; first_name | last_name ------------+----------- Frank | Sinatra (1 row)
Here we use a combination of a
LEFT JOIN and
IS NULL. The
LEFT JOIN will create a join table using every row in the
singers table and any matching rows in
albums table. Since Frank Sinatra doesn't have any matching rows in the
albums table, the value for the
albums.id column for his row in the join table will be
NULL; this allows us to use
IS NULL as part of a
WHERE clause to return only this row.
Rewrite the query for the last question as a sub-query.
encyclopedia=# SELECT first_name, last_name encyclopedia-# FROM singers encyclopedia-# WHERE id NOT IN (SELECT singer_id FROM albums); first_name | last_name ------------+----------- Frank | Sinatra (1 row)
Here, the sub-query returns a list of id values from the
singer_id column of the
albums table. We can then query the
singers table for any row where the
id is not in that list.
Connect to the
ls_burger database. Return a list of all orders and their associated product items.
ls_burger=# SELECT orders.*, products.* ls_burger-# FROM orders JOIN order_items ls_burger-# ON orders.id = order_items.order_id ls_burger-# JOIN products ls_burger-# ON order_items.product_id = products.id; id | customer_id | order_status | id | product_name | product_cost | product_type | product_loyalty_points ----+-------------+--------------+----+-------------------------+--------------+--------------+------------------------ 1 | 1 | In Progress | 3 | LS Chicken Burger | 4.50 | Burger | 20 1 | 1 | In Progress | 5 | Fries | 1.20 | Side | 3 1 | 1 | In Progress | 6 | Onion Rings | 1.50 | Side | 5 1 | 1 | In Progress | 8 | Lemonade | 1.50 | Drink | 5 2 | 2 | Placed | 2 | LS Cheeseburger | 3.50 | Burger | 15 2 | 2 | Placed | 5 | Fries | 1.20 | Side | 3 2 | 2 | Placed | 7 | Cola | 1.50 | Drink | 5 3 | 2 | Complete | 4 | LS Double Deluxe Burger | 6.00 | Burger | 30 3 | 2 | Complete | 2 | LS Cheeseburger | 3.50 | Burger | 15 3 | 2 | Complete | 5 | Fries | 1.20 | Side | 3 3 | 2 | Complete | 5 | Fries | 1.20 | Side | 3 3 | 2 | Complete | 10 | Chocolate Shake | 2.00 | Drink | 7 3 | 2 | Complete | 9 | Vanilla Shake | 2.00 | Drink | 7 4 | 3 | Placed | 1 | LS Burger | 3.00 | Burger | 10 4 | 3 | Placed | 5 | Fries | 1.20 | Side | 3 (15 rows)
Return the id of any order that includes Fries. Use table aliasing in your query.
ls_burger=# SELECT o.id ls_burger-# FROM orders AS o JOIN order_items AS oi ls_burger-# ON o.id = oi.order_id ls_burger-# JOIN products AS p ls_burger-# ON oi.product_id = p.id ls_burger-# WHERE p.product_name LIKE 'Fries'; id ---- 1 2 3 3 4 (5 rows)
Build on the query from the previous question to return the name of any customer who ordered fries. Return this in a column called 'Customers who like Fries'. Don't repeat the same customer name more than once in the results.
ls_burger=# SELECT DISTINCT c.customer_name AS "Customers who like Fries" ls_burger-# FROM customers AS c JOIN orders as o ls_burger-# ON c.id = o.customer_id ls_burger-# JOIN order_items AS oi ls_burger-# ON o.id = oi.order_id ls_burger-# JOIN products AS p ls_burger-# ON oi.product_id = p.id ls_burger-# WHERE p.product_name = 'Fries'; Customers who like Fries -------------------------- Natasha O'Shea Aaron Muller James Bergman (3 rows)
Write a query to return the total cost of Natasha O'Shea's orders.
ls_burger=# SELECT sum(p.product_cost) ls_burger-# FROM customers AS c JOIN orders as o ls_burger-# ON c.id = o.customer_id ls_burger-# JOIN order_items AS oi ls_burger-# ON o.id = oi.order_id ls_burger-# JOIN products AS p ls_burger-# ON oi.product_id = p.id ls_burger-# WHERE c.customer_name = 'Natasha O''Shea'; sum ------- 25.10 (1 row)
Write a query to return the name of every product included in an order alongside the number of times it has been ordered.
ls_burger=# SELECT p.product_name, COUNT(oi.id) ls_burger-# FROM products AS p JOIN order_items AS oi ls_burger-# ON p.id = oi.product_id ls_burger-# GROUP BY p.product_name; product_name | count -------------------------+------- LS Cheeseburger | 2 Fries | 5 LS Burger | 1 LS Chicken Burger | 1 Onion Rings | 1 Vanilla Shake | 1 Lemonade | 1 LS Double Deluxe Burger | 1 Chocolate Shake | 1 Cola | 1 (10 rows)
Here we use the
count() function along with a
GROUP BY clause to count the order rows in each product name grouping.