In the previous chapter we used
INSERT to add some data to our
users table. Now that our table contains data, we can use
SELECT to access, or query, that data in various ways. Querying data forms the Read part of our CRUD operations, and is arguably the most common operation in database-backed applications.
We saw earlier in a brief SQL tutorial that
SELECT may be used to query a wide array of data in a table: from specific columns to specific rows, or even a combination of the two. In this chapter and the next, we're going to go into more depth on what constitutes a
SELECT statement. While the options we've covered so far are important, they're just a small part of what you can do with
SELECT; in this and the following chapter, we'll dive a bit deeper into this important command and its usage.
Let's start by breaking down the
SELECT statement into individual generalized parts, as we've done for SQL statements in previous chapters:
SELECT column_name, ... FROM table_name WHERE condition;
The ellipsis (
...) in the above command format indicates that multiple column names may be specified. You can also use
* if you want to specify all columns.
This example is relatively straightforward. A
SELECT statement is very flexible, and can be used with a number of different clauses. We'll look at some other clauses and add them to our example as we move through this chapter. For now let's focus on the three parts shown above:
Let's look at an example using our
SELECT enabled, full_name FROM users WHERE id < 2;
This should give us the following response:
enabled | full_name ---------+-------------- f | John Smith (1 row)
This shows the columns we specified, and the data in those columns for the rows which match the
There are several things to note here:
idcolumn in our
WHEREcondition so that it is used to filter our table, but we didn't specify
idin our column list so the values in that column are not included in our results
1 rowis returned. This is the number of rows in the table that match the
WHEREcondition is a 'less than' operator. We'll talk about operators some more later in this chapter
Identifiers and Key Words
In a SQL statement such as
SELECT enabled, full_name FROM users; there are identifiers and keywords. The identifiers, such as
users, identify tables or columns within a table. The keywords, such as
FROM, tell PostgreSQL to do something specific.
Since SQL is not a case-sensitive language, the case differences in our example can't be used by PostgreSQL to differentiate between identifiers and keywords. Instead it assumes that anything which is not a keyword (or operator, or function) is an identifier and so treats it as such. What do we do if we want to use an identifier that is the same as a keyword? For example, we might have a column called
year, which is actually a reserved word in PostgreSQL. If used in certain SQL statements, depending on the context, this identifier would cause an error.
Generally it's best to try and avoid naming columns the same as keywords for this very reason. If it's unavoidable however, you can double quote the identifier in your statement:
"year". PostgreSQL then knows to treat it specifically as an identifier rather than as a keyword.
Thus far, we've looked at using the
WHERE clause to filter results. Another clause that you might use in a
SELECT query is the
ORDER BY clause. Rather than returning only certain rows, as
ORDER BY displays the results of a query in a particular sort order. This can be useful for a number of reasons. For example, you may have a web page that display blog posts in order of the most recently created. In order to facilitate this, when you go retrieve the blog post records from your database, you could write your query so that they are returned in descending order based on time of creation. It all depends on the data sets you are dealing with and for what purpose you use that data. Being able to order data to fit your needs is an important part of utilizing SQL.
SQL allows returning sorted data by adding the
ORDER BY column_name clause to a query. Let's add to our
SELECT query syntax example with an
ORDER BY clause.
SELECT column_name, ... FROM table_name WHERE condition ORDER BY column_name;
ORDER BY clause comes after we have specified the table name. If our statement includes a
WHERE clause, the
ORDER BY clause comes after this also.
Let's try using
ORDER BY by running a query on our
users table, ordering it by
SELECT full_name, enabled FROM users ORDER BY enabled;
full_name | enabled --------------+--------- John Smith | f Jane Smith | t Harry Potter | t (3 rows)
There are a couple of things to note here:
truein ascending order
Harry Potter, have the same value
enabledcolumn we are ordering by, the sort order between those two rows is arbitrary.
You may have noticed that the order of our rows hasn't actually changed here. Since the one
false value for the
enabled column is in a row that is already above the rows with
true in that column, PostgreSQL doesn't actually need to reorder any of the rows and so can return them in their existing order.
You can fine tune your ordering with the
ORDER BY clause by specifying the sort direction, either ascending or descending, using the keywords
DESC. If omitted, then the default is
ASC, which is why our previous query was sorted by
enabled in ascending order.
If we add
DESC after the
ORDER BY expression in our previous query, the rows with
enabled should be ordered before those with
Let's execute the query to check:
SELECT full_name, enabled FROM users ORDER BY enabled DESC;
full_name | enabled --------------+--------- Jane Smith | t Harry Potter | t John Smith | f (3 rows)
We can see that the order has changed.
Jane Smith and
Harry Potter are now above
John Smith since
enabled is now 'descending' from
f. The order between the
Jane Smith and
Harry Potter rows hasn't changed however, since they both have the same value for
You can fine tune your ordering even further by returning results ordered by more than one column. This is done by having comma-separated expressions in the
ORDER BY clause. If we add
id DESC to our
ORDER BY clause, the rows will first be ordered by
enabled as in our previous example, but then within any sets of rows which have identical values for
enabled a second level of ordering will be applied, this time by
id in descending order.
Let's try it out:
SELECT full_name, enabled FROM users ORDER BY enabled DESC, id DESC;
full_name | enabled --------------+--------- Harry Potter | t Jane Smith | t John Smith | f (3 rows)
Harry Potter is now above
Jane Smith since the value of his
3 is greater than that of hers
2 and we are ordering by
id in a descending order.
John Smith is still below both of them since we first order by
enabled in descending order. The value in
id doesn't affect the sort order for the
John Smith row, though coincidentally this row would have still have come last if we'd ordered our rows just according to descending
id without first ordering by
There's a couple of things you may have noticed about our query:
WHEREclause, we can
ORDER BYa column even if we do not include it in our column list
Now we've looked at the basics of filtering and ordering our data, let's explore how we make our queries even more powerful by using Operators. You've already encountered a few operators in this book such as
<, but until now we haven't really said too much about them.
Operators are generally used as part of an expression in a
WHERE clause. We'll briefly look at a few different operators and how to use them as part of a
SELECT query by grouping some of them into the following different types:
The operators we discuss below are only a selection of those available within PostgreSQL; they do however represent some of the most commonly used operators and fundamental use cases that you will encounter.
These operators are used to compare one value to another. Often these values are numerical, but other data types can also be compared. Examples of comparison operator would be 'less than'
< or 'not equal to'
<> (both of which we've already encountered).
Within the expression of a
WHERE clause, the comparison operator is placed in between the two things being compared; i.e. the column name and the specific value to be compared against the values in that column. Let's look at an example, using the 'greater than or equal to' operator
SELECT full_name, enabled, last_login FROM users WHERE id >= 2;
The above example should return all of the rows for which the value in the
id column is greater than or equal to
2; since this is every row except for the first, the second and third rows should be returned.
Executing the statement should get the following response:
full_name | enabled | last_login --------------+---------+---------------------------- Jane Smith | t | 2017-10-25 10:26:50.295461 Harry Potter | t | 2017-10-25 10:26:50.295461 (2 rows)
Some other comparison operators that work in a similar way are listed below:
||less than or equal to|
||greater than or equal to|
As well as the comparison operators listed above, there are what is termed comparison predicates which behave much as operators but have special syntax. Examples include
IS DISTINCT FROM,
IS NOT DISTINCT FROM. We won't discuss these in this book, though there are two important ones which we will briefly cover:
IS NULL and
IS NOT NULL.
NULL is a special value in SQL which actually represents the absence of a value or an unknown value. Don't worry too much about the finer details of what this means for now, we'll explore this some more later in the curriculum. On a practical level though, what this means is that we can't simply treat
NULL as we would any other value. We couldn't, for example, have a
WHERE clause in the form
WHERE column_name = NULL. When identifying
NULL values we must instead use the
IS NULL comparison predicate.
SELECT * FROM my_table WHERE my_column IS NULL;
The above example would select all rows for which the column
my_column had a
IS NOT NULL would do the opposite; i.e. it would select all rows for which the column had any value other than
Logical operators can be used to provide more flexibility to your expressions. There are three logical operators:
The third one,
NOT is less commonly used than the other two, so we won't cover it here. The
OR operators allow you to combine multiple conditions in a single expression. Let's try them out with some quick examples:
SELECT * FROM users WHERE full_name = 'Harry Potter' OR enabled = 'false';
The above statement should retrieve two rows. This is because there is one row where the value of
"Harry Potter" and so matches the first condition, and another row where
false and so matches the second condition. Since by using
OR we are interested in rows that satisfy either condition, both of these rows should be returned. The other row in our table didn't satisfy either of the conditions and so shouldn't be returned.
Let's execute the statement to test this out:
id | full_name | enabled | last_login ----+--------------+---------+---------------------------- 1 | John Smith | f | 2017-10-25 10:26:10.015152 3 | Harry Potter | t | 2017-10-25 10:26:50.295461 (2 rows)
If we changed that statement to use the logical operator
AND instead of
OR, the response would be quite different.
SELECT * FROM users WHERE full_name = 'Harry Potter' AND enabled = 'false';
id | full_name | enabled | last_login ----+-----------+---------+------------ (0 rows)
AND we are saying that we are only interested in rows that satisfy both conditions. Since there are no rows where
"Harry Potter" and
enabled is also
false, no rows satisfy our overall condition and so
0 rows are returned.
String, or pattern, matching allows you to add flexibility to your conditional expressions in another way, by searching for a sub-set of the data within a column. For instance, let's say we wanted to find all users with the last name Smith. We can't directly check if
full_name is equal to Smith, since Smith is only part of the entire name. We need a way to look at a substring within the entire name. As the name suggests, string matching can only be done when the data type of the column is a string type. It is most often carried out using the
LIKE operator. Let's try it out by using our
SELECT * FROM users WHERE full_name LIKE '%Smith';
WHERE clause here looks very much like other
WHERE clauses we've seen so far in this book, except where the
= operator would normally be, we use the
LIKE operator instead. Also notice the use of the
% character in the value that we want to match against; this is a wildcard character. By putting
% just before Smith we are saying:
Match all users that have a full name with any number of characters followed by "Smith"
Test this out by executing the statement. You should receive the following response:
id | full_name | enabled | last_login ----+------------+---------+---------------------------- 1 | John Smith | f | 2017-10-25 10:26:10.015152 2 | Jane Smith | t | 2017-10-25 10:26:50.295461 (2 rows)
LIKE is case sensitive:
LIKE %Smith matches
Smith but not
SMITH. If you want a case-insensitive selection, you can use
As well as the
% character, the underscore
_ can also be used as a wildcard with
_ stands in for only a single character whereas
% stands in for any number of characters.
An alternative to
ILIKE for pattern matching is to use
SIMILAR TO. It works in much the same way as
LIKE, except that it compares the target column to a Regex (Regular Expression) pattern. We won't cover
SIMILAR TO or Regular Expressions in this book, though if you would like to learn more about Regex, we have another book about that topic! (Note: the Regex book doesn't specifically cover SQL-flavored Regex).
SELECT statement is probably the most commonly used statement in SQL. Every database-backed application will need to present data to users in some way. In order to meet the many and varied data presentation requirements of different applications, developers need to be able to query data in very specific ways. In this chapter we've explored the
SELECT statement syntax, and looked at a few ways in which we can fine-tune our queries, such as using operators or ordering results.
Let's quickly recap:
|ORDER BY column_name [ASC, DESC]||Orders the data selected by a column name within the associated table. Data can be ordered in descending or ascending order; if neither are specified, the query defaults to ascending order. The brackets (``) indicate that you can select `ASC` or `DESC`, but not both.|
|WHERE column_name [>,<, >=, <=, =, <>] value||Filters a query result based on some comparison between a column's value and a specified literal value. There are several comparison operators available for use, from "greater than" to "not equal to". The brackets (``) indicate that you can choose any of the operators, but only one of them.|
|WHERE expression1 [AND, OR] expression2||Filters a query result based whether one expression is true [and,or] another expression is true. The brackets (``) indicate that you can choose `AND` or `OR`, but not both.|
|WHERE string_column LIKE '%substring'||Filters a query result based on whether a substring is contained within string_column's data and has any number of characters before that substring. Those characters are matched using the wildcard
In the next chapter, we'll continue to explore
SELECT and learn some more ways in which queries can be fine-tuned to return exactly the data that you need.
Make sure you are connected to the
encyclopedia database. Write a query to retrieve the population of the USA.
SELECT population FROM countries WHERE name = 'USA';
population ------------ 325365189 (1 row)
Write a query to return the population and the capital (with the columns in that order) of all the countries in the table.
SELECT population, capital FROM countries;
population | capital ------------+----------------- 67158000 | Paris 325365189 | Washington D.C. 82349400 | Berlin 126672000 | Tokyo (4 rows)
Write a query to return the names of all the countries ordered alphabetically.
SELECT name FROM countries ORDER BY name;
name --------- France Germany Japan USA (4 rows)
Write a query to return the names and the capitals of all the countries in order of population, from lowest to highest.
SELECT name, capital FROM countries ORDER BY population;
name | capital ---------+----------------- France | Paris Germany | Berlin Japan | Tokyo USA | Washington D.C. (4 rows)
Write a query to return the same information as the previous query, but ordered from highest to lowest.
SELECT name, capital FROM countries ORDER BY population DESC;
name | capital ---------+----------------- USA | Washington D.C. Japan | Tokyo Germany | Berlin France | Paris (4 rows)
Write a query on the
animals table, using
ORDER BY, that will return the following output:
name | binomial_name | max_weight_kg | max_age_years ------------------+--------------------------+---------------+--------------- Peregrine Falcon | Falco Peregrinus | 1.5000 | 15 Pigeon | Columbidae Columbiformes | 2.0000 | 15 Dove | Columbidae Columbiformes | 2.0000 | 15 Golden Eagle | Aquila Chrysaetos | 6.3500 | 24 Kakapo | Strigops habroptila | 4.0000 | 60 (5 rows)
Use only the columns that can be seen in the above output for ordering purposes.
SELECT name, binomial_name, max_weight_kg, max_age_years FROM animals ORDER BY max_age_years, max_weight_kg, name DESC;
To return the data in the order specified, we need to order the data by three different columns.
max_age_years, affects all five rows. Within this ordering, since three of the rows (
Dove) have the same value, the ordering between them is arbitrary.
max_weight_kg, affects only these three rows; whatever happens at this level of ordering doesn't affect the initial level of ordering which placed the
Kakaporows at the bottom. Here again two of the rows (
Dove) have the same value, so the ordering between them is arbitrary.
name, affects only those two rows who had the same value for the previous order condition. Here we specify a sort direction so that the
Pigeonrow ends up above the
Try running the query a few times, adding or removing one of the sort conditions each time to see how that affects the order.
Write a query that returns the names of all the countries with a population greater than 70 million.
SELECT name FROM countries WHERE population > 70000000;
name --------- USA Germany Japan (3 rows)
Write a query that returns the names of all the countries with a population greater than 70 million but less than 200 million.
SELECT name FROM countries WHERE population > 70000000 AND population < 200000000;
name --------- Germany Japan (2 rows)
Write a query that will return the first name and last name of all entries in the celebrities table where the value of the deceased column is not true.
SELECT first_name, last_name FROM celebrities WHERE deceased != true OR deceased IS NULL;
first_name | last_name ------------+------------- Bruce | Springsteen Scarlett | Johansson Tom | Cruise Madonna | Elvis | Presley (5 rows)
You might expect to be able to just use the first part of that
WHERE expression (to the left of the
OR). Doing this, however, excludes Elvis from our results as he has a
NULL value for the
SELECT first_name, last_name FROM celebrities WHERE deceased != true;
first_name | last_name ------------+------------- Bruce | Springsteen Scarlett | Johansson Tom | Cruise Madonna | (4 rows)
NULL values we must use
IS NULL rather than a normal comparison operator. This is an example of why you might want to avoid the possibility of
NULL values in a column with a boolean data type.
Write a query that will return the first and last names of all the celebrities who sing.
SELECT first_name, last_name FROM celebrities WHERE occupation LIKE '%Singer%';
first_name | last_name ------------+------------- Bruce | Springsteen Frank | Sinatra Madonna | Prince | Elvis | Presley (5 rows)
Here we use the
LIKE clause with a string containing wildcards. Two wildcards are used in the query, one either side of the word
Singer, since this word could potentially appear anywhere within the
Write a query that will return the first and last names of all the celebrities who act.
SELECT first_name, last_name FROM celebrities WHERE occupation LIKE '%Actor%' OR occupation LIKE '%Actress%';
first_name | last_name ------------+----------- Scarlett | Johansson Frank | Sinatra Tom | Cruise Madonna | Prince | Elvis | Presley (6 rows)
Here we could potentially have used a single
LIKE '%Act%' rather than two separated by an
OR. In that case we would have to be careful that there wasn't the possibility of that three letter combination appearing elsewhere in the string. When using string matching it's important to make your search terms specific enough for the data that you want to retrieve.
Write a query that will return the first and last names of all the celebrities who both sing and act.
You can use parentheses to group sub-clauses together.
SELECT first_name, last_name FROM celebrities WHERE (occupation LIKE '%Actor%' OR occupation LIKE '%Actress%') AND occupation LIKE '%Singer%';
first_name | last_name ------------+----------- Frank | Sinatra Madonna | Prince | Elvis | Presley (4 rows)
This one is a little tricky. We've not really talked about operator precedence in PostgreSQL, and we won't go into too much detail here other than to say that certain operators (e.g.
AND) have higher precedence than others (e.g.
OR). When writing queries that combine a number of these operators, it is advisable to use parentheses to group them together in order to avoid any potential precedence issues.
Try running the above query without the parentheses and see if you can work out why the result is different.
Connect to the
ls_burger database. Write a query that lists all of the burgers that have been ordered, from cheapest to most expensive, where the cost of the burger is less than $5.00.
SELECT burger FROM orders WHERE burger_cost < 5.00 ORDER BY burger_cost;
burger ------------------- LS Burger LS Cheeseburger LS Chicken Burger (3 rows)
Write a query to return the customer name and email address and loyalty points from any order worth 20 or more loyalty points. List the results from the highest number of points to the lowest.
SELECT customer_name, customer_email, customer_loyalty_points FROM orders WHERE customer_loyalty_points >= 20 ORDER BY customer_loyalty_points DESC;
customer_name | customer_email | customer_loyalty_points ----------------+-------------------------+------------------------- Natasha O'Shea | email@example.com | 42 James Bergman | firstname.lastname@example.org | 28 (2 rows)
Write a query that returns all the burgers ordered by Natasha O'Shea.
SELECT burger FROM orders WHERE customer_name = 'Natasha O''Shea';
burger ------------------------- LS Cheeseburger LS Double Deluxe Burger (2 rows)
Just as when adding data to a table with
INSERT, we need to escape quote marks in our
SELECT queries too.
Write a query that returns the customer name from any order which does not include a drink item.
SELECT customer_name FROM orders WHERE drink IS NULL;
customer_name ---------------- Natasha O'Shea Aaron Muller (2 rows)
Write a query that returns the three meal items for any order which does not include fries.
SELECT burger, side, drink FROM orders WHERE side != 'Fries' OR side IS NULL;
burger | side | drink -------------------------+-------------+----------------- LS Double Deluxe Burger | Onion Rings | Chocolate Shake LS Burger | | (2 rows)
Here we need to account for
NULL values as well through use of the
Write a query that returns the three meal items for any order that includes both a side and a drink.
SELECT burger, side, drink FROM orders WHERE side IS NOT NULL AND drink IS NOT NULL;
burger | side | drink -------------------------+-------------+----------------- LS Chicken Burger | Fries | Cola LS Double Deluxe Burger | Onion Rings | Chocolate Shake (2 rows)
Here we need to use
IS NOT NULL to only return rows where there is a value in the specified columns.