More on Select

As we stated earlier, the SELECT command is one of the most important and frequently used features of the SQL language. It is also very powerful. In the earlier chapters we discovered a few of the features we get with SELECT -- namely, selecting specific columns, selecting specific rows by filtering in various ways, and ordering the results of our queries. In this chapter we'll look at some more of the features available within the syntax of the SELECT statement.

We'll introduce the use functions to process data. We'll also explore how data can be grouped together based on various criteria. First of all though, let's look at how we can further filter our data by adding LIMIT, OFFSET, and DISTINCT clauses to our queries.

LIMIT and OFFSET

The SELECT statements we've used so far have returned all of the results that match the conditions of our statement. That's not a problem for our users table, since we only have a few rows of data in it. When working with large datasets, a common requirement is to only display one portion of the data at a time.

Displaying portions of data as separate 'pages' is a user interface pattern used in many web applications, generally referred to as 'pagination'. An example of this can be seen in the Launch School forum pages, where twelve forum posts are displayed on the first 'page' and you need to navigate to the next 'page' to see the next twelve.

pagination example

The LIMIT and OFFSET clauses of SELECT are the base on which pagination is built. Let's look at how it works.

Say we are outputting the details of all of our users to an user admin page of our web application. We perhaps use this page to manage individual users and so only want to display one user at a time. We can do this by adding a LIMIT clause to our SELECT query:

sql_book=# SELECT * FROM users LIMIT 1;

Setting the limit to 1 lets us select a single row, the first row, from our table.

LIMIT example

Executing this query should produce the following result:

 id | full_name  | enabled |         last_login         
----+------------+---------+----------------------------
  1 | John Smith | f       | 2017-10-25 10:26:10.015152
(1 row)

If we want to return the second row instead, we need a way to skip the first row when querying data.

LIMIT and OFFSET example

We can do this with the OFFSET clause. This clause will skip the first N rows. Let's give it a try.

sql_book=# SELECT * FROM users LIMIT 1 OFFSET 1;
 id |  full_name   | enabled |         last_login         
----+--------------+---------+----------------------------
  2 | Jane Smith   | t       | 2017-10-25 10:26:50.295461
(1 row)

Here we still only return one row, due to the LIMIT clause, but since we have OFFSET our query by 1, it is the second row that is returned instead of the first. For each row we want to skip, we can increment the value of offset by 1.

If we want to return more than one result per page, we can adjust the values of LIMIT and OFFSET accordingly. For example, we could imagine the query for page 2 of the Launch School general forum looking something like this:

SELECT topic, author, publish_date, category, replies_count, likes_count, last_activity_date
FROM posts LIMIT 12 OFFSET 12;

As well as specific use cases such as pagination, LIMIT can also be useful in development when testing our queries. We can use LIMIT to get a preview or taste of what data is available or would be returned rather than returning the entire dataset. This is especially useful during development when forming your queries and getting an understanding of the dataset and data quality.

On the subject of data quality issues, a common one that you might encounter is duplicate data in your tables. There are situations where you might need to formulate queries in a certain way in order to deal with duplication. One way to do that is with the DISTINCT clause, which we'll look at next.

DISTINCT

Sometimes duplicate data is unavoidable. For example, we might get duplication when joining more than one table together. We'll delve into working with multiple tables later in this book, for now let's look at a way to deal with duplication by using the DISTINCT qualifier. To demonstrate how this works let's first add some duplicate data to our users table.

INSERT INTO users (id, full_name) VALUES (4, 'Harry Potter');
INSERT INTO users (id, full_name) VALUES (5, 'Jane Smith');
id |  full_name   | enabled |         last_login         
----+--------------+---------+----------------------------
 1 | John Smith   | f       | 2017-10-25 10:26:10.015152
 2 | Jane Smith   | t       | 2017-10-25 10:26:50.295461
 3 | Harry Potter | t       | 2017-10-25 10:26:50.295461
 4 | Harry Potter | t       | 2017-10-25 10:36:38.188286
 5 | Jane Smith   | t       | 2017-10-25 10:36:43.324015
(5 rows)

If we select the full_name column from our table, we get five rows back, two of which contain duplicate names:

sql_book=# SELECT full_name FROM users;
 full_name   
--------------
 John Smith
 Jane Smith
 Harry Potter
 Harry Potter
 Jane Smith
(5 rows)

We can use DISTINCT as part of our SELECT query to only return disctinct, or unique, values. Try running the following query:

sql_book=# SELECT DISTINCT full_name FROM users;
 full_name
--------------
 John Smith
 Jane Smith
 Harry Potter
(3 rows)

Here we only get three rows back, since there are only three disinct values in our full_name column.

SELECT DISTINCT example

DISTINCT can be useful when used in conjunction with SQL functions. Compare, for example, the results of the following two queries using the count() function:

sql_book=# SELECT count(full_name) FROM users;
 count
-------
     5
(1 row)
sql_book=# SELECT count(DISTINCT full_name) FROM users;
 count
-------
     3
(1 row)

The first query only really tells us how many rows of data we have in our table. The second one tells us how many unique names we have in the full_name column.

We haven't really looked at SQL functions yet, so let's do that now.

Functions

Functions are a way of working with data in SQL that may seem a little more familiar if you're coming to SQL from a programming language such as Ruby or JavaScript. Functions are a set of commands included as part of the RDBMS that perform particular operations on fields or data. Some functions provide data transformations that can be applied before returning results. Others simply return information on the operations carried out.

These functions can generally be grouped into different types. Some of the most commonly used types of functions are:

  1. String
  2. Date/Time
  3. Aggregate

We're not going to go into detail here on each function that exists for every function type. The most important thing is to know what types of functions exist and have a general understanding of how and when to use them. When you need to perform a specific operation, you can always look up a function for that operation in the documentation.

String Functions

String Functions, as their name suggests, perform some sort of operation on values whose data type is String. Some examples are:

Function Example Notes
length SELECT length(full_name) FROM users; This returns the length of every user's name. You could also use length in a WHERE clause to filter data based on name length.
trim SELECT trim(leading ' ' from full_name) FROM users; If any of the data in our full_name column had a space in front of the name, using the trim function like this would remove that leading space.

Date/ Time Functions

Just as string functions perform operations on String data, Date/ Time functions, for the most part, perform operations on date and time data. Many of the date/ time functions take time or timestamp inputs. Our last_login column, for example, has a data type of timestamp and so data in that column can act as an argument to such functions:

Function Example Notes
date_part SELECT full_name, date_part('year', last_login) FROM users; date_part allow us to view a table that only contains a part of a user's timestamp that we specify. The above query allows us to see each user's name along with the year of the last_login date. Sometimes having date/time data down to the second isn't needed
age SELECT full_name, age(last_login) FROM users; The age function, when passed a single timestamp as an argument, calculates the time elapsed between that timestamp and the current time. The above query allows us to see how long it has been since each user last logged in.

Aggregate Functions

Aggregate functions perform aggregation; that is, they compute a single result from a set of input values. We briefly looked at one of these count, a little earlier in this chapter.

Function Example Notes
count SELECT count(id) FROM users; Returns the number of values in the column passed in as an argument. This type of function can be very useful depending on the context. We could find the number of users who have enabled account, or even how many users have certain last names if we use the above statement with other clauses.
sum SELECT sum(id) FROM users; Not to be confused with count. This sums numeric type values for all of the rows being selected to return total.
min SELECT min(last_login) FROM users; This returns the the lowest value in a column for all of the rows being selected. Can be used with various data types such as numeric, date/ time, and string.
max SELECT max(last_login) FROM users; This returns the highest value in a column for all of the rows being selected. Can be used with various data types such as numeric, date/ time, and string.
avg SELECT avg(id) FROM users; Returns the average (arithmetic mean) of numeric type values for all of the rows being selected.

Currently the count function in our table above isn't all that useful. It would simply count how many values we have in the id column; since we have five rows in our table, we have five values in the id column, so the result would be 5. You can quickly execute that statement just to check:

sql_book=# SELECT count(id) FROM users;
 count
-------
     5
(1 row)

Aggregate functions really start to be useful when grouping table rows together. The way we do that is by using the GROUP BY clause.

GROUP BY

Sometimes you need to combine data results together to form more meaningful information. Let's say we wanted to implement one of the examples mentioned earlier, and count the number of users who have accounts that are or are not enabled. How would we approach this?

We could try using a WHERE clause to first filter the results to rows where the enabled column has a value of true, and then use the count function to count the values in the id column (which is equal to the number of rows, we could actually use any column here).

sql_book=# SELECT count(id) FROM users WHERE enabled = true;
 count
-------
     4
(1 row)

This only shows the number of rows where the value of enabled is true though; if we want to know how many rows there are where enabled is false, we have to change our condition to WHERE enabled = false.

What if we want to show the counts for true and false at the same time? What we need is some way to split up the true and false rows into two separate groups and then count the rows in each group.

GROUP BY example

The GROUP BY clause allows us to do exactly that:

sql_book=# SELECT enabled, count(id) FROM users GROUP BY enabled;
 enabled | count
---------+-------
 f       |     1
 t       |     4
(2 rows)

Although we don't need to include enabled in our column list in order to use it in our GROUP BY clause, doing so makes our output more meaningful as we can clearly see the correlation between the count and the values in enabled.

One thing to be aware of when using aggregate functions, is that if you include columns in the column list alongside the function then those columns must also be included in a GROUP BY clause. For example, both of the following statements would return an error:

SELECT enabled, count(id) FROM users;
SELECT enabled, id, count(id) FROM users GROUP BY enabled;

Summary

In this chapter we've built on our knowledge of SELECT, and looked at a number of different ways we can make our SELECT queries more flexible and powerful:

  • Returning portions of a dataset using LIMIT and OFFSET
  • Returning unique values using DISTINCT
  • Using SQL functions to work with data in various ways
  • Aggregating data using GROUP BY

So far in this book we've looked at a lot of the things that you need to know in order to work with a single table in a database; from creating, altering, or even deleting the table itself, through to adding data and then querying that data in a number of different ways. There's one thing we haven't looked at yet, and that's changing data that already exists in your table.

In the next chapter we'll do exactly that by looking at the UPDATE and DELETE commands that will let us modify existing data in our table.

Exercises

  1. Make sure you are connected to the encyclopedia database. Write a query to retrieve the first row of data from the countries table.

    Solution

    encyclopedia=# SELECT * FROM countries LIMIT 1;
     id |  name  | capital | population
    ----+--------+---------+------------
      1 | France | Paris   |   67158000
    (1 row)
    
  2. Write a query to retrieve the name of the country with the largest population.

    Solution

    encyclopedia=# SELECT name FROM countries
    encyclopedia-# ORDER BY population DESC
    encyclopedia-# LIMIT 1;
     name
    ------
     USA
    (1 row)
    

    Here we first order the rows in descending order by population, and then retrieve the name value from the first of the ordered rows.

  3. Write a query to retrieve the name of the country with the second largest population.

    Solution

    encyclopedia=# SELECT name FROM countries
    encyclopedia-# ORDER BY population DESC
    encyclopedia-# LIMIT 1 OFFSET 1;
     name
    ------
     Japan
    (1 row)
    

    This is similar to the previous solution, but we retrieve the second row, rather than the first, by using OFFSET.

  4. Write a query to retrieve all of the unique values from the binomial_name column of the animals table.

    Solution

    encyclopedia=# SELECT DISTINCT binomial_name
    encyclopedia-# FROM animals;
          binomial_name       
    --------------------------
     Aquila Chrysaetos
     Strigops habroptila
     Falco Peregrinus
     Columbidae Columbiformes
    (4 rows)
    

    Since Columbidae Columbiformes occurs twice (for Dove and for Pigeon), we can prepend the column name with DISTINCT in our query in order to return only the unique values.

  5. Write a query to return the longest binomial name from the animals table.

    Solution

    encyclopedia=# SELECT binomial_name
    encyclopedia-# FROM animals
    encyclopedia-# ORDER BY length(binomial_name) DESC
    encyclopedia-# LIMIT 1;
          binomial_name       
    --------------------------
     Columbidae Columbiformes
    (1 row)
    

    Here we use the length() function to determine the length of each string in the binomial_name column, and use the resulting values to order our query (in descending order). By setting a LIMIT of 1, we return only the longest value.

  6. Write a query to return the first name of any celebrity born in 1958.

    Solution

    encyclopedia=# SELECT first_name
    encyclopedia-# FROM celebrities
    encyclopedia-# WHERE date_part('year', date_of_birth) = 1958;
     first_name
    ------------
     Madonna
     Prince
    (2 rows)
    

    Here we use the date_part() function to extract the year from the date_of_birth, and then use within a WHERE clause.

  7. Write a query to return the highest maximum age from the animals table.

    Solution

    encyclopedia=# SELECT max(max_age_years)
    encyclopedia-# FROM animals;
     max
    -----
      60
    (1 row)
    
  8. Write a query to return the average maximum weight from the animals table.

    Solution

    encyclopedia=# SELECT avg(max_weight_kg)
    encyclopedia-# FROM animals;
            avg         
    --------------------
     3.1700000000000000
    (1 row)
    
  9. Write a query to return the number of rows in the countries table.

    Solution

    encyclopedia=# SELECT count(id) FROM countries;
     count
    -------
        4
    (1 row)
    

    Here we use the count() function to count the number of rows. We could have used any column to count the rows, e.g. count(name) would have returned the same result.

  10. Write a query to return the total population of all the countries in the countries table.

    Solution

    encyclopedia=# SELECT sum(population)
    encyclopedia-# FROM countries;
        sum    
    -----------
     601544589
    (1 row)
    
  11. Write a query to return each unique conservation status code alongside the number of animals that have that code.

    Solution

    encyclopedia=# SELECT conservation_status, count(id)
    encyclopedia-# FROM animals
    encyclopedia-# GROUP BY conservation_status;
     conservation_status | count
    ---------------------+-------
     CR                  |     1
     LC                  |     4
    (2 rows)
    

    The key thing here is the GROUP BY clause in conjuntion with the count() function. We are effectively combining the rows into groups based on a certain value (in this case conservation_status), and calling the function on the rows in each group. Without the GROUP BY clause, the function would treat all the rows as one single group.

  12. Connect to the ls_burger database. Write a query that returns the average burger cost for all orders that include fries.

    Solution

    ls_burger=# SELECT avg(burger_cost)
    ls_burger-# FROM orders
    ls_burger-# WHERE side = 'Fries';
            avg         
    --------------------
     4.0000000000000000
    (1 row)
    
  13. Write a query that returns the cost of the cheapest side ordered.

    Solution

    ls_burger=# SELECT min(side_cost)
    ls_burger-# FROM orders
    ls_burger-# WHERE side IS NOT NULL;
     min  
    ------
     0.99
    (1 row)
    
  14. Write a query that returns the number of orders that include Fries and the number of orders that include Onion Rings.

    Solution

    ls_burger=# SELECT side, count(id)
    ls_burger-# FROM ORDERS
    ls_burger-# WHERE side = 'Fries'
    ls_burger-# OR side = 'Onion Rings'
    ls_burger-# GROUP BY side;
        side     | count
    -------------+-------
     Fries       |     2
     Onion Rings |     1
    (2 rows)