Indexes

You will need a database to run the commands in this chapter. You can use a new db or continue to use the library database from earlier chapters. As a reminder, a new database can be created using the terminal command createdb db_name.

What are indexes?

Indexes are used by PostgreSQL to retrieve your data faster by helping locate column values faster without having to search through every record in sequence.

You'll encounter this term in the SQL world a lot. Indexing is a mechanism that SQL engines use to speed up queries by storing results in a table-like structure in ordered form. The index you're using by default is the primary key, which is optimized by SQL.

You can think of indexes as a dictionary. When you need to look up something in a dictionary, you go straight to the page based on the initial letters in the word. In the same way, having an index on the column of a database enables fast lookup.

The question you might be asking is, why not just index every column and speed up the entire table? People frequently do this, and end up with slower tables. There are numerous reasons for this slowness; one reason being that when you build an index of a field, reads become faster, but every time a row is updated or inserted, the index must be updated as well. Now you're updating not only the table but also the index, so that's a performance cost. There are a lot of choices involved in how to decide whether a column should be designated an index. After finishing this book you'll have a better grasp of the concepts needed to read about indexes and make sensible decisions about which columns to index, and the different types of indexing to use.

Index Types

PRIMARY

One of the index types you're already using in your tables is the PRIMARY KEY, which PostgreSQL uses to uniquely identify a row in a table. The terms KEY and INDEX in PostgreSQL are somewhat synonymous.

UNIQUE

You can also create a UNIQUE index on a column. We saw this in an earlier chapter where we added a unique constraint to the book table on the isbn column. Now when you create a SELECT query to search a book by its isbn, it will not scan the entire table and check each row; with the use of a unique index, that data can be accessed immediately, which should make the query very fast.

CREATE TABLE books (
  id SERIAL,
  title VARCHAR(100) NOT NULL,
  author VARCHAR(100) NOT NULL,
  published_date TIMESTAMP NOT NULL,
  isbn INT,
  PRIMARY KEY (id),
  UNIQUE (isbn)
);

NON-UNIQUE

Non-unique indexes are not explicitly specified in PostgreSQL. An index that isn't unique simply has the option to have duplicate value, yet that fact isn't shown in description of database tables. For example, the author column could be indexed but have multiple occurrences of a given value within that column.

When should I use an index?

  • Indexes are best used in cases where sequential reading is inadequate. For example: fields that aid in mapping relationships and fields that are frequently displayed with ORDERED BY are good candidates for indexing.
  • Misuse of indexes can slow down your database inserts/updates as indexes need to be updated appropriately. They are best used in tables where reads are more common.

Creating your own Indexes

You can create indexes for a table based on a single column, multiple columns or even the prefix of a column.

  • Single column
  • Multi-column
  • A partial index can be set to check the prefix of a field, this will make it so this column is indexed by the prefix of the value.

We will show you how to create an index on a single column.

Single Column Index

On a single column, an index is created with the CREATE INDEX command.

CREATE INDEX ON table_name (field_name)


library=# CREATE INDEX ON books (author);
CREATE INDEX

library=# \d books
                                       Table "public.books"
     Column     |            Type             |                     Modifiers
----------------+-----------------------------+----------------------------------------------------
 id             | integer                     | not null default nextval('books_id_seq'::regclass)
 title          | character varying(100)      | not null
 author         | character varying(100)      | not null
 published_date | timestamp without time zone | not null
 isbn           | integer                     |
Indexes:
    "books_pkey" PRIMARY KEY, btree (id)
    "books_isbn_key" UNIQUE CONSTRAINT, btree (isbn)
    "books_author_idx" btree (author)
Referenced by:
    TABLE "reviews" CONSTRAINT "reviews_book_id_fkey" FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE
    TABLE "users_books" CONSTRAINT "users_books_book_id_fkey" FOREIGN KEY (book_id) REFERENCES books(id) ON UPDATE CASCADE

Notice that the index for author is "books_author_idx" btree (author). It doesn't specify that it may accept multiple values. Rather, any constraints are listed if they are present. We can see this in the indexes for isbn and this table's primary key.

Deleting Indexes

The DROP INDEX command can be used to delete the index that was created. PostgreSQL generates unique names for all indexes. If you want to know the name generated you can run:

library=# \di

List of relations
 Schema |        Name         | Type  | Owner |     Table
--------+---------------------+-------+-------+----------------
 public | addresses_pkey      | index | User  | addresses
 public | books_author_idx    | index | User  | books
 public | books_isbn_key      | index | User  | books
 public | books_pkey          | index | User  | books
 public | decimal_values_pkey | index | User  | decimal_values
 public | messages_pkey       | index | User  | messages
 public | reviews_pkey        | index | User  | reviews
 public | users_books_pkey    | index | User  | users_books
 public | users_pkey          | index | User  | users
(9 rows)

As you can see the index created is named as books_author_idx. If you have been following along, go ahead and run the query below to DROP the index.

library=# DROP INDEX books_author_idx;
DROP INDEX
library=# \d books

Table "public.books"
     Column     |            Type             |                     Modifiers
----------------+-----------------------------+----------------------------------------------------
 id             | integer                     | not null default nextval('books_id_seq'::regclass)
 title          | character varying(100)      | not null
 author         | character varying(100)      | not null
 published_date | timestamp without time zone | not null
 isbn           | integer                     |
Indexes:
"books_pkey" PRIMARY KEY, btree (id)
"books_isbn_key" UNIQUE CONSTRAINT, btree (isbn)
Referenced by:
TABLE "reviews" CONSTRAINT "reviews_book_id_fkey" FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE
TABLE "users_books" CONSTRAINT "users_books_book_id_fkey" FOREIGN KEY (book_id) REFERENCES books(id) ON UPDATE CASCADE