Importing and Exporting Data

If you don't already have a library database, make sure to run createdb library from the terminal before starting this chapter; the library database is needed to work through this chapter's content. The data needed to work through this chapter is stored here. If you haven't been following along, feel free to download it, and then use the commands listed below to load that data into the library database.

When do we need to import/export data?

There will be times when you need to import data into your database or write your database's data out in a particular format. For example, you may be required to move to a different RDBMS, back up the database, or import new data that was created in a different database.

In another case, the library may decide to loan books to other libraries and provide access to the catalog of books available. This data would need to be exported to a format that the other library can import.

Importing and exporting are common tasks that are usually handled by a database administrator, but SQL developers should have an understanding of the commands and processes needed to perform these tasks. Most developers work with a local RDBMS, and you will, at the very least, need to learn how to manage your local database.

Data can be exported to and imported from many different formats, and tools and utilities can be written to perform these tasks. We will cover exporting to and importing from two basic formats that are common, simple, and do not require external tools or programs:

  1. SQL format
  2. CSV format

Exporting in SQL format

In some cases, data may be supplied to you in SQL, PostgreSQL's native language, in the form of a .sql file. Or you might want to import the SQL dump of an existing database or table. .sql files usually contain a set of SQL statements and commands, but they may be compressed to save space.

Let's export the library database that we created in the chapter on SQL joins.

$ pg_dump --inserts --no-acl --clean library > library.sql

You can look at the contents of library.sql by running the cat command below. As you can see, library.sql contains SQL statements that recreate the database schema and insert all of the data into the database's tables.

$ cat library.sql
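
The exact contents depend on your data and your PostgreSQL version, but because we passed --clean (emit DROP statements before the corresponding CREATE statements), --inserts (dump the data as INSERT statements rather than COPY), and --no-acl (omit GRANT/REVOKE statements), you should see statements along these lines. This excerpt is abridged and simplified, not the literal file contents:

DROP TABLE users;
CREATE TABLE users (
    id integer NOT NULL,
    username character varying(25) NOT NULL,
    enabled boolean DEFAULT true,
    last_login timestamp without time zone DEFAULT now() NOT NULL
);
INSERT INTO users VALUES (1, 'John Smith', true, '2016-04-06 13:00:16.551272');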

Importing from SQL format

Now let's try to import the library.sql file we just generated to recreate the library database. To do that, we first drop the existing database and then create an empty one.

$ dropdb library
$ createdb library

We now have an empty library database, and the table users does not exist in it.

library=# \dt users
No matching relations found.

Next, we import the library.sql file into the library database. You should be very familiar with this next command if you've been following along from previous chapters.

$ psql -d library < library.sql

Now, when we view the table schema and data, we see that the table has been created and that all of the user rows have been inserted.

$ psql -d library
library=# \d users
                                     Table "public.users"
   Column   |            Type             |                     Modifiers
------------+-----------------------------+----------------------------------------------------
 id         | integer                     | not null default nextval('users_id_seq'::regclass)
 username   | character varying(25)       | not null
 enabled    | boolean                     | default true
 last_login | timestamp without time zone | not null default now()
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "addresses" CONSTRAINT "fk_user_id" FOREIGN KEY (user_id) REFERENCES users(id)
    TABLE "reviews" CONSTRAINT "reviews_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
    TABLE "users_books" CONSTRAINT "users_books_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON UPDATE CASCADE

library=# SELECT * FROM users;
 id |  username   | enabled |         last_login
----+-------------+---------+----------------------------
  1 | John Smith  | t       | 2016-04-06 13:00:16.551272
  2 | Jane Smiley | t       | 2016-04-06 13:00:16.562224
  3 | Alice Munro | t       | 2016-04-06 13:00:16.566877
(3 rows)
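
One last note on .sql files: as mentioned earlier, they are sometimes compressed to save space. A minimal sketch of exporting and re-importing a gzipped dump, using the hypothetical file name library_backup.sql.gz:

$ pg_dump --inserts --no-acl --clean library | gzip > library_backup.sql.gz
$ gunzip -c library_backup.sql.gz | psql -d library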

Exporting a CSV file

Thanks to its simplicity, CSV (comma-separated values) is a very common data interchange format. You can export a CSV file for a particular query. Exporting to CSV differs from exporting to SQL: rather than dumping all of the data, you run a query to select a subset of the data and then copy that result to a CSV file.

COPY
(SELECT u.username, b.title
FROM users u
INNER JOIN users_books ub ON (ub.user_id = u.id)
INNER JOIN books b ON (b.id = ub.book_id))
TO '/tmp/users_books.csv'
WITH CSV;
$ cat /tmp/users_books.csv
John Smith,My First SQL book
John Smith,My Second SQL book
Jane Smiley,My Second SQL book

Make sure to use an absolute file path when using the COPY statement; as we'll see shortly, the path is interpreted by the server, so a relative path will not work correctly.

We're using two different commands here. The COPY statement copies data between a table and a file. We query the specific data we want with the SELECT statement, and then we copy that data to a new file. The CSV file was created in the /tmp directory with the name users_books.csv and has 3 records in it. It does not have a header row. If we want to include a header row with column names in the CSV file, we must specify the COPY command's HEADER option.

COPY (SELECT u.username, b.title
FROM users u
INNER JOIN users_books ub ON (ub.user_id = u.id)
INNER JOIN books b ON (b.id = ub.book_id))
TO '/tmp/users_books_with_header.csv'
WITH CSV HEADER;
$ cat /tmp/users_books_with_header.csv
username,title
John Smith,My First SQL book
John Smith,My Second SQL book
Jane Smiley,"My Second SQL book

Importing a CSV file

COPY vs \copy

Since PostgreSQL uses a client-server model, SQL statements entered at the psql prompt are sent to the server before they are executed. This means that any file paths included in SQL statements are interpreted by the server. Since the server doesn't know what directory a user was in when they executed the statement, all file paths have to be absolute.

To work around this, psql has a meta command \copy, which is used in place of COPY. This meta command initiates copies from the client (which is the psql process in this case), and this allows it to interpret paths that are relative to the user's current directory.

To summarize: use \copy where you would use COPY if you are within psql, as it will behave in the way you expect it to.
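
For example, here is our earlier CSV export rewritten to use \copy with a relative path. This is a sketch of the same query; note that a psql meta command must fit on a single line, unlike a SQL statement:

library=# \copy (SELECT u.username, b.title FROM users u INNER JOIN users_books ub ON (ub.user_id = u.id) INNER JOIN books b ON (b.id = ub.book_id)) TO 'users_books.csv' WITH CSV

This writes users_books.csv to whatever directory you were in when you started psql.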

Importing data into PostgreSQL from a CSV file is something database administrators may be required to do. You may also need to import CSV files for databases you are working with locally to debug a problem with data that only happens in production, or to load test data.

In this example, we will show how to load a data file into PostgreSQL. The data file is a list of books from https://www.gutenberg.org/, a free e-book site. We will load these books into the books table of our library database. Each row of the file contains a book's title, author, and published date.

gutenberg_books.csv
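
The first few rows of the file look something like this (these rows are illustrative rather than the literal file contents; each row holds a title, an author, and a published date):

Pride and Prejudice,Jane Austen,1813-01-28
Frankenstein,Mary Wollstonecraft Shelley,1818-01-01
Moby Dick,Herman Melville,1851-10-18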

Data is imported with the COPY statement we mentioned above. But instead of using COPY directly, we'll use the \copy meta command. \copy allows us to specify a relative file path, whereas with COPY the file path would have to be absolute. Technically, COPY is faster than \copy, but for smaller tables the \copy meta command works just fine.

library=# \copy books (title, author, published_date) FROM './gutenberg_books.csv' DELIMITER ','  CSV;
COPY 16

Notice that the syntax is similar to our export statement that used COPY. In this case, we swap out COPY for \copy. We specify which table to copy data to and which of its columns to populate. Then, we specify where to import the data from; this can be standard input or a file name. Finally, we specify options for the \copy meta command: here, the file's delimiter and the expected file format.
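
If you'd like to verify the import, a quick query such as the one below should show the newly added rows among the results (the exact output depends on what was already in your books table):

library=# SELECT title, author FROM books ORDER BY id DESC LIMIT 5;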

With that finished, we have successfully imported data into our books table. In the next chapter, we will use that imported data to do some advanced querying with the SELECT statement.