If you don't already have a library database, make sure to run
createdb library from the terminal before starting this chapter.
The data needed to work through this chapter is stored
here. If you haven't been following along, feel free to download it, and then use the commands listed below to load that data into the library database.
There will be times when you need to import data into your database, or write your database's data out in a particular format. For example, you may be required to use a different RDBMS, back up the database, or import new data that was created in a different database.
In another case, the library may decide to loan books to other libraries and provide access to the catalog of books available. This data would need to be exported to a format that the other library can import.
Importing and exporting are common tasks that are usually handled by a database administrator, but SQL developers should have an understanding of the commands and processes needed to perform these tasks. Most developers work with a local RDBMS, and you will, at the very least, need to learn how to manage your local database.
Data can be exported to, and imported from, many different formats, and tools and utilities can be written to perform these tasks. We will cover exporting to and importing from two basic formats that are common, simple, and do not require external tools or programs.
In some cases, data may be supplied to you in PostgreSQL's native language in the form of a .sql file, or you might want to import the SQL dump of an existing database or table. .sql files usually contain a set of PostgreSQL statements and commands, but may be compressed to save space.
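As a sketch of how a compressed dump might be handled (the stand-in file name and contents below are hypothetical), you can pipe a dump through gzip and later inspect it without writing a decompressed copy to disk:

```shell
# With a PostgreSQL server available, a compressed dump could be made like:
#   pg_dump library | gzip > library.sql.gz
# Simulate the result here with a tiny stand-in .sql file:
printf 'CREATE TABLE users (id integer);\n' > sample.sql
gzip -c sample.sql > sample.sql.gz

# View the SQL without decompressing the file on disk:
gunzip -c sample.sql.gz
```

A compressed dump can be restored the same way, by piping `gunzip -c` output into psql instead of cat-ing the file.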
Let's export the library database that we created in the chapter on SQL joins.
$ pg_dump --inserts --no-acl --clean library > library.sql
You can look at the content of
library.sql by running the
cat command below.
As you can see, library.sql contains SQL statements that create the database schema, and also create all of the data in the tables of the database.
$ cat library.sql
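Because we passed --inserts to pg_dump, the dump contains one INSERT statement per row of data. A quick way to preview or count those statements, shown here against a hypothetical stand-in for library.sql:

```shell
# Stand-in for a dump produced with pg_dump --inserts:
printf '%s\n' \
  'CREATE TABLE users (id integer);' \
  'INSERT INTO users VALUES (1);' \
  'INSERT INTO users VALUES (2);' > library_sample.sql

# Count the row-inserting statements in the dump:
grep -c '^INSERT' library_sample.sql
# prints 2
```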
Now let's try to import the library.sql file we just generated to recreate the library database. To do that, we first drop the database if it exists and then create it again.
$ dropdb library
$ createdb library
We now have an empty library database, and the table
users does not exist in it.
library=# \dt users
No matching relations found.
Next we import the library.sql file into the library database. You should be very familiar with this next command if you've been following along from previous chapters.
$ psql -d library < library.sql
Now, when we view the table schema and data, we see that the table has been created, and all users have also been created.
$ psql -d library
library=# \d users
                                     Table "public.users"
   Column   |            Type             |                     Modifiers
------------+-----------------------------+----------------------------------------------------
 id         | integer                     | not null default nextval('users_id_seq'::regclass)
 username   | character varying(25)       | not null
 enabled    | boolean                     | default true
 last_login | timestamp without time zone | not null default now()
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "addresses" CONSTRAINT "fk_user_id" FOREIGN KEY (user_id) REFERENCES users(id)
    TABLE "reviews" CONSTRAINT "reviews_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
    TABLE "users_books" CONSTRAINT "users_books_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON UPDATE CASCADE

library=# SELECT * FROM users;
 id |  username   | enabled |         last_login
----+-------------+---------+----------------------------
  1 | John Smith  | t       | 2016-04-06 13:00:16.551272
  2 | Jane Smiley | t       | 2016-04-06 13:00:16.562224
  3 | Alice Munro | t       | 2016-04-06 13:00:16.566877
(3 rows)
Thanks to its simplicity, CSV (comma-separated values) is a very common data interchange format. Exporting a CSV file differs from exporting SQL: instead of dumping all of the data, you can run a query to select a subset of the data and then copy the result to a CSV file.
COPY (SELECT u.username, b.title
        FROM users u
        INNER JOIN users_books ub ON (ub.user_id = u.id)
        INNER JOIN books b ON (b.id = ub.book_id))
  TO '/tmp/users_books.csv' WITH CSV;
$ cat /tmp/users_books.csv
John Smith,My First SQL book
John Smith,My Second SQL book
Jane Smiley,My Second SQL book
Make sure to use an absolute file path when using the COPY statement, otherwise COPY will not work correctly.
We're using two different commands here. The
COPY statement is used to copy data between a table and a file. We're querying the specific data we want with the
SELECT statement, and then we are copying that data to a new file.
The CSV file was created in the /tmp directory with the name
users_books.csv and has 3 records in it. It does not have a header row.
If we want to include a header row with column names in the CSV file, we add the HEADER option to the COPY command.
COPY (SELECT u.username, b.title
        FROM users u
        INNER JOIN users_books ub ON (ub.user_id = u.id)
        INNER JOIN books b ON (b.id = ub.book_id))
  TO '/tmp/users_books_with_header.csv' WITH CSV HEADER;
$ cat /tmp/users_books_with_header.csv
username,title
John Smith,My First SQL book
John Smith,My Second SQL book
Jane Smiley,My Second SQL book
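Once the CSV has a header row, standard Unix tools can slice it. For example, recreating the exported file as a stand-in, we can pull out just the username column while skipping the header:

```shell
# Stand-in for the file exported above:
printf '%s\n' \
  'username,title' \
  'John Smith,My First SQL book' \
  'John Smith,My Second SQL book' \
  'Jane Smiley,My Second SQL book' > users_books_with_header.csv

# Field 1 is username; tail -n +2 skips the header row.
cut -d, -f1 users_books_with_header.csv | tail -n +2 | sort -u
```

This prints the two distinct usernames. Note that cut splits on every comma, so this simple approach assumes no quoted fields containing commas.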
Since PostgreSQL uses a client-server model, SQL statements that are entered at the psql prompt are sent to the server before they are executed. This means that any file paths included in SQL statements are interpreted by the server. Since the server doesn't know what directory a user is in when they execute a statement, this means that all file paths have to be absolute.
To work around this, psql has a meta command
\copy, which is used in place of
COPY. This meta command initiates copies from the client (which is the psql process in this case), and this allows it to interpret paths that are relative to the user's current directory.
To summarize: use
\copy where you would use
COPY if you are within psql, as it will behave in the way you expect it to.
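To see why this matters, note that a relative path given to \copy resolves against the directory psql was started in. A sketch of that resolution (the psql command in the comment is illustrative, and realpath -m simply shows how the relative name would resolve; the file need not exist):

```shell
cd /tmp
# From this directory, a meta command such as:
#   psql -d library -c "\copy (SELECT 1) TO 'out.csv' CSV"
# would write /tmp/out.csv, because \copy resolves paths on the client side.
realpath -m out.csv
# prints /tmp/out.csv
```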
Importing data into PostgreSQL from a CSV file is something database administrators may be required to do. You may also need to import CSV files for databases you are working with locally to debug a problem with data that only happens in production, or to load test data.
In this example, we will show how to load a data file into PostgreSQL. The data file is a list of books from https://www.gutenberg.org/, a free e-book site, and consists only of books and authors. We will load them into the books table of our library database.
Data is imported with the
COPY statement we mentioned above. But instead of using
COPY directly, we'll use the
\copy meta command, which allows us to specify a relative file path, whereas with
COPY the file path would have to be absolute. Technically,
COPY is faster than
\copy, but for smaller tables the
\copy meta command works just fine.
library=# \copy books (title, author, published_date) FROM './gutenberg_books.csv' DELIMITER ',' CSV;
COPY 16
Notice that the syntax is similar to our export statement that used
COPY, except that we swap COPY out for
\copy. We specify which table to copy data to and for which columns. Then, we specify where we want to import data from; this can be standard input or a file name. Finally, we specify various options for the
\copy meta command: here, the file's delimiter as well as the expected file format.
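Before running an import like the one above, it can be worth sanity-checking the file, since a row with the wrong number of fields will cause the copy to fail. A minimal check (the sample rows below are hypothetical stand-ins for gutenberg_books.csv):

```shell
# Hypothetical stand-in rows for gutenberg_books.csv:
printf '%s\n' \
  'Frankenstein,Mary Shelley,1818-01-01' \
  'Dracula,Bram Stoker,1897-05-26' > gutenberg_sample.csv

# Every row should have exactly 3 comma-separated fields.
# (This naive check ignores quoted fields that contain commas.)
awk -F, 'NF != 3 { bad++ } END { print bad+0, "malformed rows" }' gutenberg_sample.csv
# prints: 0 malformed rows
```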
With that finished, we have successfully imported data into our books table. In the next chapter we will use that imported data to do some advanced querying with the SELECT statement.