2. PostgreSQL Guide

Original document: http://www.gentoo.org/doc/en/postgres-howto.xml

Based on version: 1.5

2.1. Introduction

2.1.1. PostgreSQL introduction

When you ask developers which database to use, two names usually come up: MySQL and PostgreSQL, the subject of this document. The advantages of one over the other make for a long-winded debate, but it is fair to say that PostgreSQL has had a firmer grasp of true relational database structure than MySQL. Several standard features, such as FOREIGN KEY, were only just added in MySQL 5. Whatever the case may be, this document assumes that you have chosen PostgreSQL as the database to use. The first place to start is the emerge process. The next section describes installation through emerge, as well as the basic configuration.

2.1.2. PostgreSQL installation

To begin, we must emerge the PostgreSQL package. Before doing so, run the following command to check that its build options (USE flags) are set the way you want:

Listing 46. Checking the PostgreSQL build options

# emerge -pv postgresql

These are the packages that I would merge, in order:

Calculating dependencies ...done!
[ebuild  N    ] dev-db/postgresql-8.0.4  -doc -kerberos +nls +pam +perl -pg-intdatetime +python +readline (-selinux) +ssl -tcl +xml +zlib 0 kB

Here's a list of what the different build options indicate:

USE Flag	Meaning
doc	This USE flag enables or disables the installation of documentation outside of the standard man pages. The one good time to disable this option is if you are low on space, or you have alternate methods of getting a hold of the documentation (online, etc.)
kerberos	When connecting to the database, with this option enabled, the admin has the option of using kerberos to authenticate their users/services to the database.
nls	If this option is enabled, PostgreSQL can utilize translated strings for non-English speaking users.
pam	If this option is enabled, and the admin configures the PostgreSQL configuration file properly, users/services will be able to login to a PostgreSQL database using PAM (Pluggable Authentication Module).
perl	If this option is enabled, perl bindings for PostgreSQL will be built.
pg-intdatetime	If this option is enabled, PostgreSQL will support 64 bit integer date types.
python	If this option is enabled, PostgreSQL will be built with python bindings.
readline	If this option is enabled, PostgreSQL will support readline style command line editing. This includes command history and isearch.
selinux	If this option is enabled, an selinux policy for PostgreSQL will be installed.
ssl	If this option is enabled, PostgreSQL will utilize the OpenSSL library to encrypt traffic between PostgreSQL clients and servers.
tcl	If this option is enabled, PostgreSQL will build tcl bindings.
xml	If this option is enabled, XPATH style xml support will be built. More information on using xml support with PostgreSQL can be found on: PostgreSQL and XML.
zlib	This isn't really used by PostgreSQL itself, but by pg_dump to compress the dumps it produces.
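
In the emerge output above, a leading + marks a USE flag that will be enabled and a leading - marks one that will be disabled. Flags shown in parentheses, such as (-selinux), are typically fixed by the profile rather than by your own settings. As a rough sketch, the two groups can be separated as follows. The flag string is copied from the listing, and flag_line and list_flags are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# USE flags copied from the "emerge -pv postgresql" listing above.
flag_line='-doc -kerberos +nls +pam +perl -pg-intdatetime +python +readline (-selinux) +ssl -tcl +xml +zlib'

# list_flags SIGN LINE: print the flags in LINE that start with SIGN
# ("+" or "-"), one per line, with the sign removed.
# Flags wrapped in parentheses are skipped because they start with "(".
list_flags() {
    printf '%s\n' "$2" | tr ' ' '\n' | grep "^[$1]" | sed 's/^.//'
}

echo "Enabled:"
list_flags + "$flag_line"
echo "Disabled:"
list_flags - "$flag_line"
```

Comparing the "Disabled" group against the table is a quick way to spot features you may want to switch on before emerging.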

Once you've customized PostgreSQL to meet your specific needs, go ahead and start the emerge:

Listing 47. Emerging PostgreSQL

# emerge postgresql
(Output shortened)
>>> /usr/lib/libecpg.so.5 -> libecpg.so.5.0
>>> /usr/bin/postmaster -> postgres
 * Make sure the postgres user in /etc/passwd has an account setup with /bin/bash as the shell
 *
 * Execute the following command
 * emerge --config =postgresql-8.0.4
 * to setup the initial database environment.
 *
>>> Regenerating /etc/ld.so.cache...
>>> dev-db/postgresql-8.0.4 merged.

As the einfo output shows, some post-install setup must still be done. The next chapter will look at the actual configuration of PostgreSQL.

2.2. PostgreSQL configuration

2.2.1. Setting up the initial database environment

As noted in the earlier emerge output, the initial database environment must be set up. Before this is done, however, one thing needs to be considered: unlike, say, MySQL, PostgreSQL's "root" password is the password of the actual system user, postgres. The ebuild creates this user but does not set a password for it, so before we can begin, the password must be set:

Listing 48. Setting the password

# passwd postgres
New UNIX password:
Retype new UNIX password:
passwd: password updated successfully

Now that this is set up, the initial database environment can be created:

Listing 49. Configuring the database environment with emerge --config

# emerge --config =postgresql-8.0.4


Configuring pkg...

 * Creating the data directory ...
 * Initializing the database ...
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale C.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating directory /var/lib/postgresql/data/global ... ok
creating directory /var/lib/postgresql/data/pg_xlog ... ok
creating directory /var/lib/postgresql/data/pg_xlog/archive_status ... ok
creating directory /var/lib/postgresql/data/pg_clog ... ok
creating directory /var/lib/postgresql/data/pg_subtrans ... ok
creating directory /var/lib/postgresql/data/base ... ok
creating directory /var/lib/postgresql/data/base/1 ... ok
creating directory /var/lib/postgresql/data/pg_tblspc ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in /var/lib/postgresql/data/base/1 ... ok
initializing pg_shadow ... ok
enabling unlimited row size for system tables ... ok
initializing pg_depend ... ok
creating system views ... ok
loading pg_description ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

    /usr/bin/postmaster -D /var/lib/postgresql/data
or
    /usr/bin/pg_ctl -D /var/lib/postgresql/data -l logfile start

 *
 * You can use /etc/init.d/postgresql script to run PostgreSQL instead of pg_ctl.
 *

Now the initial database environment is set up. The next section will look at verifying the install and setting up users to access the database.

2.2.2. PostgreSQL database setup

Now that PostgreSQL is set up, it's a good idea to verify the installation. First, make sure the service starts up OK:

Listing 50. Starting up the PostgreSQL service

# /etc/init.d/postgresql start
* Starting PostgreSQL ...                                          [ ok ]

Once this is verified working, it's also a good idea to add it to the default runlevel so it starts at boot:

Listing 51. Adding to the default runlevel

# rc-update add postgresql default
* postgresql added to runlevel default

Now that the service has started, let's create a test database with the createdb command. We'll pass the -U option to set the user (it defaults to the current user name if you don't) and the -W option to request the password we created earlier. The last argument is the name of the database we want to create:

Listing 52. Creating a database with createdb

$ createdb -U postgres -W test
Password:
CREATE DATABASE

The database was successfully created, and we can confirm that the database can run basic tasks. We'll go ahead and drop this database (remove it) with the dropdb command:

Listing 53. Dropping a database with dropdb

$ dropdb -U postgres -W test
Password:
DROP DATABASE

Right now, only the postgres user can run commands. Obviously this is not the sort of setup one would like in a multi-user environment. The next section will look at working with user accounts.

2.2.3. Setting up database user accounts

As mentioned earlier, having to log in as the postgres user is undesirable in a multi-user environment. In most cases various users and services will access the server, each with different permission requirements. The createuser command handles this. It is an alternative to running a few SQL queries, and is a lot more flexible from an admin standpoint. We'll go ahead and create two users: a 'superuser' that can add other users and administer the database, and a standard user:

Listing 54. Setting up the superuser

(replace chris with the username you'd like to use)
$ createuser -a -d -P -E -U postgres -W chris
Enter password for new user:
Enter it again:
Password:
CREATE USER

There, we've created the superuser. The -a option specifies that this user can add other users, and -d means that this user can create databases. -P lets you enter a password for the user, and -E stores it encrypted for security purposes. Now we'll test this new user's permissions by setting up our standard user:

Listing 55. Setting up the standard user

(replace chris with the username you've just created)
$ createuser -A -D -P -E -U chris -W testuser
Enter password for new user:
Enter it again:
Password:
CREATE USER

Success! Our new user was created using the previously created superuser. The -A and -D options do the opposite of -a and -d, and instead deny the user the ability to create other users and databases. Now that there are users to work with, the next chapter will look at using the new database.

2.3. Using PostgreSQL

2.3.1. Setting up permissions

Now there is a user that can create databases and add other users, and the main postgres user that can do anything. The standard user created earlier can currently log in to the server, and that's about it. In general, users need to be able to insert and retrieve data, and sometimes perform any number of other tasks. For this new user to be able to do anything, they must be set up with the proper permissions. This can easily be done by passing the -O (owner) option to createdb. We'll start by using our superuser to create a new database, MyDB, owned by testuser:

Listing 56. Creating the MyDB database

$ createdb -O testuser -U chris -W MyDB
Password:
CREATE DATABASE

Alright, now we have a new MyDB database and a testuser that can access it. To test this out, we'll log in to MyDB as testuser using psql, the program used to connect to a PostgreSQL database from the command line. Connect to the new database like so:

Listing 57. Logging into the MyDB database as the testuser

$ psql -U testuser -W MyDB
Password:
Welcome to psql 8.0.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

MyDB=>

So, the testuser is now logged into the database, and can begin to initiate some commands. To get a feel for using PostgreSQL, the next section will take a look at some of the basic commands in navigating the psql client.

Basic PostgreSQL commands and creating a table

For those who are used to MySQL, this section is a must-read, as this is where PostgreSQL differs most in how commands are run. To start, here is a list of some commands that will be discussed:

Command	Usage	MySQL Equivalent
\c[onnect] [DBNAME|- [USER]]	Connects to another database	USE DATABASE
\q	Quit the psql client	quit
\i FILE	Run commands from FILE	source FILE
\o [FILE]	Send query results to FILE	INTO OUTFILE, but outputs everything (not just SELECTS)
\d [NAME]	Describe a database or table (as well as other items)	DESC(RIBE)
\dt [PATTERN]	List available tables that match PATTERN (all if no pattern is given)	SHOW TABLES

Of these, \d is put to use right away in this section. Right now the database is empty, so we need to insert some data. Data lives in tables, and there are no tables in the database yet, so the first step is to create one with the CREATE TABLE command. We'll make a table of products, each with a product ID, a description, and a price:

Listing 58. Creating the products table

MyDB=> CREATE TABLE products (
MyDB(>   product_id SERIAL,
MyDB(>   description TEXT,
MyDB(>   price DECIMAL
MyDB(> );
NOTICE:  CREATE TABLE will create implicit sequence "products_product_id_seq"
for serial column "products.product_id"
CREATE TABLE

You can ignore the NOTICE; it's perfectly harmless. The last line of the output, CREATE TABLE, indicates that the command succeeded. Still, let's verify that the table was indeed created, using the \d command:

Listing 59. Looking at the newly created table

MyDB=> \d products
                                 Table "public.products"
   Column    |  Type   |                            Modifiers
-------------+---------+------------------------------------------------------------------
 product_id  | integer | not null default nextval('public.products_product_id_seq'::text)
 description | text    |
 price       | numeric |

Indeed, the table was successfully created. Now it needs to be populated with data, which is the subject of the next section.

2.3.2. Inserting data into the database

This section will look at the two ways of populating the newly created table with data. First let's look at the most basic command, INSERT:

Listing 60. INSERT syntax

INSERT INTO [tablename] (column1,column2,column3) VALUES(value1,value2,value3)

tablename is the name of the table to insert the data into. (column1,column2,column3) specifies the columns to insert the values into, and VALUES(value1,value2,value3) lists the values. The values are assigned in the same order as the columns (column1 gets value1, column2 gets value2, column3 gets value3), so the two lists must be the same length. Let's go ahead and insert an item into the table:

Important

From working with databases for a long time, I personally recommend writing INSERT statements exactly as above. Developers often make the mistake of using INSERT INTO without specifying columns. This is fragile: if a new column is later added to the table, the values no longer line up with the columns, and the statement can fail or put data in the wrong place. You should always specify the columns unless you're absolutely sure you'll never add a column.

Listing 61. Inserting data into the table

MyDB=> INSERT INTO products (description,price) VALUES('A test product', 12.00);
INSERT 17273 1

The last line needs a bit of explaining. An INSERT command returns an OID (Object Identifier) and the number of rows inserted. OIDs are beyond the scope of this guide; the PostgreSQL manual has good information on them. Now, if you have 20,000 products, writing INSERT statements gets tedious. However, all is not lost: the COPY command can load data into a table from a file or from stdin. For this example, let's assume that you have a CSV (comma-separated values) file containing the product ID, description, and price. The file looks like this:

Listing 62. products.csv

2,meat,6.79
3,soup,0.69
4,soda,1.79
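
Because a malformed row can make a bulk load fail, it may be worth checking a file like this before loading it. The following is a minimal sketch, not part of the original guide. check_csv is a hypothetical helper, the sample file is recreated in a scratch directory, and fields are assumed to contain no commas of their own.

```shell
#!/bin/sh
# check_csv FILE NFIELDS: succeed only if every line of FILE has exactly
# NFIELDS comma-separated fields; print a message for each offending line.
check_csv() {
    awk -F',' -v want="$2" '
        NF != want {
            printf "line %d: expected %d fields, found %d\n", NR, want, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Recreate the sample file in a scratch directory and check it.
workdir=$(mktemp -d)
printf '%s\n' '2,meat,6.79' '3,soup,0.69' '4,soda,1.79' > "$workdir/products.csv"
check_csv "$workdir/products.csv" 3 && echo "products.csv: every row has 3 fields"
```

The function exits with a non-zero status when any row is off, so it can guard a later load step in a script.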

Now we'll use the COPY command to populate our data:

Important

COPY FROM STDIN is used here because reading a file on the server with COPY is restricted to database superusers such as postgres (for obvious security reasons).

Listing 63. Using COPY to populate the products table

MyDB=> COPY products FROM STDIN WITH DELIMITER AS ',';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 2,meat,6.79
>> 3,soup,0.69
>> 4,soda,1.79
>> \.

Unfortunately, COPY doesn't return the same status information as the INSERT INTO statement. How do we know the data was inserted? The next section will look at running queries to check our data.
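
For a modest number of rows, another option is to turn the CSV file into a script of INSERT statements with explicit column lists, in line with the recommendation earlier in this section. The sketch below is only an illustration. csv_to_inserts is a hypothetical helper, and it assumes three fields per line, a numeric ID and price, and descriptions that contain no commas or backslashes.

```shell
#!/bin/sh
# csv_to_inserts FILE: print one INSERT per line of FILE, naming the
# columns explicitly. Stops with an error on any line that does not look
# like "integer,text,number".
csv_to_inserts() {
    awk -F',' -v q="'" '
        NF != 3 || $1 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+(\.[0-9]+)?$/ {
            print "line " NR ": unexpected format" > "/dev/stderr"
            exit 1
        }
        {
            desc = $2
            gsub(q, q q, desc)   # double any single quote for SQL
            printf "INSERT INTO products (product_id, description, price) VALUES (%s, %s%s%s, %s);\n", $1, q, desc, q, $3
        }
    ' "$1"
}

# Recreate the sample file in a scratch directory and convert it.
workdir=$(mktemp -d)
printf '%s\n' '2,meat,6.79' '3,soup,0.69' '4,soda,1.79' > "$workdir/products.csv"
csv_to_inserts "$workdir/products.csv" > "$workdir/products.sql"
cat "$workdir/products.sql"
```

The resulting products.sql could then be run from the psql prompt with \i followed by the path to the file. Each statement reports its own INSERT status line.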

2.3.3. Using PostgreSQL queries

This section will look at using the SELECT statement to view data in our tables. The basic SELECT format looks like this:

Listing 64. SELECT syntax

SELECT (column1,column2|*) FROM (table) [WHERE (conditionals)]

There are two ways to select columns. The first is using * to select all columns, and the second is to specify a list of specific columns you wish to see. The second is quite handy when you want to find a specific column in a rather large list of them. Let's start out with using SELECT with * to specify all columns:

Listing 65. Viewing the products table

MyDB=> SELECT * FROM products;
 product_id |  description   | price
------------+----------------+-------
          1 | A test product | 12.00
          2 | meat           |  6.79
          3 | soup           |  0.69
          4 | soda           |  1.79
(4 rows)

As shown here, all the data we inserted earlier is indeed in the table. Now let's say we only want to see the description and the price, and don't care about the product id. In this case we'll use the column specific SELECT form:

Listing 66. Viewing the products table

MyDB=> SELECT description,price FROM products;
  description   | price
----------------+-------
 A test product | 12.00
 meat           |  6.79
 soup           |  0.69
 soda           |  1.79
(4 rows)

Now only the description and price are shown, letting us focus on the important data. Next, let's say that we want to see only the items that cost more than $2.00. Here's where the WHERE clause comes in handy:

Listing 67. Viewing specific rows from the products table

MyDB=> SELECT description,price FROM products WHERE price > 2.00;
  description   | price
----------------+-------
 A test product | 12.00
 meat           |  6.79
(2 rows)

Now a listing of products over $2.00 is displayed, focusing the data even more. These forms of querying for information are very powerful, and can help create extremely useful reports.

2.3.4. Conclusion

This concludes the PostgreSQL Guide. A big thanks goes to Masatomo Nakano, the previous Gentoo PostgreSQL maintainer for his help in answering my questions. Any suggestions on this guide should be sent to Chris White. For more extensive documentation, see the PostgreSQL website.