sql-info.de

PostgreSQL Gotchas


PostgreSQL is a fully-featured, robust open-source database with strong SQL standards compliancy. As with all RDBMS products it has its odd little idiosyncracies, which while documented, are not obvious, counter-intuitive or just head-scratchingly odd.

This is work-in-progress; gotchas which have been removed are shown greyed out. Please send your favourite PostgreSQL gotchas and any comments, corrections and criticism to .

Note: this document contains many examples as they would appear in the command line client psql. For convenience, where statements are more than one line, the continuation prompt database-> has been omitted, to enable the statement to be copied directly from this page.

General SQL

SELECT column alias, ...

Affects: PostgreSQL <= 7.3

PostgreSQL requires AS when specifying alias names for output columns. Statements like the following will fail:

test=# SELECT id my_id     
         FROM test_table tt;
ERROR:  syntax error at or near "my_id" at character 11

Use SELECT id AS my_id FROM test_table tt instead.

In the SQL92 standard, the optional keyword AS is just noise and can be omitted without affecting the meaning. The PostgreSQL parser requires this keyword when renaming output columns because the type extensibility features lead to parsing ambiguities in this context. AS is optional in FROM items, however.

--http://www.postgresql.org/docs/7.3/static/sql-select.html

Most other databases (tested: DB2 8.2, Firebird 1.5.0, MySQL in all current versions and Oracle 8.1.7) support the SQL-92 standard.

(The author did not realise for a long time that AS was not mandatory in SQL, but nevertheless prefers to use it because it makes SELECT statements easier to read and potentially less ambiguous.)

Unquoted object names fold to lower case

All unquoted identifiers are assumed by PostgreSQL to be lower case by default. This means SELECT MY_COLUMN FROM my_table and SELECT my_column FROM MY_Table both refer to my_column and my_table. SELECT "MY_COLUMN" FROM my_table by contrast refers to a different column.

This can cause problems when converting mixed quoted and unquoted SQL from other databases with the opposite folding behaviour . For example, in Oracle SELECT "MY_COLUMN" FROM MY_TABLE and SELECT MY_COLUMN FROM "MY_TABLE" are equivalent; in PostgreSQL they are not.

Implicit date parsing

Affects: PostgreSQL <= 7.3

Prior to version 7.4 PostgreSQL silently swapped month and day values in ISO-style date strings YYYY-MM-DD if the month value was greater than 12, implicitly assuming the string to be YYYY-DD-MM:

database=> CREATE TABLE datetest (somedate DATE);
CREATE TABLE
database=> INSERT INTO datetest VALUES ('2002-16-12');
INSERT 640005 1
database=> INSERT INTO datetest VALUES ('2002-12-16');
INSERT 640006 1
database=> SELECT * FROM datetest;
  somedate
------------
 2002-12-16
 2002-12-16
(2 rows)
   

Beginning with version 7.4 PostgreSQL will refuse to insert the first value without an explicit datestyle setting:

database=> INSERT INTO datetest VALUES ('2002-16-12');
ERROR:  date/time field value out of range: "2002-16-12"
HINT:  Perhaps you need a different "datestyle" setting.
   

PostgreSQL can't multiply 256 * 256 * 256 * 256?!

Affects: All PostgreSQL versions

PostgreSQL returns an out-of-range error when attempting this apparently simple multiplication:

test=> SELECT 256 * 256 * 256 * 256;
ERROR:  integer out of range

This is because PostgreSQL assumes non-qualified integers to be of type INT4, i.e. 32-bit integers.

To avoid this problem, cast at least one of the integers as INT8:

test=> SELECT 256::INT8 * 256 * 256 * 256;
  ?column?
------------
 4294967296
(1 row)

Note that in PostgreSQL versions prior to 8.0 operations on an INT4 integer which produces a maximum result greater than 4294967296 are returned as 0:

test=> SELECT 256 * 256 * 256 * 256;
 ?column?
----------
        0
(1 row)

Implicit FROM item and unintended cross joins

PostgreSQL allows the omission of a FROM-clause in certain contexts. The most useful application of this is when computing the results of simple expressions (such as in the gotcha above):

test=> SELECT 1+1, 'Hello World';
 ?column? |  ?column?
----------+-------------
        2 | Hello World
(1 row)

Many other databases require a dummy one-row table for such operations, for example Oracle:

SQL> SELECT 1+1, 'Hello World' FROM DUAL;

       1+1 'HELLOWORLD
---------- -----------
         2 Hello World

In the default setup PostgreSQL will also add an implicit FROM-clause if the table name(s) are directly referenced in the SELECT:

test=> SELECT table1.id;
 id
----
  1
  2
  3
(3 rows)

While this is a useful abbreviation, it can cause unexpected cross-joins:

test=> SELECT table1.* FROM table1 t1;
NOTICE:  adding missing FROM-clause entry for table "table1"
 id
----
  1
  2
  3
  1
  2
  3
  1
  2
  3
(9 rows)

Here, the statement has been expanded by PostgreSQL to SELECT table1.* FROM table1 t1, table1, resulting in a cartesian join. PostgreSQL does provide a warning notice, but this is easy to overlook.

To disable this behaviour, set the ADD_MISSING_FROM parameter to FALSE:

test=> SET add_missing_from TO FALSE;
SET
test=>  SELECT table1.* FROM table1 t1;
ERROR:  missing FROM-clause entry for table "table1"

Note that this will not affect SELECT statements containing simple expressions, as in the first example.

The documentation contains further details.

Note: beginning with PostgreSQL 8.1, ADD_MISSING_FROM will be set to FALSE by default.

RANDOM() failures

Affects: PostgreSQL <= 8.?

A statement such as the following:

  SELECT id, is_true
    FROM (SELECT id, RANDOM() < 0.5 AS is_true FROM some_table) AS t_tmp
   WHERE is_true;

should only return boolean TRUE values in the column is_true. However is_true will contain ca. 50% FALSE values.

The cause is PostgreSQL's query planner flattening the statement into something along the lines of

  SELECT id, RANDOM() < 0.5 FROM some_table AS t_tmp
   WHERE RANDOM() < 0.5;

resulting in two RANDOM() calls producing different results.

A simple workaround is to add OFFSET 0 to the subquery, which prevents it being flattened into the upper query:

  SELECT id, is_true
    FROM (SELECT id, RANDOM() < 0.5 AS is_true FROM some_table OFFSET 0) AS t_tmp
   WHERE is_true;

COUNT(*) very slow

Operations using COUNT(*) tend to be slow, especially on large datasets, and also when compared to some other RDBMS systems. Generally PostgreSQL will use a sequential rather than an index scan to compute the result.

The basic reason for this is that index entries do not contain transaction information, i.e. whether a particular index entry is visible to the current transaction and thus count towards the total returned. This means that to obtain an accurate result meaning each index entry must be visited to determine its tuple's visibility.

For a detailed discussion on the technical reasons for this problem and possible solutions, see the thread "Much Ado About COUNT(*)".

Beginning with PostgreSQL 9.2, index-only scans have become available which may speed up COUNT(*) queries under certain circumstances. See the wiki for details.

MAX() / MIN() use sequential scans

Affects: PostgreSQL <= 8.0

Another form of aggregate function with poor performance are MAX() and MIN(). A statement like SELECT MAX(col) FROM my_table will generally be very slow, as PostgreSQL will invariably perform a sequential scan. In contrast to the COUNT(*) above, workarounds are available: use

     SELECT col FROM table ORDER BY col DESC LIMIT 1

to replace MAX() and

     SELECT col FROM table ORDER BY col ASC LIMIT 1

to replace MIN(). See also http://www.postgresql.org/docs/current/static/functions-aggregate.html.

Beginning with PostgreSQL 8.1, indexes will automatically be used for MAX() and MIN().

UNICODE means "UTF-8"

The encoding "UNICODE" provided by PostgreSQL is actually the Unicode encoding UTF-8. This will cause applications expecting for example a stream of 2-byte UTF-16 characters to function incorrectly.

Note that PostgreSQL accepts both UTF-8 and UTF8 as synonyms for UNICODE, for example when creating a database.

See http://www.postgresql.org/docs/current/static/multibyte.html for information on PostgreSQL's character set support.

to_date() thinks February 31st is March 3rd

This one has escaped me for a long time, but I can proudly announce that PostgreSQL has its own little way of implementing MySQL compatibility [*], for those moments of madness when you want to coerce February 31st to March 3rd:

    postgres=# SELECT to_date('2014-02-31','YYYY-MM-DD');
      to_date
    ------------
     2014-03-03
    (1 row)

This is on 9.4devel, not some ancient version from the days when renaming a column involved recreating the entire table.

The whole business of munging arbitrary text strings into valid date and time types is of course a highly tricky area (and personally I'd be more than happy if the entire world could unify on 'YYYY-MM-DD HH24:MI:SS' format - maybe with an optional identifier to specify the year counting system). Word from on high is that it's actually there for Oracle compatibilty but that's certainly not the case in Oracle 12c:

    SQL> SELECT to_date('2014-02-31','YYYY-MM-DD') FROM DUAL;
    SELECT to_date('2014-02-31','YYYY-MM-DD') FROM DUAL
                   *
    ERROR at line 1:
    ORA-01839: date not valid for month specified

Nevertheless it's still there as something "people are used to".

So if you are parsing date input from dodgy sources don't rely on to_date(). Casting the string literal to the appropriate type is a much more promising approach:

    postgres=# SELECT '2014/02/31'::DATE;
    ERROR:  date/time field value out of range: "2014/02/31"
    LINE 1: SELECT '2014/02/31'::DATE;
                   ^
    postgres=# SELECT '2014/02/28'::DATE;
        date
    ------------
     2014-02-28
    (1 row)

[*] Actually a recent-ish 5.5 version silently converts '2014-02-31' to '0000-00-00', but let's not go down that particular rabbit hole of insanity here.


Comments
An additional Gotcha:

Invalid byte sequences cause a lot of trouble

Affects: PostgreSQL >= 8.1

If you create a table with any text type (i.e. text, varchar(10), etc.), then you can insert an invalid byte sequence into that field using octal escapes.

For instance, if you have a UTF8-encoded database, you can do:

=> CREATE TABLE foo(t TEXT);
=> INSERT INTO foo VALUES(E'\377');

Now, if you COPY the table out, you can't COPY the resulting file back in. That means your pg_dump backups won't be able to restore. The only way to get your data back in is to re-escape that value.

This also prevents some 3rd party applications from working correctly, like Slony.
Posted by: Jeff | 2006-11-02 02:16
This page is nice. please give some more information about the postgres
Posted by: srini | 2007-03-09 12:52
Another gotcha for postgres (but still there are too few ....)

http://people.planetpostgresql.org/greg/index.php?/archives/98-LISTEN-and-NOTIFY-gotcha.html
Posted by: Walter Cruz | 2007-03-14 13:35
The RANDOM() failures problem has been fixed in 8.2. It will not flatten subqueries that contain volatile functions.
Posted by: Kris Jurka | 2007-04-10 22:21
> Now, if you COPY the table out, you can't COPY the resulting file back in. That means your pg_dump backups won't be able to restore


Hi Jeff. I have a smilar problem like the above. I have a .sql file that process some insert instructions on postgres 7.4.12. and my DB is created with Unicode support. The .sql file want to insert some data with different characters including Arabic and Turkish.

First, when I run the sql I got an error about the chacter encoding and execution terminated.Then I add "set client encoding=Latin1" commend at the beginning of the .sql file. This time I did not get any error but some characters are inserted in to database with a strange form that I can not decode them back with Java correctly. So the characters are seem like the originals. Do you have any idea about how to run the .sql file so that no encoding problem will accure ?

Please replay to my e-mail (aycute@gmail.com)

Thanks;
Posted by: Aykut | 2007-05-07 23:18
Here are a couple of gotchas I have run across.

1) Side effects for DO ALSO rules using NEW:

Try this:
create table test (id serial, value varchar);

create table test2 (id int);

create rule test_insert as on insert to test do also insert into test2 (id) values (new.id);

insert into test (value) values ('test1');

select * from test;

id | value
----+-------
1 | test1
(1 row)

select * from test2;

id
----
2
(1 row)

This is because the rule substitutes the default expression (the next value of the sequence) rather than the value itself in the new.id value.

The workaround is to use a trigger instead.

2) Unintended cross-joins

select my_set_returning_function(), column from table results in an unintended cross join.
Posted by: Chris Travers | 2007-07-05 22:27
hi,i have a problem,i have to write a project about postgreSQL,its features and it data management stratergies do you mind helping me,hope you do not mind,i wil be glad if you help me.
Posted by: fene | 2007-07-27 15:43
> This can cause problems when converting mixed quoted and unquoted
> SQL from other databases with the opposite folding behaviour . For
> example, in Oracle SELECT "MY_COLUMN" FROM MY_TABLE and SELECT
> MY_COLUMN FROM "MY_TABLE" are equivalent; in PostgreSQL they are
> not.

To be nitpicky, Oracle is also case sensitive, but this example
doesn't highlight that. The difference is that Oracle defaults to
uppercase. So these statements

SELECT my_column FROM my_table;
SELECT MY_COLUMN FROM my_TablE;
SELECT "MY_COLUMN" FROM "MY_TABLE";

are all equivalent to Oracle. This statement, with quotes,

SELECT "my_column" FROM my_table;

is not the same as the other three to Oracle because I've forced the
lowercase. The point is to watch out for DB defaults when converting
SQL among different DBs, and the PostgreSQL default is not the same as
the default in Oracle or MySQL.
Posted by: Kevin Hunter | 2007-08-15 07:11
It's unclear whether LIMIT and OFFSET arguments are to be constants or not. I tried this:

create table limoff ( l int, o int );
insert into limoff values ( 10,1 ); -- 10 elements starting from the second one
select a.* from atable a,limoff limit limoff.l offset limoff.o;

ERROR: argument of OFFSET must not contain variables

create or replace function flim()
returns int as $body$
select l from limoff;
$body$ language sql;

create or replace function foff()
returns int as $body$
select o from limoff;
$body$ language sql;

select * from atable limit flim() offset foff();

-- OK!

The real difference is that with functions the parser knows that the variable value returned is a scalar (in contrast to a vector or a setof).
This limitation makes little sense, as the runtime check would disallow non scalar values as LIMIT and OFFSET arguments.
While allowing non-constant arguments would unlock a useful feature already in place.
Posted by: Vincenzo Romano | 2007-11-19 17:52