Table of Contents
- General SQL
SELECT column alias, ...
- Unquoted object names fold to lower case
Implicit date parsing
- PostgreSQL can't multiply 256 * 256 * 256 * 256?!
Implicit FROM item and unintended cross joins RANDOM() failures
- COUNT(*) very slow
MAX() / MIN() use sequential scans
- UNICODE means "UTF-8"
to_date() thinks February 31st is March 3rd
- When NOT VALID is VALID - constraint checking
PostgreSQL is a fully-featured, robust open-source database with strong SQL standards compliancy. As with all RDBMS products it has its odd little idiosyncracies, which while documented, are not obvious, counter-intuitive or just head-scratchingly odd.
This is work-in-progress; gotchas which have been removed are shown greyed out. Please send your favourite PostgreSQL gotchas and any comments, corrections and criticism to <firstname.lastname@example.org>.
Note: this document contains many examples as they would appear in the command line client psql. For convenience, where statements are more than one line, the continuation prompt database-> has been omitted, to enable the statement to be copied directly from this page.
PostgreSQL requires AS when specifying alias names for output columns. Statements like the following will fail:
test=# SELECT id my_id FROM test_table tt; ERROR: syntax error at or near "my_id" at character 11
Use SELECT id AS my_id FROM test_table tt instead.
In the SQL92 standard, the optional keyword AS is just noise and can be omitted without affecting the meaning. The PostgreSQL parser requires this keyword when renaming output columns because the type extensibility features lead to parsing ambiguities in this context. AS is optional in FROM items, however.
Most other databases (tested: DB2 8.2, Firebird 1.5.0, MySQL in all current versions and Oracle 8.1.7) support the SQL-92 standard.
(The author did not realise for a long time that AS was not mandatory in SQL, but nevertheless prefers to use it because it makes SELECT statements easier to read and potentially less ambiguous.)
All unquoted identifiers are assumed by PostgreSQL to be lower case by default. This means SELECT MY_COLUMN FROM my_table and SELECT my_column FROM MY_Table both refer to my_column and my_table. SELECT "MY_COLUMN" FROM my_table by contrast refers to a different column.
This can cause problems when converting mixed quoted and unquoted SQL from other databases with the opposite folding behaviour . For example, in Oracle SELECT "MY_COLUMN" FROM MY_TABLE and SELECT MY_COLUMN FROM "MY_TABLE" are equivalent; in PostgreSQL they are not.
Affects: PostgreSQL <= 7.3
Prior to version 7.4 PostgreSQL silently swapped month and day values in ISO-style date strings YYYY-MM-DD if the month value was greater than 12, implicitly assuming the string to be YYYY-DD-MM:
database=> CREATE TABLE datetest (somedate DATE); CREATE TABLE database=> INSERT INTO datetest VALUES ('2002-16-12'); INSERT 640005 1 database=> INSERT INTO datetest VALUES ('2002-12-16'); INSERT 640006 1 database=> SELECT * FROM datetest; somedate ------------ 2002-12-16 2002-12-16 (2 rows)
Beginning with version 7.4 PostgreSQL will refuse to insert the first value without an explicit datestyle setting:
database=> INSERT INTO datetest VALUES ('2002-16-12'); ERROR: date/time field value out of range: "2002-16-12" HINT: Perhaps you need a different "datestyle" setting.
Affects: All PostgreSQL versions
PostgreSQL returns an out-of-range error when attempting this apparently simple multiplication:
test=> SELECT 256 * 256 * 256 * 256; ERROR: integer out of range
This is because PostgreSQL assumes non-qualified integers to be of type INT4, i.e. 32-bit integers.
To avoid this problem, cast at least one of the integers as INT8:
test=> SELECT 256::INT8 * 256 * 256 * 256; ?column? ------------ 4294967296 (1 row)
Note that in PostgreSQL versions prior to 8.0 operations on an INT4 integer which produces a maximum result greater than 4294967296 are returned as 0:
test=> SELECT 256 * 256 * 256 * 256; ?column? ---------- 0 (1 row)
PostgreSQL allows the omission of a FROM-clause in certain contexts. The most useful application of this is when computing the results of simple expressions (such as in the gotcha above):
test=> SELECT 1+1, 'Hello World'; ?column? | ?column? ----------+------------- 2 | Hello World (1 row)
Many other databases require a dummy one-row table for such operations, for example Oracle:
SQL> SELECT 1+1, 'Hello World' FROM DUAL; 1+1 'HELLOWORLD ---------- ----------- 2 Hello World
In the default setup PostgreSQL will also add an implicit FROM-clause if the table name(s) are directly referenced in the SELECT:
test=> SELECT table1.id; id ---- 1 2 3 (3 rows)
While this is a useful abbreviation, it can cause unexpected cross-joins:
test=> SELECT table1.* FROM table1 t1; NOTICE: adding missing FROM-clause entry for table "table1" id ---- 1 2 3 1 2 3 1 2 3 (9 rows)
Here, the statement has been expanded by PostgreSQL to SELECT table1.* FROM table1 t1, table1, resulting in a cartesian join. PostgreSQL does provide a warning notice, but this is easy to overlook.
To disable this behaviour, set the ADD_MISSING_FROM parameter to FALSE:
test=> SET add_missing_from TO FALSE; SET test=> SELECT table1.* FROM table1 t1; ERROR: missing FROM-clause entry for table "table1"
Note that this will not affect SELECT statements containing simple expressions, as in the first example.
The documentation contains further details.
Note: beginning with PostgreSQL 8.1, ADD_MISSING_FROM will be set to FALSE by default.
Affects: PostgreSQL <= 8.?
A statement such as the following:
SELECT id, is_true FROM (SELECT id, RANDOM() < 0.5 AS is_true FROM some_table) AS t_tmp WHERE is_true;
should only return boolean TRUE values in the column is_true. However is_true will contain ca. 50% FALSE values.
The cause is PostgreSQL's query planner flattening the statement into something along the lines of
SELECT id, RANDOM() < 0.5 FROM some_table AS t_tmp WHERE RANDOM() < 0.5;
resulting in two RANDOM() calls producing different results.
A simple workaround is to add OFFSET 0 to the subquery, which prevents it being flattened into the upper query:
SELECT id, is_true FROM (SELECT id, RANDOM() < 0.5 AS is_true FROM some_table OFFSET 0) AS t_tmp WHERE is_true;
Operations using COUNT(*) tend to be slow, especially on large datasets, and also when compared to some other RDBMS systems. Generally PostgreSQL will use a sequential rather than an index scan to compute the result.
The basic reason for this is that index entries do not contain transaction information, i.e. whether a particular index entry is visible to the current transaction and thus count towards the total returned. This means that to obtain an accurate result meaning each index entry must be visited to determine its tuple's visibility.
For a detailed discussion on the technical reasons for this problem and possible solutions, see the thread "Much Ado About COUNT(*)".
Beginning with PostgreSQL 9.2, index-only scans have become available which may speed up COUNT(*) queries under certain circumstances. See the wiki for details.
Affects: PostgreSQL <= 8.0
Another form of aggregate function with poor performance are MAX() and MIN(). A statement like SELECT MAX(col) FROM my_table will generally be very slow, as PostgreSQL will invariably perform a sequential scan. In contrast to the COUNT(*) above, workarounds are available: use
SELECT col FROM table ORDER BY col DESC LIMIT 1
to replace MAX() and
SELECT col FROM table ORDER BY col ASC LIMIT 1
to replace MIN(). See also http://www.postgresql.org/docs/current/static/functions-aggregate.html.
Beginning with PostgreSQL 8.1, indexes will automatically be used for MAX() and MIN().
The encoding "UNICODE" provided by PostgreSQL is actually the Unicode encoding UTF-8. This will cause applications expecting for example a stream of 2-byte UTF-16 characters to function incorrectly.
Note that PostgreSQL accepts both UTF-8 and UTF8 as synonyms for UNICODE, for example when creating a database.
See http://www.postgresql.org/docs/current/static/multibyte.html for information on PostgreSQL's character set support.
to_date() thinks February 31st is March 3rd
Resolved in PostgreSQL 9.5 (commit 5c4eb916)
This one has escaped me for a long time, but I can proudly announce that PostgreSQL has its own little way of implementing MySQL compatibility [*], for those moments of madness when you want to coerce February 31st to March 3rd:
postgres=# SELECT to_date('2014-02-31','YYYY-MM-DD'); to_date ------------ 2014-03-03 (1 row)
This is on 9.4devel, not some ancient version from the days when renaming a column involved recreating the entire table.
The whole business of munging arbitrary text strings into valid date and time types is of course a highly tricky area (and personally I'd be more than happy if the entire world could unify on 'YYYY-MM-DD HH24:MI:SS' format - maybe with an optional identifier to specify the year counting system). Word from on high is that it's actually there for Oracle compatibilty but that's certainly not the case in Oracle 12c:
SQL> SELECT to_date('2014-02-31','YYYY-MM-DD') FROM DUAL; SELECT to_date('2014-02-31','YYYY-MM-DD') FROM DUAL * ERROR at line 1: ORA-01839: date not valid for month specified
Nevertheless it's still there as something "people are used to".
So if you are parsing date input from dodgy sources don't rely on to_date(). Casting the string literal to the appropriate type is a much more promising approach:
postgres=# SELECT '2014/02/31'::DATE; ERROR: date/time field value out of range: "2014/02/31" LINE 1: SELECT '2014/02/31'::DATE; ^ postgres=# SELECT '2014/02/28'::DATE; date ------------ 2014-02-28 (1 row)
[*] Actually a recent-ish 5.5 version silently converts '2014-02-31' to '0000-00-00', but let's not go down that particular rabbit hole of insanity here.
When NOT VALID is VALID - constraint checking
While evaluating a client's schema recently with an eye to improving bulk loading performance, I noticed virtually all the non-foreign key constraints were defined like this:
CONSTRAINT col_check CHECK (some_col ~~ '.*foo.*') NOT VALID
The reason given was that the data was known to be good and no checking was required during loading, so the constraints were disabled for speed.
Unfortunately PostgreSQL is misleading on this point.
NOT VALID actually means do not validate rows existing before the constraint was added, which is useful when adding a constrain to a large table to avoid a potentially time-consuming check requiring an
ACCESS EXCLUSIVE lock (existing rows can be later verified with
ALTER TABLE name VALIDATE CONSTRAINT constraint_name).
Also unfortunately, PostgreSQL allows constraints to be defined as
NOT VALID when creating a table (even though this is not documented), further giving rise to the impression that constraints can be turned on and off at will.
postgres=# CREATE TABLE foo ( some_col TEXT, CONSTRAINT col_check CHECK (some_col ~~ '.*foo.*') NOT VALID ); CREATE TABLE postgres=# \d foo Table "public.foo" Column | Type | Modifiers ----------+------+----------- some_col | text | Check constraints: "col_check" CHECK (some_col ~~ '.*foo.*'::text) NOT VALID postgres=# INSERT INTO foo VALUES('bar'); ERROR: new row for relation "foo" violates check constraint "col_check" DETAIL: Failing row contains (bar). Time: 57.581 ms
For reference, the only way to turn an existing constraint off is to drop it entirely:
ALTER TABLE name DROP CONSTRAINT constraint_name