November 20, 2007

PostgreSQL | Converting tsearch2 to 8.3

One of the major features in the upcoming 8.3 release is the integration of the tsearch2 full text search extension as a core PostgreSQL feature. While there are no fundamental changes, there are some differences which make upgrading from an existing installation a little tricky. The following are my notes from upgrading a test version of the database which powers this website.

First off, to make administration easier, I've always kept the tsearch2 tables, functions etc. in their own schema (cunningly named tsearch) - this makes the job of separating the legacy tsearch2 elements from the rest of the database a lot easier.

  1. DML dump: Using 8.3's pg_dump, extract the DML except for the tsearch schema:
    ~/pg83b2/bin/pg_dump -p5431 -U postgres -N tsearch -s cms > /home/ian/outgoing/cms-dml.sql
    (The -N option, introduced in PostgreSQL 8.1, specifies schemas to exclude).

  2. Schema modification: remove any references to the tsearch schema; change index definitions from GIST to GIN. Note that the names of tsearch2 functions have changed, and are now prefixed with ts_, e.g. headline() is now ts_headline(). Any backend functions which use these will have to be modified as well. Import the modified schema into the new 8.3 database.

  3. Data dump: as with the DML, excluding the tsearch schema:
    ~/pg83b2/bin/pg_dump -p5431 -U postgres -N tsearch -a cms -Fc > /home/ian/outgoing/cms-data.pg
    Restore will probably require the --disable-triggers option:
    ~/pg83b2/bin/pg_restore --disable-triggers -U postgres -Fc -d cms -p5431 cms-data.pg

  4. Code modification: As mentioned above, tsearch2 function names have changed; any which are used in application code will need to be changed too. (A good argument for encapsulating this kind of stuff into server-side functions, but that's life with "legacy" software... Fortunately the application only uses a few standardized full text search queries, neatly contained in their own library, so modifying them was fairly simple).

And voila, it works - at least more-or-less. What's still missing here are any configuration changes to the original database's tsearch schema, e.g. the pg_ts_config table. (And having done all this, no doubt someone will come along and tell me about the easier way...)

(Added) some useful links:

Posted at 7:00 AM