March 6, 2013

PostgreSQL | A practical use for the HSTORE datatype

The HSTORE extension has been around quite a while, but until recently I've never found a situation where I can justify using it - here's a quick writeup of a simple use-case.

The application which runs this website is a homebrew one which I've been maintaining on-and-off for over a decade, initially so I could have a Perl'n'PostgreSQL-powered website which provided some functionality not otherwise available at the time; but also as a platform for experimenting with various database and web technologies. As such the underlying database schema has suffered some sprawl over the years, and recently I've been tidying things up.

One source of the sprawl are a couple of key/value tables I've created at some point to store arbitrary attributes to associate with records in other tables. For example, this application is basically a CMS which runs multiple sites from the same database; it has (and who'd've thought it) a table called "site", and associated with each site are a number of ad-hoc options which I've added as I've needed them. The table isn't very large and basically looks like this:

 site_id |          name          |       value        
       1 | admin_email            | ian@example.com
       2 | admin_email            | ian@example.com
       2 | outgoing_email_address | noreply@example.com
       2 | rss_style              | snippet
       3 | rss_style              | default
       5 | admin_email            | ian@example.com
       6 | admin_email            | ian@example.com
       7 | admin_email            | ian@example.com
       7 | site_login_path        | /login.html
       7 | site_register_path     | /register.html
       7 | outgoing_email_address | ian@example.com 
(11 rows)

Now, in the grand scheme of things this table is not causing any administrative or performance problems, and could happily stay as-is forever more, but it would also be an ideal candidate for conversion to an HSTORE field on the main site table. It also means I can remove a mildly ugly application method which simulates an "UPSERT", as storing a key/value pair in HSTORE simply overwrites an existing key's value.

A rundown of the steps needed to implement this shiny new HSTORE field:

Install the HSTORE extension on the PostgreSQL server:

For me this step is basically cd contrib/hstore && make && sudo make install, YMMV

As superuser, install the extension HSTORE in the desired database:
sitedb=# CREATE EXTENSION hstore;

Note: this extension installs a number of functions - ensure they get installed in the correct schema for the application user

Add an HSTORE field to the site table:

DEFAULT '' ensures an empty HSTORE will be created, which ensures "upserting" values into the HSTORE via concatenation will work

Copy the key/value pairs from the site_options table into the options HSTORE field:

This sounds simple but we need to flatten multiple key/value records into the matching HSTORE field.

To do this in a single query, we'll need to use the HSTORE(TEXT[],TEXT[]) function which constructs an HSTORE from separate key/value arrays; it draws the arrays from a CTE thusly:
  WITH attrs AS (
SELECT so.site_id ,
       ARRAY_AGG( name ORDER BY name) attr_key,
       ARRAY_AGG( value ORDER BY name) attr_value
  FROM site_option so
 GROUP BY so.site_id
UPDATE site si 
   SET options = HSTORE(attrs.attr_key, attrs.attr_value)
  FROM attrs 
 WHERE si.site_id =attrs.site_id

(Thanks to David J. and Andreas for the suggestions pn using ARRAY_AGG())

Update the application queries used to manage the site attributes:

Get attribute value
    SELECT value
      FROM site_option
     WHERE site_id=?
       AND name=?
    SELECT options->?
      FROM site
     WHERE site_id=?
Get all attributes
    SELECT name, value
      FROM site_option
     WHERE site_id=?
    SELECT key AS name, value FROM EACH(
      (SELECT options
         FROM site
        WHERE site_id=?)
Insert/update attribute
    SELECT 1
      FROM site_option
     WHERE name=?
       AND site_id=?

then either:

    UPDATE site_option
       SET value=?
     WHERE name=?
       AND site_id=?


INSERT INTO site_option
           (site_id, name, value)
     VALUES(?, ?, ?)
  UPDATE site
     SET options = options || HSTORE(?,?)
   WHERE site_id=?

Et voila. One less table.

Posted at 4:22 AM

Might want to also explore the array_agg() function instead of the array/subquery pair.
Posted by: David J. | 2013-03-06 04:53
Now check this out (9.3+):

CREATE TYPE site_options AS (admin_email text, outgoing_email_address text, rss_style text, site_login_path text, site_register_path text, some_integer_option integer, some_date_option date);

SELECT * FROM site, populate_record(null::site_options, site.options);

I wrote that right here in the comment box so there may be an error or two in there.
Posted by: xocolatl | 2013-03-06 07:01
@David J: I initially thought any solution would involve ARRAY_AGG(), but couldn't find a way of assembling the data in a way suitable for the HSTORE() function (i.e. a single text array with contents in "key","value" order, or twin text arrays containing keys and values respectively). If there is another way I'd be interested to hear it.
Posted by: Ian Barwick | 2013-03-06 08:35
This is how to solve the problem with array_agg().

WITH attrs AS (
SELECT so.site_id,
array_agg(name ORDER BY name) attr_key,
array_agg(value ORDER BY name) attr_value
FROM site_option so
GROUP BY so.site_id
UPDATE site si
SET options = HSTORE(attrs.attr_key, attrs.attr_value)
FROM attrs
WHERE si.site_id =attrs.site_id;
Posted by: Andreas | 2013-03-06 09:16
@Andreas: thanks, that makes sense. I didn't realise it was that simple to use ORDER BY in that context.
Posted by: Ian Barwick | 2013-03-06 10:18
I'd like to bring to attention an hstore-related question I asked on Stack Overflow sometime ago: "how to rename hstore keys". It also highlights a problem with NULL fields.

Posted by: Jeff | 2013-03-06 16:13
@xocolatl: that does indeed work as written in 9.3devel - interesting to know, thanks. (However the whole point of an adhoc key/value list is flexibility, coercing a predefined set into a type rather defeats the purpose ;) ).
Posted by: Ian Barwick | 2013-03-07 04:53