July 11, 2006

PostgreSQL | 5th Anniversary

Just back from Japan, where I've been busy on various private projects (using PostgreSQL wherever a database is required, of course), and haven't had much time to pursue the mailing lists, so hadn't really noticed that the 10th anniversary celebrations were underway. Congratulations all round!

Side note: Japan is a remarkably PostgreSQL-friendly country; an unscientific survey of various major bookstores in Tokyo showed that, for the open source databases, PostgreSQL books were in the majority. I'd say compared to a certain dolphin-orientated RDBMS there's about a 5:3 ratio in favour of PostgreSQL.

No, the title isn't a mistake: it occurred to me that it's almost exactly 5 years since I first used PostgreSQL. That was back in the waning days of the dot-com boom, when I was entrusted with the development of a CRM system for my former employer, Germany's first e-learning platform. The challenge was to unite an Oracle-based system with a MySQL-based system, and at least-possible cost (because the magic money pots were slowly drying up). Oh yes, the new system also had to take care of a fairly complex user authentication and session handling system for the new website.

Fortunately I had cut my SQL teeth on Oracle (version 7), and while this wasn't an option, neither was MySQL, which at the time completely lacked basic things such as foreign keys and transaction support (I think this was the period when MySQL was feuding with its transactional engine supplier about domain names). As a DB backend for an application which was to handle financial transactions, this was a total no-go (and this was long before I'd got my teeth into the infamous gotchas list). I had vaguely heard of PostgreSQL though, and decided to try it out, and while it had a few niggles (altering table definitions was a pain), I liked it.

The only real problem was speed. PostgreSQL as it then was, was noticeably slower than MySQL - at least if it was used like MySQL. I was able to demonstrate that, for example, while a for-loop in the application used to update a couple of dozen records with the current date took about three times as long in PostgreSQL, the same operation was at least as quick in PostgreSQL when using a subselect instead - and had the advantage that it was an atomic operation which could be embedded in a real transaction. Pointy-haired blessings all round.

Nevertheless we had absolutely no experience with PostgreSQL as a website backend for session management, and due to time constrictions didn't have any leeway to experiment. The "conventional" solution would have been a hasty rewrite of the entire system for MySQL ("it's only the database, SQL is pretty much the same isn't it?" etc). As I like sleeping at night though, this wasn't an option. A late evening at work produced a remarkably simple solution - a DIY replication system, using triggers on the critical tables to note changes in a replication table, and a simple Perl script running as a cron job to push the changes to a MySQL "slave". It sounds like a bit of a hack, and had the slight disadvantage that the (non-mission-critical) user session data was only available via the MySQL database; but it worked without hitch (excepting the odd network problem and occasional mysterious crashes on the "slave") for several years longer than intended (as is the way with stopgap solutions), by which time PostgreSQL had more than caught up with MySQL speedwise.

Since then I've used PostgreSQL in a variety of situations, and particularly in high-performance web-environments, and for the most part it has been a pleasure to work with. In fact I'd say that it is one of two major applications which have given me the least grief and have been the most pleasure to work with (the other application is Apache). I'd also say it has one of the most helpful support communities and I'd like to say a big thankyou to everyone involved - here's to the next 10 years!

Posted at 7:46 PM