Peter Geoghegan's blog: 2012

Tuesday, December 4, 2012

Finding plans in pg_stat_plans easily with pg_find_plans

As I recently blogged about, pg_stat_plans is a PostgreSQL satellite project I've been working on that aims to support earlier versions of Postgres that cannot use the new pg_stat_statements, and to track execution costs at the plan rather than the query granularity. It allows the user to easily explain each stored query text to see the plan for the entry, and has features that facilitate monitoring planner regressions.

Since PostgreSQL 9.0, support for machine-readable EXPLAIN output has existed. I'm not aware that anyone else got around to actually doing something interesting with this capability, though. I knew that in order to get the most benefit from pg_stat_plans, it ought to be possible to leverage this capability to search for plans based on arbitrary criteria, directly from SQL.

I've written an experimental submodule of pg_stat_plans, called pg_find_plans, that is designed to do just that - to quickly find plans and their execution costs, for those plans that, say, perform a sequential scan on a known large table.
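To give a flavour of what searching machine-readable plans involves, here is a minimal Python sketch. It is not how pg_find_plans is implemented or exposed (that works directly from SQL); it only assumes the documented shape of `EXPLAIN (FORMAT JSON)` output, where each node carries a "Node Type" and children sit under "Plans". The sample plan, its costs, and the function names are made up for illustration.

```python
import json

# Hypothetical EXPLAIN (FORMAT JSON) output; relations and costs are invented.
SAMPLE_EXPLAIN_JSON = """
[
  {
    "Plan": {
      "Node Type": "Hash Join",
      "Total Cost": 270.5,
      "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders", "Total Cost": 155.0},
        {
          "Node Type": "Hash",
          "Total Cost": 35.5,
          "Plans": [
            {"Node Type": "Index Scan", "Relation Name": "customers",
             "Total Cost": 35.5}
          ]
        }
      ]
    }
  }
]
"""


def iter_plan_nodes(node):
    """Yield a plan node and, recursively, every node beneath it."""
    yield node
    for child in node.get("Plans", []):
        yield from iter_plan_nodes(child)


def find_seq_scans(explain_json, relation):
    """Return the Seq Scan nodes that read the named relation."""
    matches = []
    for statement in json.loads(explain_json):
        for node in iter_plan_nodes(statement["Plan"]):
            if (node.get("Node Type") == "Seq Scan"
                    and node.get("Relation Name") == relation):
                matches.append(node)
    return matches
```

Calling `find_seq_scans(SAMPLE_EXPLAIN_JSON, "orders")` returns the single sequential scan node in the sample, while the same search for "customers" returns an empty list because that table is read through an index scan.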

Friday, November 16, 2012

Notes on index-only scans

One of the most important performance features in Postgres 9.2 is index-only scans: the ability for certain types of queries to be performed without retrieving data from tables, potentially greatly reducing the amount of I/O needed. I recently completely overhauled the PostgreSQL wiki page on index-only scans, so that the page is now targeted at experienced PostgreSQL users who hope to get the most out of the feature.

My apologies to the authors of the feature, Robert Haas, Ibrar Ahmed, Heikki Linnakangas and Tom Lane, if my handling of the topic seems to focus on the negatives. Any reasonable article about any given index-only scan implementation would have to extensively discuss that implementation's limitations. Any discussion of Postgres index-only scans that focussed on the positives would be much shorter, and would essentially just say: "Index-only scans can make some of your queries go much faster!".

Saturday, October 20, 2012

First release of pg_stat_plans

Anyone who attended my recent talk at Postgres Open, which was co-presented with my 2ndQuadrant colleague Greg Smith, "Beyond Query Logging", will be aware that pg_stat_statements, the standard contrib module that assigns execution costs to queries and makes them available from a view in the database, has been improved considerably in the recent 9.2 Postgres release. It has been improved in a way that we believe will alter the preferred approach to workload analysis on PostgreSQL databases away from log analysis tools, which just don't offer the performance, flexibility or granularity of this new approach.

We also announced a new open source tool that addresses a related but slightly different problem (the analysis of plan execution costs, and planner regressions), as well as making most of the benefits of pg_stat_statements on 9.2 available to users stuck on earlier versions of Postgres. This new tool is called pg_stat_plans, and is itself based on pg_stat_statements.

Wednesday, August 1, 2012

Sorting improvements in PostgreSQL 9.2: the case for micro-optimisation

There has been much discussion of performance improvements in the upcoming 9.2 release of PostgreSQL. Recently, I noticed that Regina Obe and Leo Hsu's new book, "PostgreSQL: Up and Running", prominently listed "Sorting improvements that improve in-memory sorting operations by as much as 20%" as a performance feature of that release. While they do get things about right there, I'm not sure that this improvement warrants such prominent placement, at least in sheer terms of its likely impact on the performance of production PostgreSQL systems - we packed a lot of great performance improvements into 9.2. The likely reason that it was picked up on in the book, and the real reason for this blogpost, is the story behind the development of the optimisation, which I for one find kind of interesting, and worth sharing. It's more interesting from the perspective of someone with a general interest in systems programming or PostgreSQL's design philosophy than a casual user, though. If you're a casual user, the short version is that simple queries that perform in-memory sorting of integers and floats will be about 23% faster.

Monday, June 4, 2012

Towards 14,000 write transactions per second on my laptop

Postgres 9.2 will have many improvements to both read and write scalability. Simon Riggs and I collaborated on a performance feature that greatly increased the throughput of small write transactions. Essentially, it accomplishes this by reducing the lock contention surrounding an internal lock called WALWriteLock. When an individual backend/connection holds this lock, it is empowered to write WAL from wal_buffers, an area of shared memory that temporarily holds WAL until it is written, and ultimately flushed to persistent storage.
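The general idea of letting one lock holder flush on behalf of others can be sketched in a few lines of Python. This is a toy model under loud assumptions, not the Postgres implementation: `ToyWal`, its attributes, and the integer positions are all hypothetical, a plain list stands in for wal_buffers, and an ordinary mutex stands in for WALWriteLock. The point it illustrates is that a committer who obtains the lock and finds its record already flushed can return without flushing again.

```python
import threading


class ToyWal:
    """Toy write-ahead log; every name here is hypothetical."""

    def __init__(self):
        self._insert_lock = threading.Lock()
        self._write_lock = threading.Lock()  # stands in for WALWriteLock
        self._buffer = []                    # stands in for wal_buffers
        self.inserted_upto = 0               # position of the last buffered record
        self.flushed_upto = 0                # position of the last flushed record
        self.flush_calls = 0                 # number of simulated flushes

    def insert(self, record):
        """Buffer a record and return its position in the log."""
        with self._insert_lock:
            self._buffer.append(record)
            self.inserted_upto += 1
            return self.inserted_upto

    def commit(self, position):
        """Ensure the log is flushed up to position.

        Returns True if this caller performed the flush, or False if an
        earlier flush by another committer already covered it.
        """
        with self._write_lock:
            if self.flushed_upto >= position:
                return False
            with self._insert_lock:
                target = self.inserted_upto
                self._buffer.clear()
            # A real system would write and fsync here; we only count it.
            self.flush_calls += 1
            self.flushed_upto = target
            return True
```

If three records are inserted and the first committer then calls `commit`, its flush covers all three; the other two committers find `flushed_upto` already past their positions and skip the expensive step.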

Original update.sql "new group commit" benchmark, January 2012. This made it into Postgres 9.2. Here, we compare the performance of my original patch (red line) and Postgres master in January (green line). 9.1 performance on this benchmark would probably be very similar to that of the baseline seen here.

Thursday, March 29, 2012

Much improved statement statistics coming to Postgres 9.2

There is a tendency for people with an interest in improving database performance to imagine that it mostly boils down to factors outside of their application - the hardware, operating system configuration, and database settings. While these are obviously crucially important, experience suggests that in most cases, by far the largest gains are to be had by optimising the application’s interaction with the database. Doing so invariably involves analysing which queries are being executed in production, their costs, and the significance of each query to the application or business process that the database supports.

PostgreSQL has had a module available in contrib since version 8.4 - pg_stat_statements, originally developed by Takahiro Itagaki. The module blames execution costs on queries, so that bottlenecks in production can be isolated to points in the application. It does so by providing a view that is continually updated, giving real-time statistical information. Here is an example from the Postgres 9.2 docs:

Saturday, January 28, 2012

Power consumption in Postgres 9.2

One of the issues of major concern to CPU vendors is optimising the power consumption of their devices. In a world where, increasingly, computing resources are purchased in terms of fairly abstract units of work, and where the local price of a kilowatt hour is likely to be weighed just as heavily as the wholesale price of bandwidth when selecting the location of a major data-centre, this is quite understandable.

Globally, data centres consumed between 1.1 and 1.5 percent of electricity in 2010 (Source: Koomey). The economic and ecological importance of minimizing that number is fairly obvious.

Peter Geoghegan's blog

Tuesday, December 4, 2012

Finding plans in pg_stat_plans easily with pg_find_plans

Friday, November 16, 2012

Notes on index-only scans

Saturday, October 20, 2012

First release of pg_stat_plans

Wednesday, August 1, 2012

Sorting improvements in PostgreSQL 9.2: the case for micro-optimisation

Monday, June 4, 2012

Towards 14,000 write transactions per second on my laptop

Thursday, March 29, 2012

Much improved statement statistics coming to Postgres 9.2

Saturday, January 28, 2012

Power consumption in Postgres 9.2