Saturday, 20 October 2012

First release of pg_stat_plans

Anyone who attended my recent talk at Postgres Open, "Beyond Query Logging", co-presented with my 2ndQuadrant colleague Greg Smith, will be aware that pg_stat_statements, the standard contrib module that assigns execution costs to queries and makes them available from a view in the database, has been improved considerably in the recent Postgres 9.2 release. We believe these improvements will shift the preferred approach to workload analysis on PostgreSQL databases away from log analysis tools, which simply cannot match the performance, flexibility or granularity of this new approach.

We also announced a new open source tool that addresses a related but slightly different problem (the analysis of plan execution costs, and of planner regressions), while also making most of the benefits of pg_stat_statements on 9.2 available to users stuck on earlier versions of Postgres. This new tool, pg_stat_plans, is itself based on pg_stat_statements.


The most important 9.2 pg_stat_statements feature, the ability to "normalise" the non-prepared statements that the large majority of applications use exclusively, is now brought to earlier versions (9.0 and 9.1, though pg_stat_plans works fine on 9.2 too). Because pg_stat_plans fingerprints plans rather than query trees, normalisation works slightly differently to pg_stat_statements, and in some cases may not quite match people's intuitive expectations of how it ought to behave. These differences have been extensively documented.
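
To make the effect of normalisation concrete: executions that differ only in their constants are recognised as a single entry. The real mechanism fingerprints an internal tree (the post-parse-analysis query tree in pg_stat_statements, the plan tree in pg_stat_plans), not the SQL text; the following toy Python sketch, which crudely masks literals with a regex, only illustrates the observable effect:

```python
import hashlib
import re

def fingerprint(sql: str) -> str:
    """Toy normalisation: mask string and numeric literals, then hash.

    pg_stat_statements/pg_stat_plans hash internal trees, not raw text;
    this regex-based version only demonstrates the end result.
    """
    normalised = re.sub(r"'[^']*'|\b\d+(?:\.\d+)?\b", "?", sql)
    return hashlib.md5(normalised.encode()).hexdigest()

# Two executions that differ only in a constant collapse to one entry,
# while a genuinely different query gets a different fingerprint:
a = fingerprint("SELECT * FROM orders WHERE id = 42")
b = fingerprint("SELECT * FROM orders WHERE id = 7")
c = fingerprint("SELECT count(*) FROM orders")
```

Both of the first two queries normalise to "SELECT * FROM orders WHERE id = ?", so they share a fingerprint.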

pg_stat_plans also has the ability to EXPLAIN a stored, representative SQL text, in order to facilitate deeper analysis of plan execution costs. Each plan's total_cost and startup_cost are tracked over time, for example, so that the "crossover point" at which the planner begins to prefer an alternative plan can sometimes be observed, and the planner's "reasoning" perhaps better understood.

pg_stat_plans is distributed under the PostgreSQL licence. I'd originally hoped to offer plan fingerprinting within pg_stat_statements itself, and said as much at the PostgreSQL developers' meeting in May, but given the demand for this on earlier Postgres versions, I ultimately felt that the Postgres community would be best served by having the module as a satellite project. I'm naturally willing to accept third-party contributions through GitHub's "pull request" mechanism.

I am pleased to announce the first release of pg_stat_plans, 1.0 beta 1.

Those of you who missed the talk in Chicago can catch it at next week's PostgreSQL Conference Europe in Prague. See you there!

Official pg_stat_plans page: http://www.2ndQuadrant.com/en/pg_stat_plans

5 comments:

  1. Peter,

    As I said at Open, I'm really excited about this functionality and what it adds for database activity monitoring. We should catch up in Prague; I'd love to discuss what would be required to completely replace all log-parsing.

    Replies
    1. I have a plan and initial funding to start replacing all log parsing with mechanisms like pg_stat_statements and pg_stat_plans. The data from log_lock_waits is my next target. I'll have a proposal for the hackers list soon on that one.

  2. Thanks Josh.

    There is a fairly obvious direction that the development of pg_stat_plans ought to follow. There should be another extension distributed alongside pg_stat_plans (call it pg_plan_analysis) that depends on pg_stat_plans, though pg_stat_plans will continue to work fine on its own.

    The first job of pg_plan_analysis would be to maintain a current JSON explain datum for each plan tracked. So there'd be a new table, say pg_json_plans, full of JSON explains, that you can join against the pg_stat_plans view. The table is kept consistent over time by a function that you call from psql, say in a periodic cronjob (the infrastructure that this function would use already exists within pg_stat_plans). You can then write queries like this:

    select * from pg_stat_plans p
    left join pg_json_plans j
      on (p.userid = j.userid and p.dbid = j.dbid and p.planid = j.planid)
    where
      plan_has_node('bitmap_index_scan', j.plan)
      and plan_has_index('idx_my_index', j.plan);

    Here, the functions in the predicate are simple PL/Python functions that parse the JSON. None of this is a whole lot of work to add - I just haven't had time. If someone else wants to write that code, I'd be happy to review it, though.
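
As a sketch of what those hypothetical functions might look like (plan_has_node, plan_has_index and pg_json_plans don't exist yet, and note that EXPLAIN (FORMAT JSON) actually spells node types like "Bitmap Index Scan"), the tree walk is straightforward. Plain Python is shown for clarity; wrapping each function with CREATE FUNCTION ... LANGUAGE plpythonu would make them callable from SQL:

```python
import json

def _walk(node):
    """Yield a plan node and, recursively, all of its children."""
    yield node
    for child in node.get("Plans", []):
        for descendant in _walk(child):
            yield descendant

def _nodes(plan_json):
    # EXPLAIN (FORMAT JSON) emits a single-element array whose element
    # holds the root plan node under the "Plan" key.
    return _walk(json.loads(plan_json)[0]["Plan"])

def plan_has_node(node_type, plan_json):
    return any(n.get("Node Type") == node_type for n in _nodes(plan_json))

def plan_has_index(index_name, plan_json):
    return any(n.get("Index Name") == index_name for n in _nodes(plan_json))

# A minimal document in the shape EXPLAIN (FORMAT JSON) produces:
sample = json.dumps([{"Plan": {
    "Node Type": "Bitmap Heap Scan",
    "Plans": [{"Node Type": "Bitmap Index Scan",
               "Index Name": "idx_my_index"}],
}}])
```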

    Replies
    1. Well, the JSON-interpreting functions should be fairly easy to write for the simplest case (searching for a single node). But how will the JSON plans be made available in the first place?

    2. It's a matter of using the pg_stat_plans_explain() function to explain the stored query text (you can set the output format to JSON with the pg_stat_plans.explain_format setting) and materialising the result. This would probably happen at regular intervals, from a cronjob that does all of this. It's quite practical, since there is never a need to do it more than once for the same entry (a new plan would result in a new entry anyway).

      When explaining, we go to some lengths to ensure that the stored query text does in fact produce the same plan as the one originally executed to create the entry (there are some edge cases we don't handle when explaining stored query text, such as prepared queries, but we error on them in a well-principled way). If the constants appearing in that query text happened to cross over to a different plan during the intervening period, we "invalidate" the query text and raise an error too.

      The idea here is to get the next best thing to actually storing EXPLAIN text for each entry. Once you have json explains, you can do whatever you like with them.
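
For instance, a question like "which indexes does this plan actually touch?" becomes ordinary JSON processing once the explains are materialised. A hypothetical sketch, assuming the single-element array shape that EXPLAIN (FORMAT JSON) emits:

```python
import json

def indexes_used(plan_json):
    """Collect the names of all indexes referenced by a JSON plan."""
    def walk(node):
        yield node
        for child in node.get("Plans", []):
            for descendant in walk(child):
                yield descendant
    root = json.loads(plan_json)[0]["Plan"]
    return sorted({n["Index Name"] for n in walk(root) if "Index Name" in n})

# A minimal document in the shape EXPLAIN (FORMAT JSON) produces:
sample = json.dumps([{"Plan": {
    "Node Type": "Nested Loop",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders"},
        {"Node Type": "Index Scan", "Index Name": "customers_pkey"},
    ],
}}])
```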
