Tuesday, December 4, 2012

Finding plans in pg_stat_plans easily with pg_find_plans

As I recently blogged about, pg_stat_plans is a PostgreSQL satellite project I've been working on that aims to support earlier versions of Postgres that cannot use the new pg_stat_statements, and to track execution costs at the plan rather than the query granularity. It allows the user to easily explain each stored query text to see the plan for the entry, and has features that facilitate monitoring planner regressions.

Since PostgreSQL 9.0, support for machine-readable EXPLAIN output has existed. I'm not aware that anyone else got around to actually doing something interesting with this capability, though. I knew that in order to get the most benefit from pg_stat_plans, it ought to be possible to leverage this capability to search for plans based on arbitrary criteria, directly from SQL.

I've written an experimental submodule of pg_stat_plans, called pg_find_plans, that is designed to do just that - to quickly find plans and their execution costs, for those plans that, say, perform a sequential scan on a known large table.

Here's the description of pg_find_plans from its documentation:

pg_find_plans is written in PL/Python and PL/PgSQL. It is intended to provide users with a better way to ask questions like "what are the execution costs of all plans tracked since last statistics reset that involve a sequential scan against mytable, and have more than 2 joins?". That might be written as:
mydb=# select
  pg_stat_plans p
  stored_plans sp on (p.userid=sp.userid and p.dbid=sp.dbid and p.planid=sp.planid)
  join_count(json_plan) > 2
  contains_node(json_plan, 'Seq Scan', 'mytable');
order by
  1 desc nulls last;
Users should have a high degree of confidence that their queries on plan's structure are free of detectable errors, and pg_find_plans ensures this by carefully sanitising user input. For example, if the node of interest was specified as 'seq scan' above, the query would raise an error - to do any less might result in a false sense of security about the actual costs of plans that sequentially scan the tablemytable, since the implementation might then naively ignore sequential scan nodes, as a case-sensitive comparison is used internally. In general, making the interface hard to use incorrectly is even more important than making it easy to use correctly.
Strictly speaking, pg_find_plans is nothing more than a simple set of functions for storing JSON explain texts of plans that appear as pg_stat_plans entries into a dedicated table, and subsequently parsing those plans to answer interesting questions using SQL. However, pg_find_plans is a module that is likely to make pg_stat_plans much more useful than it might otherwise be. pg_find_plans is by no means feature complete or especially polished. The author's ambitions for the tool are described under "limitations" below.

pg_find_plans is distributed with pg_stat_plans, version 1.0 beta 3, as a submodule. I'm reasonably confident that there will be a stable release of pg_stat_plans soon.

While there are some problems with using a query text as a proxy for a plan that was once produced by that query text, these cases are handled reasonably well, though the "limitations" section of the pg_find_plans documentation should be understood by users. Still, pg_find_plans exists mostly to "test the waters" for a better-principled implementation. It remains to be seen just how much demand there is for this kind of functionality.

I must say that working on this gave me a new-found appreciation for JSON as a data-interchange format - it dawned on me just why some people consider the Postgres 9.2 JSON datatype so compelling a feature. The last time I needed to write some code that used a lowest common denominator interchange format, that format was the ludicrously verbose XML. Having a format that maps almost perfectly onto scripting language data structures cut down on the amount of boilerplate required considerably. JSON interacts well with Python's dynamic, strong typing, because a piece of JSON data almost looks like a declaration of a nested Python data structure, and can be fairly easily made to be manipulated as one too.

No comments:

Post a Comment