Since PostgreSQL 9.0, support for machine-readable EXPLAIN output has existed. I'm not aware that anyone else got around to actually doing something interesting with this capability, though. I knew that in order to get the most benefit from pg_stat_plans, it ought to be possible to leverage this capability to search for plans based on arbitrary criteria, directly from SQL.
I've written an experimental submodule of pg_stat_plans, called pg_find_plans, that is designed to do just that - to quickly find plans and their execution costs, for those plans that, say, perform a sequential scan on a known large table.
Here's the description of pg_find_plans from its documentation:
pg_find_plans is written in PL/Python and PL/PgSQL. It is intended to provide users with a better way to ask questions like "what are the execution costs of all plans tracked since last statistics reset that involve a sequential scan against mytable, and have more than 2 joins?". That might be written as:mydb=# select join_count(json_plan), p.* from pg_stat_plans p join stored_plans sp on (p.userid=sp.userid and p.dbid=sp.dbid and p.planid=sp.planid) where from_our_database and join_count(json_plan) > 2 and contains_node(json_plan, 'Seq Scan', 'mytable'); order by 1 desc nulls last;Users should have a high degree of confidence that their queries on plan's structure are free of detectable errors, and pg_find_plans ensures this by carefully sanitising user input. For example, if the node of interest was specified as 'seq scan' above, the query would raise an error - to do any less might result in a false sense of security about the actual costs of plans that sequentially scan the tablemytable, since the implementation might then naively ignore sequential scan nodes, as a case-sensitive comparison is used internally. In general, making the interface hard to use incorrectly is even more important than making it easy to use correctly.Strictly speaking, pg_find_plans is nothing more than a simple set of functions for storing JSON explain texts of plans that appear as pg_stat_plans entries into a dedicated table, and subsequently parsing those plans to answer interesting questions using SQL. However, pg_find_plans is a module that is likely to make pg_stat_plans much more useful than it might otherwise be. pg_find_plans is by no means feature complete or especially polished. The author's ambitions for the tool are described under "limitations" below.
pg_find_plans is distributed with pg_stat_plans, version 1.0 beta 3, as a submodule. I'm reasonably confident that there will be a stable release of pg_stat_plans soon.
While there are some problems with using a query text as a proxy for a plan that was once produced by that query text, these cases are handled reasonably well, though the "limitations" section of the pg_find_plans documentation should be understood by users. Still, pg_find_plans exists mostly to "test the waters" for a better-principled implementation. It remains to be seen just how much demand there is for this kind of functionality.
I must say that working on this gave me a new-found appreciation for JSON as a data-interchange format - it dawned on me just why some people consider the Postgres 9.2 JSON datatype so compelling a feature. The last time I needed to write some code that used a lowest common denominator interchange format, that format was the ludicrously verbose XML. Having a format that maps almost perfectly onto scripting language data structures cut down on the amount of boilerplate required considerably. JSON interacts well with Python's dynamic, strong typing, because a piece of JSON data almost looks like a declaration of a nested Python data structure, and can be fairly easily made to be manipulated as one too.
No comments:
Post a Comment