Peter Geoghegan's blog

Sunday, March 23, 2014

What I think of jsonb

Unsurprisingly, there has been a lot of interest in the jsonb type, which made it into the upcoming 9.4 release of Postgres. I was initially a reviewer of jsonb, although since I spent weeks polishing the code, I was ultimately credited as a co-author.

Jsonb is a new datatype for Postgres. It is distinct from the older json datatype in that its internal representation is binary, and in that it is internally typed. It also makes sophisticated nested predicates within queries on jsonb indexable. I've occasionally described the internally-typed scalar values as having “shadow types” unknown to the core SQL parser. This has several implications. For example, if you sort two Jsonb values containing only scalar numbers, the implementation invokes the numeric comparator (which the jsonb default B-Tree opclass comparator is defined in terms of). The on-disk representation of jsonb includes the same representation as is used for, say, numerics (as the internal binary representation of JSON primitive numbers, for example). Plus, JSON objects are de-duplicated by key on input, and optimized for cheap binary searches within a single jsonb. Still, like the earlier json type, jsonb in every sense “speaks JSON”. There are some limitations on what can be represented as a jsonb number, but those are exactly the same limitations that apply to the core numeric type (plus some limitations imposed by the JSON RFC, such as not accepting NaN values). I hope it suffices to say that these limitations are virtually irrelevant, and that many implementations have similar or worse limitations. All of these minor implementation-defined restrictions are explicitly anticipated and allowed for by the recent JSON RFC-7159.

Moving on

Today was my last day at 2ndQuadrant.

The experience of working with 2ndQuadrant in the last couple of years has been very positive. I just decided it was time for a change. Being able to work on interesting problems during my time at 2ndQuadrant, both as a Postgres developer and as a consultant has been great fun, and very personally rewarding. I would like to acknowledge the invaluable support of both Simon Riggs and Greg Smith - thank you both. I wish the entire 2ndQuadrant staff all the best.

I expect to remain active as a Postgres developer, and already have plans to relocate to work for another company that is well known within the community.

Tuesday, December 4, 2012

Finding plans in pg_stat_plans easily with pg_find_plans

As I recently blogged about, pg_stat_plans is a PostgreSQL satellite project I've been working on that aims to support earlier versions of Postgres that cannot use the new pg_stat_statements, and to track execution costs at the plan rather than the query granularity. It allows the user to easily explain each stored query text to see the plan for the entry, and has features that facilitate monitoring planner regressions.

Since PostgreSQL 9.0, support for machine-readable EXPLAIN output has existed. I'm not aware that anyone else got around to actually doing something interesting with this capability, though. I knew that in order to get the most benefit from pg_stat_plans, it ought to be possible to leverage this capability to search for plans based on arbitrary criteria, directly from SQL.

I've written an experimental submodule of pg_stat_plans, called pg_find_plans, that is designed to do just that - to quickly find plans and their execution costs, for those plans that, say, perform a sequential scan on a known large table.

Notes on index-only scans

One of the most important performance features in Postgres 9.2 is index-only scans: the ability for certain types of queries to be performed without retrieving data from tables, potentially greatly reducing the amount of I/O needed. I recently completely overhauled the Index-only scans PostgreSQL wiki page, so that the page is now targeted at experienced PostgreSQL users that hope to get the most out of the feature.

My apologies to the authors of the feature, Robert Haas, Ibrar Ahmed, Heikki Linnakangas and Tom Lane, if my handling of the topic seems to focus on the negatives. Any reasonable article about any given index-only scan implementation would have to extensively discuss that implementation's limitations. Any discussion of Postgres index-only scans that focussed on the positives would be much shorter, and would essentially just say: "Index-only scans can make some of your queries go much faster!".