Friday, May 18, 2018

Visualizing a column's space overhead using pg_hexedit

pg_hexedit recently gained the ability to annotate the space taken up by each individual column/attribute within each individual tuple. This works with tables, and with B-Tree indexes.

I had to come up with a way of passing the pg_hexedit frontend utility the relevant pg_attribute metadata to make this work. This metadata describes the "shape" of individual tuples in a relation (backend code uses a closely related structure called a "tuple descriptor"). My approach works seamlessly in simple cases, but can still be used when manually running the pg_hexedit command line tool.

pg_attribute system catalog table with column annotations/tags

This new capability could be applied to optimizing the data layout of a table that is expected to eventually have a massive number of rows. Carefully choosing the order and type of each column can reduce the total on-disk footprint of a table by an appreciable amount, especially when the final table ends up with several 1 byte columns that get packed together.

I am aware of several PostgreSQL users that found it worthwhile to have a highly optimized tuple layout, going so far as to use their own custom dataypes. Alignment-aware micro-optimization of a Postgres client application's schema won't help much in most cases, but it can help noticeably with things like fact tables, or tables that contain machine-generated event data. Developing a sense of proportion around storage overhead should now be easier, and more intuitive.

No comments:

Post a Comment