Friday, October 2, 2015

Avoid naming a constraint directly when using ON CONFLICT DO UPDATE

PostgreSQL 9.5 will have support for a feature that is popularly known as "UPSERT" - the ability to either insert or update a row according to whether an existing row with the same key exists. If such a row already exists, the implementation should update it. If not, a new row should be inserted. This is supported by way of a new high level syntax (a clause that extends the INSERT statement) that more or less relieves the application developer from having to give any thought to race conditions. This common operation for client applications is set to become far simpler and far less error-prone than legacy ad-hoc approaches to UPSERT involving subtransactions.

When we worked on UPSERT, many edge-cases were carefully considered. A technique called "unique index inference" allows DML statement authors to be very explicit about what condition they want to take the alternative (UPDATE or NOTHING) path on. That alternative path can only be taken in the event of a would-be duplicate violation in an "arbiter" unique index (for the DO NOTHING variant, a would-be exclusion violation is also a possible reason to take the alternative NOTHING path). The ability to write UPSERT statements explicitly and safely while also having lots of flexibility is an important differentiator for PostgreSQL's UPSERT in my view.



As the 9.5 INSERT documentation explains, the inference syntax contains one or more column_name_index (columns) and/or expression_index expressions (expressions), and perhaps an optional index_predicate (for partial unique indexes, which are technically not constraints at all). This is internally used to figure out which of any available unique indexes ought to be considered as an arbiter of taking the alternative path. If none can be found, the optimizer raises an error.

The inference syntax is very flexible, and very tolerant of variations in column ordering, whether or not a partial unique index predicate is satisfied, and several other things. It can infer multiple unique indexes at a time, which is usually not necessary, but can be in the event of a migration. CREATE INDEX CONCURRENTLY supports creating unique indexes, and it's easy to imagine someone reasonably having two logically equivalent unique indexes (or equivalent in all the ways that matter to certain UPSERT statements) for a while. Plus, unique indexes cannot be named directly, since they happen to not be cataloged as constraints. We considered that. Use the inference syntax, and you're unlikely to have any problems like this at all.

DML statement authors must be explicit when writing a statement using DO UPDATE in any case -- omitting some particular condition to take the UPDATE path on is simply disallowed (DO NOTHING does not have this restriction). We also added an escape hatch to name a constraint directly, ON CONFLICT ON CONSTRAINT <constraint_name>. This could be useful for exclusion constraints, but its use is generally discouraged since it does not handle these edge-cases.

No comments:

Post a Comment