Kory Porter.

Postgres Exclusion Constraints

Cover Image for Postgres Exclusion Constraints

This post explores Postgres exclusion constraints through a simple but evolving scenario, exclusion constraints have been in Postgres since 9.0.

Don't exclude me!

I've recently been introduced to the concept of treating database tables like they are ledgers (where you don't edit the original, you instead create a new record that extends from the old) as an alternative to using history tables to track changes to rows. Schema design with tables acting as ledgers introduced new technical challenges to me. One example (that I'll expand on later) is that you can no longer use a unique index constraint to guarantee that a combination of values are never re-used in a different row (without further normalising).

Scenario

Let's pretend we're designing a schema for a bus transportation company, called Bus-sy. Bus-sy wants to keep a register of their buses so they have a single place to look up the following information. They've told us that:

  • a bus has a unique id
  • a bus has a registration number
  • a bus optionally has a driver that can change

We aren't given any constraints from Bus-sy yet, other than they want us to treat the register like a ledger.

DISCLAIMER: This is a somewhat contrived scenario to support using a single ledger table in place of instead normalising these tables, this might be considered using exclusion constraints for the wrong reasons, or hacking around the problem. You could avoid this by instead favouring three tables to put the schema into 2NF: bus_register, bus_registration, and bus_driver, all you would need is then some unique constraints. With that design it's likely bus_driver is the only table that gets updated as well.

create table bus_register_ledger(
  -- auto-incrementing id, useful if you ever need to reference a specific ledger entry at a point in time
  id serial primary key,
  -- the "ledger" id, this gets re-used across all updates to this entity
  bus_id text not null,
  registration text not null,
  driver text,
  created_at timestamptz not null default clock_timestamp()
);

When would you use an exclusion constraint?

Simply put, exclusion constraints allow you to perform check(s) across two rows, and if all checks are truthy, then the constraint has failed (I'll demonstrate this manually later). These checks are likely to be equality (=), inequality (<>), and overlaps (&&), however, there is an abundance of operators that can be used on gist indexes, see the builtin gist op classes here.

Implementing an exclusion constraint

Bus-sy has told us that they'd like us to capture the fact that buses cannot share a registration, nor can it ever be re-used by another bus. In other words, we need a constraint that enforces that new rows cannot use an already used registration value UNLESS it is already associated with the id of the bus we're adding a new row for.

Right now, there's nothing stopping us from inserting rows into our bus_register_ledger table that would break the above requirement.

insert into
  bus_register_ledger(bus_id, registration)
values
  ('bus-id-1', 'bussy1'),
  ('bus-id-2', 'bussy1');
-- INSERT 0 2

select * from bus_register_ledger;
-- id  bus_id    registration  driver  created_at
-- 1   bus-id-1  bussy1        <null>  2023-03-06 19:39:00.353883+00
-- 2   bus-id-2  bussy1        <null>  2023-03-06 19:39:00.354673+00

As expected, nothing is stopping new rows from being inserted that break the business requirement. Let's fix that, maybe we could add a unique index here instead?

create unique index unique_bus_id_registration_pair on bus_register_ledger (bus_id, registration);

insert into
  bus_register_ledger(bus_id, registration)
values
  ('bus-id-1', 'bussy1'),
  ('bus-id-2', 'bussy1');
-- INSERT 0 2

Well, that didn't work, makes sense too, that unique index only guarantees that a bus+registration pair are never re-used. What if we tweaked it to also instead have multiple unique indexes?

create unique index unique_bus_id_registration_pair on bus_register_ledger (bus_id, registration);

create unique index unique_bus_id on bus_register_ledger (bus_id);

create unique index unique_registration on bus_register_ledger (registration);

insert into
  bus_register_ledger(bus_id, registration)
values
  ('bus-id-1', 'bussy1');

insert into
  bus_register_ledger(bus_id, registration)
values
  ('bus-id-2', 'bussy1');

-- duplicate key value violates unique constraint "unique_registration"
-- DETAIL:  Key (registration)=(bussy1) already exists.

Now we're getting somewhere... But what if we wanted to update the driver for a given bus? Remember that we're operating with ledgers here, so we need to roll forward with the new value instead of updating it in place. We're probably going to have some issues. Let's try anyway!

-- same three indexes as before
create unique index unique_bus_id_registration_pair on bus_register_ledger (bus_id, registration);

create unique index unique_bus_id on bus_register_ledger (bus_id);

create unique index unique_registration on bus_register_ledger (registration);

insert into
  bus_register_ledger(bus_id, registration, driver)
values
  ('bus-id-1', 'bussy1', 'jessica');

-- assign new driver to the bus
insert into
  bus_register_ledger(bus_id, registration, driver)
select
  bus_id,
  registration,
  'adam'
from
  bus_register_ledger
where
  bus_id = 'bus-id-1';

-- duplicate key value violates unique constraint "unique_bus_id_registration_pair"
-- DETAIL:  Key (bus_id, registration)=(bus-id-1, bussy1) already exists.

The constraints are doing what we told them to do here... But they're stopping us from rolling forward with the old row data. To solve for this, lets introduce an exclusion constraint!

create extension if not exists btree_gist;

alter table
  bus_register_ledger
add
  constraint unique_bus_id_registration_pair exclude using gist (registration with =, bus_id with <>);

Before we test if the above works (hint: it does 🤓), let's break the SQL apart a bit.

The first line adds the btree_gist extension, useful exclusion constraints require you to use gist indexes (or sp-gist). If your exclusion constraint works with a standard btree index, it could've been implemented more efficiently as an ordinary unique index. We're using a gist index here for two reasons. It gives us access to the <> (not equals) operator in the constraint, and its internal data structure is better suited to efficiently compare values across multiple rows.

The second line alters the table to add an exclusion constraint. We've specified two expressions in our constraint, one for bus_id inequality, and another for registration equality. The with keyword is baked into the syntax of an exclusion constraint, do not confuse this with the with keyword you would see in a CTE. In plain English, we've told Postgres that if we modify or insert a new row, it must not be using a registration that is otherwise NOT already assigned to the bus. Exclusion constraints compare two rows at a time, let's demonstrate that by emulating these checks manually - Postgres would be doing this a lot more efficiently by using gist indexes.

-- continuing from the above (we've created the exclusion constraint)

insert into
  bus_register_ledger(bus_id, registration, driver)
values
  ('bus-id-1', 'bussy1', 'jessica');
-- INSERT 0 1

select * from bus_register_ledger;
-- id  bus_id    registration  driver   created_at
-- 1   bus-id-1  bussy1        jessica  2023-03-06 19:48:33.662441+00

The constraint allows this change because no other rows exist to compare it against. There is no check for us to emulate, let's add another row.

insert into
  bus_register_ledger(bus_id, registration, driver)
values
  ('bus-id-2', 'bussy2', 'john');
-- INSERT 0 1

select * from bus_register_ledger;
-- id  bus_id    registration  driver   created_at
-- 1   bus-id-1  bussy1        jessica  2023-03-06 19:48:33.662441+00
-- 2   bus-id-2  bussy2        john     2023-03-06 19:48:33.670534+00

We've added another row, it was successful because there is no overlap on registration. But what did the check look like? Remember that exclusion constraints run the checks across two rows at a time. There's only one row in the table at this time, so we'll compare our new update with the existing row.

Row with id 1, has

  • bus_id = bus-id-1
  • registration = bussy1

The row we've tried to insert has

  • bus_id = bus-id-2
  • registration = bussy2

Our constraint has two checks, bus_id with <> and registration with =.

Evaluating the bus_id with <> check looks like: bus-id-1 !== bus-id-2, so we have TRUE.

Evaluating the registration with = check looks like: bussy1 === bussy2, so we have FALSE.

Inserts/updates are only considered to violate exclusion constraints if ALL expressions in the constraint evaluate to be true. Therefore this check has not violated the constraint. This is the only row to compare to, so the insert is allowed!

Let's add another row, this time we'll change the driver of the bus with id bus-id-1. In the example below we're carrying forward the bus_id and registration.

insert into
  bus_register_ledger(bus_id, registration, driver)
select
  bus_id,
  registration,
  'adam'
from
  bus_register_ledger
where
  bus_id = 'bus-id-1';
-- INSERT 0 1

select * from bus_register_ledger;
-- id  bus_id    registration  driver   created_at
-- 1   bus-id-1  bussy1        jessica  2023-03-06 19:48:33.662441+00
-- 2   bus-id-2  bussy2        john     2023-03-06 19:48:33.670534+00
-- 3   bus-id-1  bussy1        adam     2023-03-06 19:48:33.680098+00

Again, the insert was successful, our new row is the third one, but what did the check look like? Now that we've got multiple rows in the table, we'll need to execute the check expressions across both of them.

Row with id 1, has

  • bus_id = bus-id-1
  • registration = bussy1
  • driver = jessica

The row we've tried to insert has

  • bus_id = bus-id-1
  • registration = bussy1
  • driver = adam

Our constraint has two checks, registration with = and bus_id with <>.

Evaluating the bus_id with <> check looks like: bus-id-1 !== bus-id-1, so we have FALSE.

Evaluating the registration with = check looks like: bussy1 === bussy1, so we have TRUE.

An exclusion constraint is only violated if ALL expressions in the constraint evaluate to be true, so semantically we have FALSE AND TRUE, which is FALSE. Therefore this insert is allowed against this row. Let's check the next row.

Row with id 2, has

  • bus_id = bus-id-2
  • registration = bussy2
  • driver = john

The row we've tried to insert has

  • bus_id = bus-id-1
  • registration = bussy1
  • driver = adam

Our constraint has two checks, registration with = and bus_id with <>.

Evaluating the bus_id with <> check looks like: bus-id-2 !== bus-id-1, so we have TRUE.

Evaluating the registration with = check looks like: bussy2 === bussy1, so we have FALSE.

An exclusion constraint is only violated if ALL expressions in the constraint evaluate to be true, so semantically we have TRUE AND FALSE, which is FALSE. Therefore this insert is allowed against this row.

This insert was compared against all available rows and the exclusion constraint wasn't violated, this insert was allowed.

Alright, so far so good. Let's try and break the constraint and then understand why by looking at the truth table. To break the constraint, we'll attempt to register a new bus with id bus-id-3 using the same registration as bus-id-2.

insert into
  bus_register_ledger(bus_id, registration, driver)
values
  ('bus-id-3', 'bussy2', 'brendan');

-- conflicting key value violates exclusion constraint "unique_bus_id_registration_pair"
-- DETAIL:  Key (registration, bus_id)=(bussy2, bus-id-3) conflicts with existing key (registration, bus_id)=(bussy2, bus-id-2).

That insert failed, good stuff! Let's compare our insert against all three existing rows and see which row caused the constraint to fail.

Row with id 1, has

  • bus_id = bus-id-1
  • registration = bussy1
  • driver = jessica

The row we've tried to insert has

  • bus_id = bus-id-3
  • registration = bussy2
  • driver = brendan

Our constraint has two checks, registration with = and bus_id with <>.

Evaluating the bus_id with <> check looks like: bus-id-1 !== bus-id-3, so we have TRUE.

Evaluating the registration with = check looks like: bussy1 === bussy2, so we have FALSE.

An exclusion constraint is only violated if ALL expressions in the constraint evaluate to be true, so semantically we have TRUE AND FALSE, which is FALSE. Therefore this insert is allowed against this row.

Row with id 2, has

  • bus_id = bus-id-2
  • registration = bussy2
  • driver = john

The row we've tried to insert has

  • bus_id = bus-id-3
  • registration = bussy2
  • driver = brendan

Our constraint has two checks, registration with = and bus_id with <>.

Evaluating the bus_id with <> check looks like: bus-id-2 !== bus-id-3, so we have TRUE.

Evaluating the registration with = check looks like: bussy === bussy2, so we have TRUE.

An exclusion constraint is only violated if ALL expressions in the constraint evaluate to be true, so semantically we have TRUE AND TRUE here, which is TRUE. Therefore this insert has violated the constraint against the second row in our bus register.

So the check against row 1 was allowed as the registrations are different. The check against row 2 was NOT allowed. This is because both expressions returned TRUE. The bus_id's are different, but the registration's are the same. I didn't show it here, but the check against the third row would've been allowed as the registrations are different. Wouldn't surprise me however if the query planner is smart enough to realise that the checks are executing on the same values, and its skips them, making this highly efficient in a ledger environment.

Hopefully, this helped you visualise how exclusion constraints work.