What are Summary Tables?
The trick follows this formula:
- Create a table with the summary data as you'd like to see it.
- Build a stored procedure that updates the summary table.
- Setup a trigger that will run when your original tables are updated.
We're going to begin with a simple data model: a table for species of animals, including how many legs each species has, and a table for pets, who belong to a specific species. We want to easily query the database for the number of pets who have, say, four legs. Here's the structure for our animal tables, minus any keys:
CREATE TABLE species ( id integer NOT NULL, name text, leg_count integer ); CREATE TABLE pets ( id integer NOT NULL, name text, species_id integer );
And here's the query that gives us a tally of how many pets have each number of legs:
SELECT species.leg_count as leg_count, count(*) as pet_count FROM pets JOIN species on pets.species_id = species.id GROUP BY species.leg_count
Now, to the magic.
Create the Summary Table
Our summary table is easy; it has the same format as the output of our complex join query:
CREATE TABLE leg_counts ( id integer NOT NULL, leg_count integer, pet_count integer );
Each record stores a leg count, and the number of pets that have that many legs.
Build the Summary Table
Here's the code:
CREATE FUNCTION update_leg_counts() RETURNS trigger LANGUAGE plpgsql AS $$ DECLARE pet_leg_count INTEGER; pet_count INTEGER; BEGIN SELECT INTO pet_leg_count species.leg_count FROM species WHERE species.id = NEW.species_id; SELECT INTO pet_count count(*) FROM pets JOIN species ON species.id = pets.species_id WHERE species.leg_count = pet_leg_count; DELETE FROM leg_counts where leg_counts.leg_count = pet_leg_count; INSERT INTO leg_counts (leg_count, pet_count) values (pet_leg_count, pet_count); RETURN NULL; END; $$;
Now that's a mouthful! We start by creating the function, with a return type of "trigger". This particular stored procedure has no parameters - it will be triggered by the creation of a new "pets" record, and that new record will be available within the function as "NEW". We set the language of the stored procedure as plpgsql, and declare a couple of variables that will be used in our code.
We set pet_leg_count to the number of legs that our new pet has. This is important, because we don't want to update ALL the records in our summary table, just the one that keeps track of our pet's same-legged friends. Next, we set pet_count to the number of pets with this many legs. Finally, we delete any summary table records with the same number of legs, and recreate it with the new leg count. We do it this way so our code works whether a record for that many legs previously existed, or not.
Setup the Trigger
Now for probably the easiest part of this process: telling Postgres to execute the stored procedure whenever a new pet is added:
CREATE TRIGGER update_leg_counts AFTER INSERT ON pets FOR EACH ROW EXECUTE PROCEDURE update_leg_counts();
I gave the trigger the same name as the stored procedure, sorry if that's confusing. I didn't see a good reason to give them distinct names.
The difference is...magic? As we add pet records, we can measure the speed of querying pet counts by number of legs, both before and after the summary table. These were my actual performance numbers, which you can recreate with the code in the git repository I provided at the top of this article.
|Number of Pets||Query Time w/o Summary||Query Time with summary||Speed Improvement|
The speed of querying the summary table remains constant no matter how many pet records we add to the database, while performing the complex join query gets increasingly sluggish. Even at just a hundred pet records, there is marked benefit. By the time we're storing 10,000 pets, we're saving over 97% of the time it takes to find out how many same-legged friends Fido has. Magic!