Postgres Pitfall: Using Cascading Triggers

Introduction

Sold on the idea of pushing application code to the database [1], I embarked on a journey to move most of my complex data-crunching logic into Postgres. Serving paginated posts from Hacker News, depending on whether the user has seen them before or not, went from two or three queries on the application side to a single stored function call.

I figured, why not do everything with Postgres? For too long I had treated one of the most powerful tools in my stack as a dumb data store.

As with everything, there are gotchas and pitfalls to be aware of. Cascading triggers are one of them.

credits: xkcd

Triggers

Triggers are functions that run automatically when a specified event happens on a table. They are useful for validation before or after an insert (for example, checking that a user's email is properly formatted) and for other data-oriented logic, such as pre-calculating values or updating records in other tables when a row changes.

They strive to keep the database consistent.
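The email-format check mentioned above could be sketched like this. The `users` table, `email` column, and the regex are all assumptions for illustration, not a production-grade validator:

```sql
-- Hypothetical example: reject rows whose email is not roughly
-- of the form "something@something".
create function check_email() returns trigger
    language plpgsql
as
$$
BEGIN
    IF new.email !~ '^[^@\s]+@[^@\s]+$' THEN
        RAISE EXCEPTION 'invalid email: %', new.email;
    END IF;
    RETURN new;
END
$$;

create trigger check_email
    before insert or update
    on users
    for each row
execute procedure check_email();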

Here's an example of a trigger I use to fill in the link of scraped Reddit posts.

-- trigger sample
create trigger insert_link
    before insert
    on multi_reddit_posts
    for each row
execute procedure insert_link();

The insert_link function goes like this:

-- insert_link function
create function insert_link() returns trigger
    language plpgsql
as
$$
BEGIN
    IF new.link IS NULL OR new.link = '' THEN
        new.link = 'https://www.reddit.com' || new.permalink;
    END IF;
    RETURN new; -- a row-level BEFORE trigger must return NEW, or the insert is silently skipped
END
$$;

Write performance will always be better on a table with no triggers than on one with them.

Logic dictates it's better to do nothing than to do something.

Problem: Cascading triggers

Cascading triggers happen when one trigger fires triggers on other tables. This can overload the database and lead to unpredictable query performance, since the triggers have to lock the records they update.

A batch insert of, say, 1000 records places locks on every table those triggers touch. Those 1000 rows fire 1000 triggers, which in turn fire another 1000, so a cascade n levels deep ends up running n * 1000 triggers.
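As a sketch of how such a cascade comes about (all table, column, and function names here are hypothetical): a trigger on `posts` writes to `post_stats`, and `post_stats` has its own trigger that writes to `author_stats`. Every insert into `posts` now touches three tables:

```sql
-- Trigger 1: every new post gets a stats row...
create function bump_post_stats() returns trigger
    language plpgsql
as
$$
BEGIN
    -- this insert fires post_stats' own triggers
    INSERT INTO post_stats (post_id, author_id) VALUES (new.id, new.author_id);
    RETURN new;
END
$$;

-- Trigger 2: ...and every new stats row updates the author's counters.
create function bump_author_stats() returns trigger
    language plpgsql
as
$$
BEGIN
    UPDATE author_stats SET post_count = post_count + 1
    WHERE author_id = new.author_id;
    RETURN new;
END
$$;

create trigger bump_post_stats
    after insert on posts
    for each row
execute procedure bump_post_stats();

create trigger bump_author_stats
    after insert on post_stats
    for each row
execute procedure bump_author_stats();
```

A batch insert into `posts` now holds locks on `post_stats` and on the contended `author_stats` rows for the duration of the statement.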

Problem gets worse: Lack of traceability

This problem gets worse because it is hard to trace the unpredictable performance your queries sometimes exhibit, since these cascading triggers fire in the background and hog resources.
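One way to surface the hidden cost is EXPLAIN ANALYZE, which reports the time spent in each trigger fired by a data-modifying statement (the column list below is an assumption about the table's schema):

```sql
EXPLAIN ANALYZE
INSERT INTO multi_reddit_posts (permalink)
VALUES ('/r/postgres/comments/abc123/');
-- The plan output includes a line per trigger, e.g.:
--   Trigger insert_link: time=0.052 calls=1
```

This only shows triggers fired directly by the statement you ran; time spent in cascaded triggers on other tables is rolled into the statements that those triggers execute, which is exactly why deep cascades are hard to trace.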

This is a nightmare for maintainability and makes it really hard to tune your queries, a point borne out by GitGuardian's engineering team's decision to remove triggers from most of their tables. [2]

You would find yourself having to scale your database instance to handle the load of these triggers.

It is always easier to scale your application than to scale your database.
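If a bulk load does become a problem, Postgres lets you disable a table's triggers for the duration. A sketch, with the caveat that the trigger's logic is skipped entirely, so any bookkeeping it would have done must be handled another way:

```sql
ALTER TABLE multi_reddit_posts DISABLE TRIGGER insert_link;
-- ... bulk COPY / INSERT here ...
ALTER TABLE multi_reddit_posts ENABLE TRIGGER insert_link;
```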

Conclusion

  • Should we stop using triggers? Hell no, but we should really understand the pitfalls of overusing them and the background resources they consume.

notes:

  1. https://sive.rs/pg
  2. https://blog.gitguardian.com/love-death-triggers/

About the author

Namaste! This is Suvarna Narayanan, and I sorta write about stuff that I find interesting here! Check out my other apps here and have fun!

Liked it? Share the love!