PostgreSQL lock timeout issues – and how to deal with them in Ruby on Rails
Hi! I wanted to discuss a topic that keeps many developers awake at night – lock timeout issues in PostgreSQL. If you work with large databases and complex transactions on a daily basis, you know exactly what I'm talking about. I'll show you some practical ways to tame these annoying errors.
Why do timeouts occur?
Before we get into the specifics, let's take a moment to look at the source of the problem. In PostgreSQL, when one transaction tries to access a resource (such as a table or row) that is already locked by another, it simply has to wait. lock_timeout is the parameter that specifies how long a session should wait for the lock to be released before giving up with an error. Similarly, statement_timeout limits how long a single statement is allowed to run before it is cancelled.
Problems arise when long database migrations come into play, locking a table to add a column or an index. In production, where the application constantly queries the database, such a migration can wait a very long time for its turn – and once lock_timeout kicks in, it gives up with an error.
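To see what this looks like in practice, here is a minimal sketch using two psql sessions; the orders table and the new column are made up purely for illustration:

-- Session 1: grab an exclusive lock on the table and keep the transaction open
BEGIN;
LOCK TABLE orders IN ACCESS EXCLUSIVE MODE;

-- Session 2: the "migration" stops waiting after 5 seconds instead of hanging forever
SET lock_timeout = '5s';
ALTER TABLE orders ADD COLUMN note text;
-- ERROR:  canceling statement due to lock timeout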
Quick solution at the role level
One of the easiest ways to work around the problem for one-off actions, such as running a new migration, is to temporarily raise the limits for a specific database user. This can be done with simple SQL commands.
Let's assume that our user is postgres_user_one. To give them more time, we execute:
ALTER ROLE postgres_user_one SET lock_timeout = '10min';
ALTER ROLE postgres_user_one SET statement_timeout = '10min';
This will give our migration 10 minutes to acquire a lock and another 10 minutes to execute each command. This should do the trick in most cases. After a successful migration, don't forget to restore the old values. You don't want to leave the door open for problematic, long-running queries in the future.
ALTER ROLE postgres_user_one SET lock_timeout = '10s';
ALTER ROLE postgres_user_one SET statement_timeout = '1min';
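If you would rather drop the role-level overrides entirely and fall back to the server defaults, ALTER ROLE also supports RESET – a small sketch, again assuming the postgres_user_one role from above:

ALTER ROLE postgres_user_one RESET lock_timeout;
ALTER ROLE postgres_user_one RESET statement_timeout;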
If you want to check the current settings, you can use the queries SHOW lock_timeout; and SHOW statement_timeout;.
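Keep in mind that SHOW reports the values of your current session, and settings applied with ALTER ROLE ... SET only take effect for sessions opened after the change. To inspect what is actually stored at the role level, you can query the catalog – again assuming the postgres_user_one role:

SELECT rolname, rolconfig
FROM pg_roles
WHERE rolname = 'postgres_user_one';
-- rolconfig should list something like {lock_timeout=10min,statement_timeout=10min}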
Managing timeouts in Ruby on Rails migrations
If you are working in a Ruby on Rails environment, manually fiddling with settings can be a bit inconvenient. Fortunately, timeouts can be managed directly in the migration code. This gives you more control and ensures that the changes only work where they are needed.
We can wrap the problematic part of the migration in a block, inside which we will temporarily set higher limits.
# Needed because an index created with algorithm: :concurrently
# cannot be built inside a transaction block
disable_ddl_transaction!

def change
  # Temporarily increase timeouts for this specific operation
  execute "SET lock_timeout = '20min'"
  execute "SET statement_timeout = '20min'"

  add_reference :substitute_requests,
                :substitutional_staff_member_period,
                foreign_key: { to_table: :staff_member_periods },
                index: { algorithm: :concurrently }

  # Reset timeouts to their default values
  execute 'RESET lock_timeout'
  execute 'RESET statement_timeout'
end
In this example, before adding the reference and index, we increase lock_timeout and statement_timeout to 20 minutes. Importantly, after the entire operation, we reset them to their default values. Using algorithm: :concurrently when creating the index is another good practice – it avoids holding a long lock on the table. Just remember that a concurrent index build cannot run inside a transaction block, which is why the migration also calls disable_ddl_transaction!.
Global configuration with StrongMigrations
For those who prefer a system-wide approach, the strong_migrations gem is a great tool. It allows you to centrally manage the timeout policy for all migrations in your application.
In the configuration file (e.g., config/initializers/strong_migrations.rb), you can define global settings:
# How long to wait for a lock before timing out.
StrongMigrations.lock_timeout = 10.minutes
# Any single SQL statement in a migration can run for at most 1 hour.
StrongMigrations.statement_timeout = 1.hour
# If a migration fails because a lock could not be acquired, retry it this many times.
StrongMigrations.lock_timeout_retries = 3
# Wait this long between retries, giving whatever holds the lock a chance to finish.
StrongMigrations.lock_timeout_retry_delay = 30.seconds
This approach not only solves the timeout problem, but also introduces a retry mechanism: if a migration fails because it could not acquire the lock, strong_migrations will try to run it again after the configured delay. This really increases the reliability of deployments.
I hope these examples will help you combat Postgres timeouts. Remember that the key is to understand the problem and choose the right tool – from quick, one-off commands to automated, global configurations.
Happy migrating!