Database shrading and database partitioning in Rails

Sharding involves horizontally partitioning a database into smaller subsets, or shards, based on criteria such as geographic location or user activity. Each shard is stored on a separate server or set of servers, and queries are distributed across the shards. Sharding is typically used in distributed systems where data needs to be processed and accessed quickly and efficiently across multiple servers. Sharding can improve query performance and scalability, but it adds complexity to the system and requires careful planning and maintenance.
A real-world example of database sharding is in e-commerce applications where customer data is stored in a database. As the number of customers grows, the database becomes larger and query performance can degrade. Using sharding, the customer data can be divided into smaller subsets based on geographic location, order history, or other criteria. Each subset can be stored on a separate server or set of servers. This improves query performance by reducing the amount of data that needs to be scanned. So how to do this? Let's take a look:

Create a separate database for each shard

You can create multiple databases on the same server or on different servers. For example, if you want to shard based on geographic location, you can create a separate database for each region, such as customers_usa, customers_europe, customers_asia, and so on. Let's take a look at how we can achieve that in Rails.

Configure Rails to use multiple databases

In your database.yml file, you can define multiple database connections for each shard.

# Create separate databases for each shard
development:
  database: users_development
  username: postgres
  password: password

customers_usa:
  database: users_usa
  username: postgres
  password: password

customers_europe:
  database: users_europe
  username: postgres
  password: password

customers_asia:
  database: users_asia
  username: postgres
  password: password

You can then define a separate connection for each database you create.

# Configure Rails to use multiple databases
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

class CustomersUsa < ApplicationRecord
  establish_connection :customers_usa
end

class CustomersEurope < ApplicationRecord
  establish_connection :customers_europe
end

class CustomersAsia < ApplicationRecord
  establish_connection :customers_asia
end

Define a shard selection strategy:

# Define a shard selection strategy
class User < ApplicationRecord
  def self.find_by_id(user_id, request_location='us')
    shard(request_location).find_by(id: user_id)
  end
  
  def self.shard(request_location)
    if request_location == 'us'
      CustomersUsa
    elsif request_location == 'eu'
      CustomersEurope
    else
      CustomersAsia
    end
  end
end

For example, you can shard based on the user's geographic location by using the user's IP address to determine which region the user belongs to.
By using sharding in this way, you can spread the load of your user table across multiple databases, which can help improve query performance and scalability.
However, it's important to note that sharding adds complexity to your application and requires careful planning and maintenance. It's also not a silver bullet, and you should consider other strategies such as indexing, caching, and optimizing your queries before resorting to sharding. In addition, you'll need to carefully consider how to handle data consistency and migrations across multiple databases, monitor and scale your sharded databases over time.

What about database partitioning?

On the other hand, Partitioning divides a database into smaller subsets, or partitions, based on some criteria, such as date range or data type. Each partition is stored on the same server, and queries are executed locally within each partition. Partitioning is typically used when data needs to be managed and queried more efficiently, such as in data warehouses or systems that handle large amounts of time-series data. Partitioning can improve query performance and simplify database administration, but it can also introduce data corruption and requires careful planning and monitoring.
Let's see how we can do this in Postgres using native partitioning (we don't need to use pg_pathman anymore) and Rails migration.

class RenameUsersToUsersUnpartitioned < ActiveRecord::Migration[6.1]
  def change
    # let's say we have a table users already
    rename_table :users, :users_unpartitioned
  end
end

class AddPartitionedUsersTable < ActiveRecord::Migration[7.1]
  def up
    execute <<-SQL
      CREATE TABLE users_partitioned (
        id SERIAL PRIMARY KEY,
        name TEXT,
        email TEXT,
        age INT
      ) PARTITION BY HASH (id);
      
      CREATE TABLE users_partition_1 PARTITION OF users_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 0);
      CREATE TABLE users_partition_2 PARTITION OF users_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 1);
      CREATE TABLE users_partition_3 PARTITION OF users_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 2);
      CREATE TABLE users_partition_4 PARTITION OF users_partitioned FOR VALUES WITH (MODULUS 4, REMAINDER 3);

      INSERT INTO users_partitioned (id, name, email, age)
      SELECT id, name, email, age
      FROM users_unpartitioned;
      
      DROP TABLE users_unpartitioned;
    SQL
  end
end
sample migration files

This migration will create a partitioned table named users_partitioned, partitioned by the id column using hash partitioning with 4 partitions. It will also insert the data from the old users table into the new partitioned table and drop the old table. So how do you use this source? That's easy!

User.where("age > ?", 18)

This will generate a SQL query that includes all 4 partitions of the users_partitioned table.
As you can see, all the logic is behind the database and we don't need any special support inside our Rails application - so you can use regular queries like before.