Database shrading and database partitioning in Rails
Sharding involves horizontally partitioning a database into smaller subsets, or shards, based on criteria such as geographic location or user activity. Each shard is stored on a separate server or set of servers, and queries are distributed across the shards. Sharding is typically used in distributed systems where data needs to be processed and accessed quickly and efficiently across multiple servers. Sharding can improve query performance and scalability, but it adds complexity to the system and requires careful planning and maintenance.
A real-world example of database sharding is in e-commerce applications where customer data is stored in a database. As the number of customers grows, the database becomes larger and query performance can degrade. Using sharding, the customer data can be divided into smaller subsets based on geographic location, order history, or other criteria. Each subset can be stored on a separate server or set of servers. This improves query performance by reducing the amount of data that needs to be scanned. So how to do this? Let's take a look:
Create a separate database for each shard
You can create multiple databases on the same server or on different servers. For example, if you want to shard based on geographic location, you can create a separate database for each region, such as customers_usa
, customers_europe
, customers_asia
, and so on. Let's take a look at how we can achieve that in Rails.
Configure Rails to use multiple databases
In your database.yml
file, you can define multiple database connections for each shard.
# Create separate databases for each shard
development:
database: users_development
username: postgres
password: password
customers_usa:
database: users_usa
username: postgres
password: password
customers_europe:
database: users_europe
username: postgres
password: password
customers_asia:
database: users_asia
username: postgres
password: password
You can then define a separate connection for each database you create.
# Configure Rails to use multiple databases
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
end
class CustomersUsa < ApplicationRecord
establish_connection :customers_usa
end
class CustomersEurope < ApplicationRecord
establish_connection :customers_europe
end
class CustomersAsia < ApplicationRecord
establish_connection :customers_asia
end
Define a shard selection strategy:
# Define a shard selection strategy
class User < ApplicationRecord
def self.find_by_id(user_id, request_location='us')
shard(request_location).find_by(id: user_id)
end
def self.shard(request_location)
if request_location == 'us'
CustomersUsa
elsif request_location == 'eu'
CustomersEurope
else
CustomersAsia
end
end
end
For example, you can shard based on the user's geographic location by using the user's IP address to determine which region the user belongs to.
By using sharding in this way, you can spread the load of your user table across multiple databases, which can help improve query performance and scalability.
However, it's important to note that sharding adds complexity to your application and requires careful planning and maintenance. It's also not a silver bullet, and you should consider other strategies such as indexing, caching, and optimizing your queries before resorting to sharding. In addition, you'll need to carefully consider how to handle data consistency and migrations across multiple databases, monitor and scale your sharded databases over time.
What about database partitioning?
On the other hand, Partitioning divides a database into smaller subsets, or partitions, based on some criteria, such as date range or data type. Each partition is stored on the same server, and queries are executed locally within each partition. Partitioning is typically used when data needs to be managed and queried more efficiently, such as in data warehouses or systems that handle large amounts of time-series data. Partitioning can improve query performance and simplify database administration, but it can also introduce data corruption and requires careful planning and monitoring.
Let's see how we can do this in Postgres using native partitioning (we don't need to use pg_pathman
anymore) and Rails migration.
This migration will create a partitioned table named users_partitioned
, partitioned by the id
column using hash partitioning with 4 partitions. It will also insert the data from the old users
table into the new partitioned table and drop the old table. So how do you use this source? That's easy!
User.where("age > ?", 18)
This will generate a SQL query that includes all 4 partitions of the users_partitioned
table.
As you can see, all the logic is behind the database and we don't need any special support inside our Rails application - so you can use regular queries like before.