PostgreSQL indexing for newbies: a practical guide

Are your database queries running slower than a snail on vacation? Do you find yourself waiting endlessly for results to load? If you're nodding along, it might be time to learn about PostgreSQL indexing - one of the most powerful tools in your database optimization toolkit.

In this article, we'll dive into the world of PostgreSQL indexing with practical examples that will help you understand how to make your queries lightning-fast. No complicated jargon, just straightforward explanations and real-world applications.

What are database indexes anyway?

Think of database indexes like the index section at the back of a book. When you want to find information about a specific topic, you don't read the entire book page by page - you check the index first to find exactly where to look.

Similarly, database indexes store information about where specific data rows are located in a table. Without an index, PostgreSQL has to scan the entire table to find what you're looking for (called a sequential scan). With a proper index, it can jump directly to the relevant rows.

Indexes are their own data structures stored on disk alongside your data tables. They're part of PostgreSQL's data definition language (DDL) and are automatically maintained by the database when you insert, update, or delete data.

Types of PostgreSQL indexes

PostgreSQL offers several types of indexes, each designed for specific use cases:

B-tree indexes: The all-purpose solution

B-tree indexes are the default and most common type in PostgreSQL. They're excellent for general-purpose indexing on columns you frequently query.

Let's create a simple B-tree index on a payment events table:

-- Query filtering on a single column
SELECT * FROM payment WHERE event_type = 'Payment Made';

-- Create a B-tree index
CREATE INDEX idx_payment_type ON payment(event_type);

Before adding this index, our query execution time was around 37.8 milliseconds. After adding the index, it dropped to just 2 milliseconds!

Multi-column B-tree indexes

Sometimes you need to query on multiple columns together. In these cases, a multi-column index can be more efficient:

-- Query using multiple conditions
SELECT * FROM payment WHERE event_type = 'Payment Cancelled' AND amount > 0;

-- Create a multi-column index
CREATE INDEX idx_payment_cancelled_amount ON payment(event_type, amount);
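One caveat worth noting: a multi-column B-tree index generally only helps queries that filter on its leading column. A quick sketch against the same payment table (the behavior described in the comments is the typical planner outcome, not a guarantee):

```sql
-- Can use idx_payment_cancelled_amount: event_type is the leading column
SELECT * FROM payment WHERE event_type = 'Payment Cancelled';

-- Usually cannot use it: filtering on amount alone skips the leading
-- column, so PostgreSQL will likely fall back to a sequential scan
SELECT * FROM payment WHERE amount > 100;
```

So when declaring a multi-column index, put the column you filter on most often first.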

BRIN indexes: For large time-series data

Block Range INdexes (BRIN) are specially designed for very large datasets where data is naturally ordered, like timestamps or date ranges. They're incredibly space-efficient and perfect for time-series data.

-- Query for time-range data
SELECT device_name, value FROM devices WHERE ts BETWEEN '2023-07-13 00:00' AND '2023-07-14 00:00';

-- Create a BRIN index on the same devices table
CREATE INDEX device_time ON devices USING brin(ts);
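If you want to tune a BRIN index further, the pages_per_range storage parameter controls how many table pages each index entry summarizes (the default is 128). A smaller value makes the index larger but lets PostgreSQL skip irrelevant blocks more precisely. A sketch, using the hypothetical devices table from the example above:

```sql
-- Summarize 32 pages per range instead of the default 128:
-- a bigger index, but finer-grained block pruning
CREATE INDEX device_time_fine ON devices USING brin(ts) WITH (pages_per_range = 32);
```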

GiST indexes: For spatial data

Generalized Search Tree (GiST) indexes are perfect for spatial data and full-text search. If you're working with geographic information, these are your best friends.

-- Count how many parks are within residential districts
SELECT COUNT(parks.park_id) 
FROM city_parks parks
JOIN residential_zones zones 
ON ST_Contains(zones.geometry, parks.geometry)
WHERE zones.district_name LIKE 'Downtown%';

-- Create a GIST index for spatial data on the parks table
CREATE INDEX city_parks_geometry_idx 
ON city_parks 
USING GIST (geometry);

-- Find all schools within 500 meters of major roads
SELECT schools.school_name, roads.road_name
FROM public_schools schools
JOIN major_roads roads
ON ST_DWithin(schools.location, roads.centerline, 500)
ORDER BY schools.school_name;
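For the distance query above to benefit from indexing, both geometry columns involved in ST_DWithin should have their own spatial indexes (PostGIS makes ST_DWithin index-aware). A sketch, assuming the hypothetical tables from the example:

```sql
-- Spatial indexes on both sides of the ST_DWithin join
CREATE INDEX public_schools_location_idx ON public_schools USING GIST (location);
CREATE INDEX major_roads_centerline_idx ON major_roads USING GIST (centerline);
```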

GIN indexes: For JSON and array data

Generalized Inverted Indexes (GIN) are ideal when you have multiple values in a single column, like arrays or JSON data.

-- Find products with a specific category in JSON data
SELECT 
    data -> 'product_name' as product_name,
    data -> 'price' as price
FROM inventory 
WHERE data @> '{"category": "electronics"}';

-- Create a GIN index for JSON data to improve query performance
CREATE INDEX inventory_json_idx ON inventory USING gin(data);
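GIN isn't limited to JSON: it also indexes native array columns. As a sketch, suppose the same inventory table had a hypothetical tags column of type text[]:

```sql
-- GIN index on an array column (hypothetical tags text[] column)
CREATE INDEX inventory_tags_idx ON inventory USING gin(tags);

-- The "contains" operator @> can then use the index
SELECT data -> 'product_name' AS product_name
FROM inventory
WHERE tags @> ARRAY['clearance'];
```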

Using EXPLAIN ANALYZE to measure performance

When working with indexes, you need a way to measure their impact. PostgreSQL's EXPLAIN ANALYZE command is your best friend here:

EXPLAIN ANALYZE SELECT * FROM projects WHERE type='Active';

This command shows you the query plan, execution time, and whether PostgreSQL is using your indexes. Look for these key indicators:

  • Seq Scan: Means PostgreSQL is scanning the entire table (slow for large tables)
  • Index Scan or Bitmap Index Scan: Means PostgreSQL is using your index (faster)
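To give you a feel for the output, here is roughly what an EXPLAIN ANALYZE result looks like when an index is used. The exact layout, costs, and timings vary by PostgreSQL version and data, so treat the numbers as purely illustrative (this sketch assumes the idx_payment_type index from earlier):

```
Index Scan using idx_payment_type on payment  (cost=0.29..12.75 rows=25 width=64) (actual time=0.021..0.038 rows=25 loops=1)
  Index Cond: (event_type = 'Payment Made'::text)
Planning Time: 0.110 ms
Execution Time: 0.058 ms
```

If the first line said Seq Scan instead, you'd know the index wasn't being used.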

Maintaining your indexes

Indexes aren't "set and forget". Here are some maintenance tips:

Run ANALYZE regularly

The ANALYZE command collects statistics about your tables that help PostgreSQL decide when to use indexes:

-- Run ANALYZE on a specific table
ANALYZE users;

-- Or on the entire database
ANALYZE;

Check for unused indexes

Unused indexes waste disk space and slow down write operations. Find them with:

SELECT schemaname || '.' || relname AS table,
       indexrelname AS index,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
       idx_scan AS index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
  AND idx_scan < 50
  AND pg_relation_size(relid) > 5 * 8192
ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
         pg_relation_size(i.indexrelid) DESC;
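Once you've confirmed an index really is unused (check over a representative period, and on any replicas that serve reads), you can drop it. Using CONCURRENTLY avoids taking a lock that blocks writes, though it can't run inside a transaction block; the index name here is hypothetical:

```sql
-- Drop a confirmed-unused index without blocking concurrent writes
-- (note: DROP INDEX CONCURRENTLY cannot run inside a transaction block)
DROP INDEX CONCURRENTLY idx_payment_type_old;
```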

Check your index hit rate

A good index hit rate should be above 95% for tables with more than 10,000 rows:

SELECT relname,
       100 * idx_scan / (seq_scan + idx_scan) AS percent_of_times_index_used,
       n_live_tup AS rows_in_table
FROM pg_stat_user_tables
WHERE seq_scan + idx_scan > 0
ORDER BY n_live_tup DESC;

When to use indexes (and when not to)

Indexes aren't always the answer. Here are some guidelines:

Use indexes when:

  • You frequently query a column
  • Your table has many rows
  • The column has high cardinality (many unique values)
  • Read operations outnumber write operations

Avoid indexes when:

  • Your table is very small
  • The column has low cardinality (few unique values)
  • The table is frequently updated
  • You rarely query on that column

Finding the right balance

Remember, indexes take up disk space and slow down write operations (INSERT, UPDATE, DELETE) because the index must be updated along with the data. It's about finding the right balance - not too many indexes, not too few, but just right.

As a rule of thumb:

  • Cache hit ratio should be > 99%
  • Index hit ratio should be > 95% (on tables with more than 10,000 rows)

Summary

PostgreSQL indexing is a powerful way to speed up your database queries. We've covered the main types of indexes:

  • B-tree indexes for general-purpose queries
  • BRIN indexes for large time-series data
  • GiST indexes for spatial data and full-text search
  • GIN indexes for JSON and array data

We've also learned how to measure performance with EXPLAIN ANALYZE and maintain indexes for optimal performance.

Remember, the goal isn't to add indexes everywhere, but to strategically place them where they'll have the most impact. Start by identifying your slowest queries, add appropriate indexes, and measure the results.

Happy indexing!