PostgreSQL Indexing Explained: A Practical Guide for Faster Queries
Learn how B-tree, GIN, partial, and composite indexes work in PostgreSQL, when to use each type, and how to avoid the indexing mistakes that slow production databases.
PostgreSQL Indexing Explained: A Practical Guide for Faster Queries
Slow queries rarely announce themselves with a flashing error. They arrive as p95 latency creeping upward, dashboards timing out, and engineers reaching for SELECT * workarounds. In PostgreSQL, the first durable fix is almost always understanding indexing: which access paths the planner can use, how indexes are stored on disk, and how your schema choices help or hurt the optimizer.
This guide walks through index types, read patterns they serve, and a repeatable workflow for adding indexes without guessing.
How PostgreSQL Uses Indexes
When you run a query, the planner estimates cost for several strategies: sequential scan, index scan, bitmap index scan, and index-only scan when the visibility map allows it. An index is a separate data structure—usually a B-tree—that maps indexed column values to heap tuple locations (TIDs).
Without a useful index, PostgreSQL reads every row in a heap page range. That is fine for small tables or when you need most rows anyway. Once a table grows past a few hundred thousand rows and your queries filter on narrow subsets, sequential scans become expensive.
-- See what the planner would do (no execution)
EXPLAIN (COSTS OFF)
SELECT id, email
FROM users
WHERE email = 'alex@example.com';
A plan showing Seq Scan on users on a million-row table is a signal—not always wrong, but worth investigating if this query runs often.
B-tree: The Default Workhorse
CREATE INDEX without a method creates a B-tree index. B-trees support equality and range comparisons: =, <, >, BETWEEN, IN, IS NULL, and pattern matches with a suitable operator class for LIKE 'prefix%'.
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_created_at ON orders (created_at DESC);
Column order matters in composite B-tree indexes. The leftmost column must appear in predicates for later columns to be used efficiently (the classic left-prefix rule).
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
-- Uses index well: filter on user_id
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped';
-- May not use idx_orders_user_status efficiently:
-- no equality/range on user_id
SELECT * FROM orders WHERE status = 'shipped';
Hash, GIN, GiST, and BRIN
PostgreSQL offers specialized access methods:
- Hash — equality only; rarely needed because B-tree handles
=well. - GIN — inverted indexes for arrays,
jsonb, full-text search. - GiST — geometric data, exclusion constraints, some full-text setups.
- BRIN — block-range summaries for very large, naturally ordered tables (timestamps in append-only logs).
-- JSONB containment
CREATE INDEX idx_events_payload ON events USING GIN (payload jsonb_path_ops);
SELECT * FROM events
WHERE payload @> '{"type": "checkout"}';
-- Time-series append log
CREATE INDEX idx_logs_created_brin ON logs USING BRIN (created_at);
Choose GIN when you query inside documents or arrays. Choose BRIN when rows are inserted in roughly sorted order and you scan wide time ranges.
Composite, Covering, and Partial Indexes
Composite Indexes for Multi-Column Filters
Design composites to match real query shapes: equality columns first, then range, then columns used only for sorting if the index can avoid a sort step.
CREATE INDEX idx_invoices_customer_date
ON invoices (customer_id, issued_at DESC);
Index-Only Scans and INCLUDE
PostgreSQL 11+ supports covering indexes with INCLUDE columns stored in the index leaf but not part of the search key:
CREATE INDEX idx_users_email_covering
ON users (email) INCLUDE (display_name, created_at);
If the query only needs email, display_name, and created_at, the planner may satisfy it entirely from the index—fewer heap fetches, lower I/O.
Partial Indexes for Skewed Data
When a minority of rows matter for hot queries, index only those rows:
CREATE INDEX idx_users_active_email
ON users (email)
WHERE deleted_at IS NULL;
Partial indexes are smaller, cheaper to maintain, and often faster than indexing the full table when your WHERE clause always matches the predicate.
Unique, Expression, and Functional Indexes
Unique indexes enforce constraints and speed lookups:
CREATE UNIQUE INDEX idx_accounts_slug ON accounts (slug);
Expression indexes index computed values:
CREATE INDEX idx_users_lower_email ON users (lower(email));
SELECT * FROM users WHERE lower(email) = lower('user@example.com');
Without the expression index, a function on email prevents a plain btree on email from being used.
Measuring Impact Before and After
Never add indexes in production without validating plans on realistic data volumes.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.email = 'alex@example.com'
AND o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20;
Watch for:
- Actual rows vs estimated rows — large gaps suggest stale statistics; run
ANALYZE. - Buffers: shared hit/read — high read counts imply cache pressure or missing indexes.
- Sort or Hash Join steps that disappear after indexing.
Pro Tips for Index Hygiene
- Run
ANALYZEafter bulk loads or large migrations so the planner has accurate row counts. - Use
pg_stat_user_indexesto find indexes withidx_scan = 0over weeks—candidates for removal. - Prefer
CREATE INDEX CONCURRENTLYin production to avoid blocking writes during builds. - Keep fillfactor at default unless you have heavy
UPDATEchurn on indexed columns; then consider lowering it on specific indexes to delay page splits. - Rebuild bloated indexes with
REINDEX INDEX CONCURRENTLYduring maintenance windows when bloat metrics justify it.
Common Indexing Mistakes
Indexing every column. Each index slows INSERT, UPDATE, and DELETE because PostgreSQL must maintain index tuples. Audit unused indexes regularly.
Wrong column order in composites. Putting a low-cardinality column first (for example status before user_id) often produces plans that scan large index ranges.
Indexing low-selectivity columns alone. A boolean is_active index rarely helps unless combined with other predicates or used as a partial index condition.
Functions on indexed columns in WHERE. WHERE date(created_at) = '2025-05-01' cannot use a plain index on created_at; use a range on the raw column or an expression index.
Ignoring duplicate indexes. Two indexes on (user_id) and (user_id, created_at) may overlap; the composite often subsumes the single-column index for user_id lookups—verify with EXPLAIN before dropping.
Building indexes during peak traffic without CONCURRENTLY. Standard CREATE INDEX takes SHARE lock that blocks writes; use concurrent builds and monitor progress in pg_stat_progress_create_index.
A Practical Indexing Workflow
- Capture the slow query from logs or
pg_stat_statements. - Run
EXPLAIN (ANALYZE, BUFFERS)on staging with production-like statistics. - Identify filter, join, and order-by columns.
- Propose the smallest index that matches predicate order (consider partial or
INCLUDE). - Create concurrently, re-analyze, compare plans and latency.
- Schedule review of
idx_scanafter two to four weeks.
SELECT
schemaname,
relname AS table,
indexrelname AS index,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
Index Maintenance and Bloat
Indexes age like tables: updates and deletes leave dead tuples in the heap and can leave index bloat when pages split or fill unevenly. Monitor index size relative to table size and watch for queries that suddenly regress after months of stable performance.
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
AND relname = 'orders';
When bloat is confirmed, schedule REINDEX INDEX CONCURRENTLY on the affected index rather than rebuilding the entire table unless heap bloat is equally severe. Pair reindex operations with VACUUM (ANALYZE) on the parent table so the planner sees accurate page densities afterward.
Hot tables with heavy UPDATE on indexed columns benefit from thoughtful fillfactor tuning on the most contentious indexes—trading a little space for fewer splits during write spikes. Document every non-default fillfactor in your migration notes so future engineers understand why the setting exists.
Conclusion
PostgreSQL indexing is not a checklist of “add index on foreign keys” rituals. It is alignment between how data is queried and how B-tree and specialized structures expose ordered, selective access paths. Start with evidence from EXPLAIN, favor partial and covering indexes when they match real filters, and treat unused indexes as technical debt with a measurable write and storage cost.
When you internalize left-prefix composite rules, expression indexes for transformed columns, and the difference between GIN and B-tree, most “mysterious” slow queries become predictable engineering work—and your database stops being the hidden bottleneck behind an otherwise healthy application.
Index review ritual
Monthly, pull unused and duplicate indexes from pg_stat_user_indexes. Validate selective predicates still match query plans after data growth. When migrations add columns, ask immediately whether partial indexes need updates. Document index purpose in migration comments—future you will forget why a composite exists. Indexes are not free storage; they are contracts with your write path.
Index review ritual
Monthly, pull unused and duplicate indexes from pg_stat_user_indexes. Validate selective predicates still match query plans after data growth. When migrations add columns, ask immediately whether partial indexes need updates. Document index purpose in migration comments—future you will forget why a composite exists. Indexes are not free storage; they are contracts with your write path.
Workshop: apply this week
Pick one idea from this article and ship it before Friday. Write a short internal note explaining what changed, what metric you expect to move, and how you will verify the result. Share the note with your team so the learning compounds. If the experiment fails, document the failure mode—it is as valuable as success for the next engineer reading this guide.
Frequently asked questions
- How many indexes should a PostgreSQL table have?
- There is no fixed number. Add indexes for predicates and joins you run frequently, but remember every index slows writes and consumes disk. A table with five to eight well-chosen indexes is common; dozens of unused indexes usually hurt more than they help.
- Does CREATE INDEX CONCURRENTLY block writes?
- It avoids long exclusive locks on the table for reads and writes, but it takes longer and can fail if a duplicate key violates a unique index. Retry failed concurrent builds after fixing data issues.
- When should I use a partial index?
- Use a partial index when queries always filter on the same condition, such as status = 'active' or deleted_at IS NULL. The index stays smaller and faster because it only stores matching rows.
Comments
Discussion is coming soon. Share this article and join the conversation on social media.
Enjoyed this article?
Get weekly engineering guides delivered to your inbox.