Introduction
As web applications scale and data volumes grow, backend systems often run into database bottlenecks such as rising query latency.
Running JOIN operations or complex search queries on tables with hundreds of thousands of rows without suitable indices can cause database CPU spikes, leading to slow response times for end users.
Well-designed database indices are one of the most effective ways to address these bottlenecks. This article explains how database indices work, walks through the B-Tree structure behind them, and shares guidelines for designing effective indices.
1. Why Do Indices Speed Up Queries?
Think of a database index as the index section at the back of a textbook.
Without an index, searching for a specific topic requires you to scan every page from start to finish. In databases, this is called a Full Table Scan (reported as access type ALL in MySQL's EXPLAIN output). Its cost grows linearly with the number of rows, so queries slow down as the table grows.
By creating an index on specific columns, the database builds a separate structure that keeps those columns' values in sorted order, each paired with a reference (pointer) to the physical row it came from. Because the keys are sorted, the query engine can jump straight to the matching entries instead of reading every row.
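To make the textbook analogy concrete, here is a minimal in-memory sketch in Python. It is not how any real database stores data: the "table" is a hypothetical dict of rows keyed by row id, and the "index" is a sorted list of (key, row id) pairs searched with the standard bisect module instead of being read end to end.

```python
import bisect

# Hypothetical "table": row_id -> row. Row ids stand in for physical row locations.
rows = {
    1: {"email": "dana@example.com"},
    2: {"email": "ari@example.com"},
    3: {"email": "lee@example.com"},
    4: {"email": "kim@example.com"},
}

def full_table_scan(rows, email):
    """Check every row in turn: the work grows linearly with table size."""
    return [rid for rid, row in rows.items() if row["email"] == email]

# Hypothetical "index": (key, row_id) pairs kept sorted by key.
index = sorted((row["email"], rid) for rid, row in rows.items())
keys = [key for key, _ in index]

def index_lookup(index, keys, email):
    """Binary-search the sorted keys, then follow the stored row ids."""
    pos = bisect.bisect_left(keys, email)
    matches = []
    while pos < len(index) and index[pos][0] == email:
        matches.append(index[pos][1])
        pos += 1
    return matches

print(full_table_scan(rows, "lee@example.com"))      # [3]
print(index_lookup(index, keys, "lee@example.com"))  # [3]
```

Both calls return the same row id, but the scan inspects all four rows while the binary search needs only about log2(N) key comparisons.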
2. The Mechanics of B-Tree Indices
Relational databases such as MySQL (InnoDB) and PostgreSQL use the B-Tree, a self-balancing tree structure, as their default index type (in practice, B+Tree variants).
B-Trees organize keys into a balanced tree layout consisting of a root node, intermediate nodes, and leaf nodes.
                [Root Node]
               /           \
          [100]             [200]
         /     \           /     \
     [...]    [...]     [...]    [...]   <- [Leaf Nodes]: references to physical rows
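A back-of-the-envelope estimate shows why such a tree stays shallow. Assume each node holds up to m child pointers (its fanout). A tree of height h can then address roughly m^h entries, so N keys need about the number of levels below. The fanout m = 100 is an illustrative assumption; real fanout depends on page size and key width.

```latex
h \approx \lceil \log_m N \rceil, \qquad
m = 100,\ N = 10^{6} \;\Rightarrow\; h = \lceil \log_{100} 10^{6} \rceil = \lceil 6/2 \rceil = 3, \qquad
N = 10^{8} \;\Rightarrow\; h = 4
```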
How Searches are Executed
- The search query begins at the top-level Root Node.
- The query engine compares the target key value with the node values to determine which child branch pointer to follow.
- The process repeats down to the Leaf Nodes, which contain pointers to the physical rows on the disk (like ROWIDs), allowing the engine to fetch the matching record.
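- The Benefit: Each step down the tree visits one node, so a lookup costs as many node reads as the tree has levels. Because the height grows logarithmically with the number of rows, search complexity is O(log N), and even a table with millions of rows needs only a few node reads per lookup.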
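The descent described above can be sketched in a few lines of Python. This is a simplified, read-only model, not any engine's on-disk format. The Node class, the btree_search function, and the integer row references (standing in for ROWIDs) are all hypothetical, and insertion and node splitting are left out.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical B-Tree node: internal nodes have children, leaves have row references."""
    keys: list
    children: list = field(default_factory=list)   # internal nodes only
    row_refs: list = field(default_factory=list)   # leaf nodes only (stand-ins for ROWIDs)

    @property
    def is_leaf(self) -> bool:
        return not self.children

def btree_search(node: Node, key):
    """Descend from the root to a leaf; return the matching row reference or None."""
    while not node.is_leaf:
        # Compare the target with this node's keys to pick the child pointer to follow:
        # child i covers keys >= keys[i-1] and < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    # At the leaf, look for the exact key and return its row reference.
    for k, ref in zip(node.keys, node.row_refs):
        if k == key:
            return ref
    return None

# A small three-level tree: root -> two internal nodes ([100], [200]) -> four leaves.
root = Node(keys=[150], children=[
    Node(keys=[100], children=[
        Node(keys=[50, 80], row_refs=[11, 12]),
        Node(keys=[100, 120], row_refs=[21, 22]),
    ]),
    Node(keys=[200], children=[
        Node(keys=[150, 180], row_refs=[31, 32]),
        Node(keys=[200, 250], row_refs=[41, 42]),
    ]),
])

print(btree_search(root, 120))  # 22
print(btree_search(root, 99))   # None
```

Each pass through the while loop moves down exactly one level, which is why the lookup cost tracks the tree's height rather than the table's row count.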
3. Three Guidelines for Designing Database Indices
While indices improve search speeds, creating them on every column is counterproductive. Every write operation (INSERT, UPDATE, DELETE) requires updating the corresponding index structures, which adds overhead that can slow down write performance.
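The write cost can be illustrated with a toy model in Python. ToyTable is hypothetical and keeps each index as a sorted list of (value, row id) pairs; real engines maintain on-disk B-Trees, but the principle is the same: every row written or removed also touches every index defined on the table.

```python
import bisect

class ToyTable:
    """Hypothetical in-memory table that keeps one sorted (value, row_id) list per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # count of index entries inserted or removed

    def insert(self, row_id, row):
        self.rows[row_id] = row                   # the table write itself
        for col, idx in self.indexes.items():     # plus one extra write per index
            bisect.insort(idx, (row[col], row_id))
            self.index_writes += 1

    def delete(self, row_id):
        row = self.rows.pop(row_id)
        for col, idx in self.indexes.items():     # every index must drop its entry too
            idx.remove((row[col], row_id))
            self.index_writes += 1

sample = [
    (1, {"email": "ann@example.com", "name": "Ann", "status": "active", "created_at": "2025-01-02"}),
    (2, {"email": "bo@example.com", "name": "Bo", "status": "active", "created_at": "2025-01-03"}),
    (3, {"email": "cy@example.com", "name": "Cy", "status": "inactive", "created_at": "2025-01-04"}),
]

lean = ToyTable(["email"])
heavy = ToyTable(["email", "name", "status", "created_at"])
for row_id, row in sample:
    lean.insert(row_id, row)
    heavy.insert(row_id, row)

print(lean.index_writes, heavy.index_writes)  # 3 12
```

With four indices, the same three inserts cost four times as many index writes. An UPDATE that changes an indexed value behaves like a delete of the old entry plus an insert of the new one in each affected index.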
Use these three guidelines to decide when to create an index:
Guideline 1: Target Columns with High Cardinality (Selectivity)
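Cardinality refers to the number of distinct values stored in a column; selectivity is that count divided by the total number of rows, so a column whose values are all unique has a selectivity of 1.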
- High Cardinality (Recommended for indexing):
Columns like user_id, email, or serial_number, where values are unique or rarely repeat.
- Low Cardinality (Not recommended for indexing):
Columns like gender or status flags. Indexing a column where most values are duplicates is inefficient: a lookup on a common value matches a large share of the table, and fetching those rows one by one through the index costs more than reading the table sequentially, so the query engine will often fall back to a full table scan anyway.
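Cardinality and selectivity are easy to compute for a sample of rows. The Python sketch below uses a hypothetical set of 1,000 users; in practice, databases keep their own estimates (for example, the Cardinality column of MySQL's SHOW INDEX or n_distinct in PostgreSQL's pg_stats view).

```python
def cardinality(rows, column):
    """Number of distinct values in the column."""
    return len({row[column] for row in rows})

def selectivity(rows, column):
    """Distinct values divided by row count: near 1.0 is highly selective, near 0 is mostly duplicates."""
    return cardinality(rows, column) / len(rows)

# Hypothetical sample: 1,000 users, 90% "active" and 10% "inactive".
users = [
    {"user_id": i, "email": f"user{i}@example.com",
     "status": "inactive" if i % 10 == 0 else "active"}
    for i in range(1, 1001)
]

for col in ("user_id", "email", "status"):
    print(col, cardinality(users, col), selectivity(users, col))
# user_id 1000 1.0
# email 1000 1.0
# status 2 0.002
```

A query for status = 'active' would match 900 of the 1,000 rows, which is exactly the case where an index lookup loses to a sequential scan.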
Guideline 2: Respect Left-to-Right Ordering in Composite Indices
When creating a Composite Index (an index spanning multiple columns), the order of the columns in the definition is critical.
For example, if you define an index as INDEX (category_id, created_at):
- The index will optimize queries like:
WHERE category_id = 5 (matches the first column)
WHERE category_id = 5 AND created_at > '2025-01-01' (matches both columns)
- The index will not optimize queries like:
WHERE created_at > '2025-01-01' (skips the first column)
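Composite indices are only effective when queries match columns from left to right as defined in the index. The entries are sorted by category_id first and by created_at only within each category_id, so without a condition on category_id the engine has no contiguous range of the index to search.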
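The left-to-right rule can be checked with Python's built-in sqlite3 module, since SQLite's B-Tree indices are ordered the same way. The products table, its columns, and the index name idx_category_created are hypothetical, and the exact plan wording differs between SQLite versions and other engines. The first two queries should report a SEARCH using idx_category_created, while the third should fall back to a SCAN of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, category_id INTEGER, created_at TEXT, name TEXT)"
)
# Composite index: sorted by category_id first, then by created_at within each category.
conn.execute("CREATE INDEX idx_category_created ON products (category_id, created_at)")

def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN 'detail' strings joined into one line."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

queries = {
    "first column": "SELECT * FROM products WHERE category_id = 5",
    "both columns": "SELECT * FROM products WHERE category_id = 5 AND created_at > '2025-01-01'",
    "second column only": "SELECT * FROM products WHERE created_at > '2025-01-01'",
}
for label, sql in queries.items():
    print(f"{label:<20} {plan(sql)}")
```

The table is left empty and ANALYZE is never run, which keeps SQLite on its default plan. With collected statistics, some engines (SQLite's and MySQL 8.0's skip-scan optimizations, for example) can occasionally use an index whose leading column is unconstrained, but only under narrow conditions such as a leading column with very few distinct values, so the left-to-right rule remains the safe design default.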
Guideline 3: Index Join Keys and Sorting Columns
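In addition to search criteria (WHERE clauses), index the foreign-key columns used in joins (JOIN ... ON) and the columns used for sorting (ORDER BY). An index on the join key lets the engine look up matching rows in the joined table instead of scanning it, and an index on the sort column lets it read rows in already-sorted order, avoiding a separate sort step that can be expensive for large result sets (and may spill to disk).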
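The same sqlite3 approach can show both effects. In this hypothetical schema, orders.customer_id is the join key and orders.created_at is the sort column; the index names are illustrative. Before the indices exist, SQLite's plan for the sort query should include a USE TEMP B-TREE FOR ORDER BY step (its marker for a separate sort), and the join typically scans orders. Afterwards, the sort should read rows in index order and the join should search orders through idx_orders_customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        created_at TEXT
    );
""")

def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN 'detail' strings joined into one line."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

sort_sql = "SELECT id, created_at FROM orders ORDER BY created_at"
join_sql = ("SELECT c.name, o.created_at FROM customers AS c "
            "JOIN orders AS o ON o.customer_id = c.id WHERE c.id = 42")

before_sort, before_join = plan(sort_sql), plan(join_sql)

# Index the sort column and the join key.
conn.executescript("""
    CREATE INDEX idx_orders_created_at ON orders (created_at);
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);
""")

after_sort, after_join = plan(sort_sql), plan(join_sql)
for label, text in [("sort, no index", before_sort), ("sort, indexed", after_sort),
                    ("join, no index", before_join), ("join, indexed", after_join)]:
    print(f"{label:<16} {text}")
```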
Conclusion
Designing database indices is a key step in backend performance tuning.
- Use query execution plans to find slow queries, then index the high-cardinality columns they filter, join, or sort on.
- Design composite indices matching the left-to-right order of your query filters.
- Avoid redundant indices to keep write performance optimal.
Use the EXPLAIN statement in your SQL client during development to verify that your queries are using the indices as intended.
