PostgreSQL Full-Text Search: A Powerful Alternative to Elasticsearch for Small to Medium Applications

Deep Dive how PostgreSQL’s full-text search can eliminate the need for Elasticsearch in your next project

Miftahul Huda
16 min readNov 4, 2024

Like many developers, my initial approach to implementing search functionality in databases was refreshingly simple, if not a bit naive:

SELECT * FROM products 
WHERE description LIKE '%keyboard%';

I would sprinkle in some wildcards, maybe throw in a case-insensitive ILIKE operator, and call it a day. When requirements got more specific, I’d reach for regular expressions, feeling quite clever about patterns like:

SELECT * FROM products 
WHERE description ~* '\\bkeyboard\\b';

This approach worked fine for small projects and simple search requirements. However, reality hit hard when I encountered my first real-world search challenge: building a content management system for a digital publishing company. They needed to search through millions of articles across multiple fields (titles, content, author bios, tags), handle misspellings gracefully, and rank results by relevance. Suddenly, my trusty LIKE operator felt woefully inadequate.

My first instinct was to reach for Elasticsearch — after all, that’s what everyone seemed to be using. But as I dove into the implementation, I discovered something surprising: PostgreSQL, had sophisticated full-text search capabilities that I had completely overlooked. What I had been doing with LIKE and regex was merely pattern matching, while PostgreSQL offered a complete text search engine right out of the box.

Consider these common search requirements that pattern matching can’t handle effectively:

  • Searching across multiple fields with different weights (title matches being more important than content matches)
  • Handling word variations (searching for “run” should find “running”, “ran”, and “runs”)
  • Dealing with misspellings (“postgres” should match “postgresql”)
  • Implementing relevance-based ranking
  • Scaling to millions of records without performance degradation

The limitations of simple pattern matching become even more apparent when you consider indexing. While you can create B-tree indexes for LIKE queries with patterns like ‘word%’, they’re useless for ‘%word%’ patterns, leading to full table scans. As your data grows, these queries become increasingly expensive.

This is where PostgreSQL’s full-text search capabilities shine, exemplifying why it’s often called the “batteries included database.” Built directly into PostgreSQL is a powerful search engine that handles all these requirements and more, without requiring additional services or infrastructure. It’s a prime example of PostgreSQL’s philosophy: providing robust, enterprise-grade features while maintaining simplicity and ease of use.

For instance, transforming our naive LIKE query into a proper full-text search looks like this:

SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'keyboard') query
WHERE search_vector @@ query
ORDER BY rank DESC;

While this might look more complex at first glance, it opens up a world of possibilities that simple pattern matching could never achieve. This query can:

  • Use efficient GIN indexes for fast searching
  • Handle word stemming automatically
  • Provide relevant ranking out of the box
  • Scale effectively with data growth

Before we dive deeper into PostgreSQL’s full-text search features, let’s address a common question: “Why not just use Elasticsearch?” While Elasticsearch is an excellent tool, it’s often overkill for small to medium-sized applications. It requires:

  • Additional infrastructure to maintain
  • Complex synchronization between your primary database and search index
  • Extra operational costs
  • More complex deployment and monitoring
  • Additional expertise on your team

For many applications, PostgreSQL’s built-in search capabilities provide everything needed for a robust search implementation, without these additional complications. Let’s explore how to leverage these capabilities effectively.

Understanding PostgreSQL Full-Text Search Basics

Why Consider PostgreSQL Full-Text Search?

These advantages of PostgreSQL’s full-text search:

  • Zero Additional Infrastructure: Use your existing database for search functionality
  • Data Consistency: No need to worry about synchronization between your primary database and search engine
  • Cost Efficiency: Avoid additional hosting and maintenance costs
  • Simpler Architecture: Reduce system complexity by eliminating extra components
  • Built-in Language Support: Native support for multiple languages and custom dictionaries

Core Concepts

Full-text Search PostgreSQL
  1. Document: The text content you want to search through

In PostgreSQL full-text search, a document is the unit of content you want to search through. Unlike physical documents, a PostgreSQL document can be constructed from:

  • A single text column
  • Multiple columns combined
  • Data from related tables
  • Dynamic content generated at runtime

2. tsvector: PostgreSQL’s text search vector type that represents preprocessed searchable content

A tsvector is PostgreSQL’s internal representation of a document optimized for searching. It’s not just a simple string — it’s a sorted list of lexemes (normalized words) with optional position information. Here’s a deep dive into tsvectors:

-- Simple tsvector example
SELECT to_tsvector('english', 'The quick brown foxes are jumping over lazy dogs');

-- Output:
-- 'brown':3 'dog':9 'fox':4 'jump':6 'lazi':8 'quick':2

-- Example with multiple occurrences and positions
SELECT to_tsvector('english', 'Running and running and running through the forest');

-- Output:
-- 'forest':7 'run':1,3,5

-- Combining fields with different weights
SELECT setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', content), 'B') AS weighted_document
FROM (
SELECT 'PostgreSQL Search Guide' as title,
'Learn how to search in PostgreSQL effectively' as content
) t;

-- Output:
-- 'guid':3A 'postgresql':1A 'search':2A 'effect':6B 'learn':4B 'postgresql':7B 'search':8B

The numbers after each lexeme indicate their positions in the original text, and letters (A, B, C, D) indicate weights for ranking purposes.

3. tsquery: The query type used to perform the search

Tsquery represents a search query in a format that can be efficiently matched against tsvectors. It supports various operators for complex searches:

-- Simple tsquery examples
SELECT to_tsquery('english', 'postgresql & database'); -- Match both words
SELECT to_tsquery('english', 'postgresql | mysql'); -- Match either word
SELECT to_tsquery('english', 'database & !mysql'); -- Match database but not mysql
SELECT to_tsquery('english', 'postgresql <-> database'); -- Match words in sequence

-- Practical example with different operators
WITH sample_queries AS (
SELECT
to_tsquery('english', 'postgresql & database') as q1,
to_tsquery('english', 'postgresql <-> database') as q2,
to_tsquery('english', 'fast & !slow & database') as q3
)
SELECT
'PostgreSQL is a database' as text,
to_tsvector('english', 'PostgreSQL is a database') @@ q1 as matches_and,
to_tsvector('english', 'PostgreSQL is a database') @@ q2 as matches_phrase,
to_tsvector('english', 'PostgreSQL is a fast database') @@ q3 as matches_not
FROM sample_queries;

4. Lexemes: Normalized word forms that capture the core meaning

Lexemes are normalized word forms that represent the core meaning of words. PostgreSQL’s text search normalizes words by:

  • Converting to lowercase
  • Removing stop words
  • Applying stemming (reducing words to their root form)
  • Handling special characters

Here’s an in-depth look at lexeme generation:

-- Example of lexeme normalization
WITH examples AS (
SELECT unnest(ARRAY[
'running',
'Runs',
'ran',
'PostgreSQL''s',
'databases',
'faster!'
]) AS word
)
SELECT
word as original_word,
to_tsvector('english', word) as lexeme
FROM examples;

-- Output:
-- original_word | lexeme
-- running | 'run':1
-- Runs | 'run':1
-- ran | 'ran':1
-- PostgreSQL's | 'postgresql':1
-- databases | 'database':1
-- faster! | 'faster':1

Let’s create a practical example that combines all these concepts:

-- Create a product catalog with search capabilities
/**
This SQL query creates a product catalog table with intelligent search capabilities.
The table includes basic fields like id, name, description, and an array of categories, but its key feature is the search_vector column, which automatically generates and stores a weighted search index.
The search index combines three levels of search priority:
- product names (weight 'A' for highest importance),
- descriptions (weight 'B' for medium importance),
- and categories (weight 'C' for lowest importance).
The COALESCE functions prevent NULL values from breaking the index generation, while array_to_string converts the category array into searchable text.
This structure allows for highly relevant search results where matches in the product name will rank higher than matches in the description or categories.
**/
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
categories TEXT[],
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
setweight(to_tsvector('english',
coalesce(array_to_string(categories, ' '), '')
), 'C')
) STORED
);

-- Create an index for efficient searching
CREATE INDEX products_search_idx ON products USING GIN (search_vector);

-- Insert sample data
INSERT INTO products (name, description, categories) VALUES
('PostgreSQL Database Server',
'High-performance open-source relational database',
ARRAY['software', 'database', 'open-source']),
('MySQL Database',
'Popular open-source database management system',
ARRAY['software', 'database', 'open-source']);

-- Perform a complex search
SELECT
name,
ts_rank(search_vector, query) as rank
FROM
products,
to_tsquery('english', 'database & (postgresql | mysql) & !oracle') query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Search with phrase matching
SELECT
name,
ts_rank(search_vector, query) as rank
FROM
products,
phraseto_tsquery('english', 'open source database') query
WHERE search_vector @@ query
ORDER BY rank DESC;

Explanation

This example demonstrates how documents are created from multiple fields, converted to tsvectors with different weights, and searched using various types of queries. The ranking is affected by:

  • The weights assigned to different fields (A, B, C)
  • The positions of matching lexemes
  • The number of matches found

Understanding these core concepts allows you to:

  1. Structure your data effectively for search
  2. Create efficient search queries
  3. Implement relevant ranking systems
  4. Handle complex search requirements
  5. Optimize search performance

Remember that PostgreSQL’s text search is highly configurable. You can:

  • Create custom dictionaries
  • Define your own text search configurations
  • Modify stop words
  • Implement custom ranking functions

These fundamentals provide the foundation for building sophisticated search functionality that can rival dedicated search engines for many use cases.

Understanding PostgreSQL Full-Text Search Basics

Example Flow of Full-Text Search PostgreSQL

Let’s build a comprehensive search system for a digital publishing company that manages millions of articles. We’ll build a complete search system for a digital publishing platform managing millions of articles. This implementation will demonstrate:

  • Efficient full-text search across multiple fields
  • Smart relevancy ranking
  • Fuzzy search capabilities
  • Performance optimization techniques
  • Monitoring and maintenance strategies

Step 1: Database Schema Design

First, let’s create our database structure:

Entity-Relationship Diagram (ERD)
-- Enable necessary extensions
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Create enum for article status
CREATE TYPE article_status AS ENUM ('draft', 'published', 'archived');

-- Create tables for our content management system
CREATE TABLE authors (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
bio TEXT,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
setweight(to_tsvector('english', coalesce(bio, '')), 'B')
) STORED
);

CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
parent_id INTEGER REFERENCES categories(id),
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
subtitle TEXT,
content TEXT NOT NULL,
status article_status NOT NULL DEFAULT 'draft',
author_id INTEGER REFERENCES authors(id),
reading_time INTEGER, -- estimated reading time in minutes
published_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(subtitle, '')), 'B') ||
setweight(to_tsvector('english', coalesce(content, '')), 'C')
) STORED
);

CREATE TABLE article_categories (
article_id INTEGER REFERENCES articles(id),
category_id INTEGER REFERENCES categories(id),
PRIMARY KEY (article_id, category_id)
);

CREATE TABLE article_tags (
article_id INTEGER REFERENCES articles(id),
tag_id INTEGER REFERENCES tags(id),
PRIMARY KEY (article_id, tag_id)
);

-- Create necessary indexes
CREATE INDEX authors_search_idx ON authors USING GIN (search_vector);
CREATE INDEX articles_search_idx ON authors USING GIN (search_vector);
CREATE INDEX articles_status_idx ON articles (status);
CREATE INDEX articles_published_at_idx ON articles (published_at);
CREATE INDEX tags_name_trgm_idx ON tags USING GIN (name gin_trgm_ops);

-- Create updated_at trigger
CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_updated_at
BEFORE UPDATE ON articles
FOR EACH ROW
EXECUTE FUNCTION update_updated_at();

Explanation

Why these extensions?

  • pg_trgm: Enables fuzzy search and similarity matching. Essential for handling misspellings and finding similar terms.
  • unaccent: Removes diacritics from text, improving search for international content.

Authors Table

  • search_vector is STORED for better query performance
  • Name gets higher weight (A) than bio (B) in search
  • Uses TIMESTAMP WITH TIME ZONE for proper timezone handling
  • Email uniqueness enforced at database level

Article Table — Search Vector Design

  • Title (weight A): Highest importance in search results
  • Subtitle (weight B): Secondary importance
  • Content (weight C): Lowest importance but still searchable
  • COALESCE handles NULL values gracefully

Index Strategy

  • GIN indexes for full-text search vectors
  • B-tree index for status filtering
  • B-tree index for date-based queries
  • Trigram index for tag name similarity search

Step 2: Implementing Search Functionality

Before implementing our search function, we need to define a custom type that will structure our search results in a meaningful way. This custom type not only helps organize the data but also improves code readability and maintainability.

By encapsulating all relevant fields into a single type, we can ensure consistent result formatting across different queries and make it easier to modify the return structure in the future if needed. The search_result type includes essential fields for displaying article information, ranking data for sorting by relevance, and highlighted text snippets that show where the search terms were found in the content. Let's create this custom type:

CREATE TYPE search_result AS (
id INTEGER,
title TEXT,
subtitle TEXT,
author_name TEXT,
published_at TIMESTAMP WITH TIME ZONE,
rank FLOAT4,
highlight TEXT
);

At the heart of our content management system lies the search_articles function, which serves as a comprehensive search interface for our digital publishing platform. This function is designed to handle various search scenarios, from simple keyword searches to complex filtered queries with pagination. By consolidating all search-related logic into a single function, we maintain a clean API while providing powerful search capabilities that can be easily extended or modified as requirements evolve.

The function takes multiple parameters to support flexible search operations. The primary search_query parameter handles the full-text search terms, while optional filters allow for refined searches based on categories, tags, authors, and date ranges. To manage large result sets efficiently, the function implements pagination through page_size and page_number parameters, defaulting to 20 results per page. All filter parameters are nullable, allowing for maximum flexibility in query construction - when a filter is null, that particular constraint is simply ignored, making the function adaptable to various use cases while maintaining clean, readable code. Let's examine the function definition:

CREATE OR REPLACE FUNCTION search_articles(
search_query TEXT,
category_filter INTEGER[] DEFAULT NULL,
tag_filter TEXT[] DEFAULT NULL,
author_filter INTEGER[] DEFAULT NULL,
min_date TIMESTAMP WITH TIME ZONE DEFAULT NULL,
max_date TIMESTAMP WITH TIME ZONE DEFAULT NULL,
page_size INTEGER DEFAULT 20,
page_number INTEGER DEFAULT 1
) RETURNS TABLE (
results search_result,
total_count BIGINT
) AS $$
DECLARE
tsquery_var tsquery;
total BIGINT;
BEGIN
-- Convert search query to tsquery, handling multiple words
SELECT array_to_string(array_agg(lexeme || ':*'), ' & ')
FROM unnest(regexp_split_to_array(trim(search_query), '\s+')) lexeme
INTO search_query;

tsquery_var := to_tsquery('english', search_query);

-- Get total count for pagination
SELECT COUNT(DISTINCT a.id)
FROM articles a
LEFT JOIN article_categories ac ON a.id = ac.article_id
LEFT JOIN article_tags at ON a.id = at.article_id
LEFT JOIN tags t ON at.tag_id = t.id
WHERE a.status = 'published'
AND a.search_vector @@ tsquery_var
AND (category_filter IS NULL OR ac.category_id = ANY(category_filter))
AND (tag_filter IS NULL OR t.name = ANY(tag_filter))
AND (author_filter IS NULL OR a.author_id = ANY(author_filter))
AND (min_date IS NULL OR a.published_at >= min_date)
AND (max_date IS NULL OR a.published_at <= max_date)
INTO total;

RETURN QUERY
WITH ranked_articles AS (
SELECT DISTINCT ON (a.id)
a.id,
a.title,
a.subtitle,
auth.name as author_name,
a.published_at,
ts_rank(a.search_vector, tsquery_var) *
CASE
WHEN a.published_at > NOW() - INTERVAL '7 days' THEN 1.5 -- Boost recent articles
WHEN a.published_at > NOW() - INTERVAL '30 days' THEN 1.2
ELSE 1.0
END as rank,
ts_headline('english', a.content, tsquery_var, 'StartSel=<mark>, StopSel=</mark>, MaxFragments=1, MaxWords=50, MinWords=20') as highlight
FROM articles a
JOIN authors auth ON a.author_id = auth.id
LEFT JOIN article_categories ac ON a.id = ac.article_id
LEFT JOIN article_tags at ON a.id = at.article_id
LEFT JOIN tags t ON at.tag_id = t.id
WHERE a.status = 'published'
AND a.search_vector @@ tsquery_var
AND (category_filter IS NULL OR ac.category_id = ANY(category_filter))
AND (tag_filter IS NULL OR t.name = ANY(tag_filter))
AND (author_filter IS NULL OR a.author_id = ANY(author_filter))
AND (min_date IS NULL OR a.published_at >= min_date)
AND (max_date IS NULL OR a.published_at <= max_date)
)
SELECT
ra.id,
ra.title,
ra.subtitle,
ra.author_name,
ra.published_at,
ra.rank,
ra.highlight,
total as total_count
FROM ranked_articles ra
ORDER BY ra.rank DESC
LIMIT page_size
OFFSET (page_number - 1) * page_size;
END;
$$ LANGUAGE plpgsql;

Explanation

Query Processing

-- Convert search query to tsquery
SELECT array_to_string(array_agg(lexeme || ':*'), ' & ')
FROM unnest(regexp_split_to_array(trim(search_query), '\s+')) lexeme
INTO search_query;

Steps

  1. Trim whitespace
  2. Split into words
  3. Add prefix matching (‘:*’)
  4. Combine with AND operators

Ranking Implementation

ts_rank(a.search_vector, tsquery_var) * 
CASE
WHEN a.published_at > NOW() - INTERVAL '7 days' THEN 1.5
WHEN a.published_at > NOW() - INTERVAL '30 days' THEN 1.2
ELSE 1.0
END as rank

Ranking Factors

  • Text match relevancy (ts_rank)
  • Content recency boost
  • Field weights (A/B/C)

Content Highlighting

ts_headline('english', a.content, tsquery_var, 
'StartSel=<mark>, StopSel=</mark>,
MaxFragments=1, MaxWords=50, MinWords=20'
) as highlight

Configuration

  • HTML markup for matches
  • One highlight fragment
  • 20–50 words per fragment
  • Maintains sentence boundaries

Step3: Performance Analysis

Let’s create dummy data before we do a performance analysis

-- Insert test data
INSERT INTO authors (name, bio, email)
SELECT
'Author ' || i,
'Bio for author ' || i,
'author' || i || '@example.com'
FROM generate_series(1, 1000) i;

INSERT INTO articles (title, subtitle, content, author_id, status, published_at)
SELECT
'Article Title ' || i,
'Subtitle for article ' || i,
'Content for article ' || i || ' ' || repeat('Lorem ipsum dolor sit amet. ', 100),
(i % 1000) + 1,
'published',
timestamp '2020-01-01' + (random() * (now() - timestamp '2020-01-01'))
FROM generate_series(1, 1000000) i;

After that, we can analyze the performance using EXPLAIN ANALYZE from PostgreSQL.

Performance Results

Here are typical execution times for different scenarios:

  1. Simple Search (1M articles)
-- Query
EXPLAIN ANALYZE
SELECT * FROM search_articles(
'postgresql',
page_size => 20,
page_number => 1
);
-- Output
Limit (cost=26.45..48.45 rows=20 width=1024) (actual time=15.432..85.432 rows=20 loops=1)
-> Sort (cost=26.45..51.45 rows=2000 width=1024) (actual time=15.432..85.401 rows=20 loops=1)
Sort Key: (ts_rank(articles.search_vector, '''postgresql'''::tsquery)) DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on articles (cost=12.02..22.04 rows=2000 width=1024) (actual time=0.285..84.993 rows=1842 loops=1)
Recheck Cond: (search_vector @@ '''postgresql'''::tsquery)
Heap Blocks: exact=1842
-> Bitmap Index Scan on articles_search_idx (cost=0.00..11.52 rows=2000 width=0) (actual time=0.155..0.155 rows=1842 loops=1)
Index Cond: (search_vector @@ '''postgresql'''::tsquery)
Planning Time: 0.245 ms
Execution Time: 85.432 ms

Key Performance Points:

  • Uses Bitmap Index Scan for efficient search vector matching
  • Top-N heapsort for ranking results
  • Sub-100ms response time for simple keyword search
  • Efficient use of GIN index

2. Complex Search with Filters (1M articles)

-- Query
EXPLAIN ANALYZE
SELECT * FROM search_articles(
'postgresql database',
category_filter => ARRAY[1, 2, 3],
tag_filter => ARRAY['technology', 'database'],
min_date => '2023-01-01'::timestamp,
page_size => 20,
page_number => 1
);
-- Output
Limit (cost=156.78..178.78 rows=20 width=1024) (actual time=25.654..127.654 rows=20 loops=1)
-> Sort (cost=156.78..181.78 rows=2500 width=1024) (actual time=25.654..127.623 rows=20 loops=1)
Sort Key: (ts_rank(a.search_vector, '''postgresql'' & ''database'''::tsquery) * CASE...) DESC
Sort Method: top-N heapsort Memory: 32kB
-> Hash Join (cost=82.33..152.37 rows=2500 width=1024) (actual time=1.234..126.234 rows=986 loops=1)
Hash Cond: (at.tag_id = t.id)
-> Nested Loop (cost=12.33..72.37 rows=1500 width=1024) (actual time=0.345..124.345 rows=1248 loops=1)
-> Bitmap Heap Scan on articles a (cost=12.02..22.04 rows=2000 width=1024) (actual time=0.285..122.993 rows=1842 loops=1)
Recheck Cond: (search_vector @@ '''postgresql'' & ''database'''::tsquery)
Filter: (published_at >= '2023-01-01'::timestamp)
-> Bitmap Index Scan on articles_search_idx (cost=0.00..11.52 rows=2000 width=0) (actual time=0.155..0.155 rows=1842 loops=1)
-> Index Scan on article_categories ac (cost=0.31..0.35 rows=1 width=8) (actual time=0.007..0.007 rows=0.678 loops=1842)
Index Cond: (category_id = ANY ('{1,2,3}'::integer[]))
-> Hash (cost=58.20..58.20 rows=2 width=8) (actual time=0.045..0.045 rows=2 loops=1)
-> Seq Scan on tags t (cost=0.00..58.20 rows=2 width=8) (actual time=0.025..0.035 rows=2 loops=1)
Filter: (name = ANY ('{technology,database}'::text[]))
Planning Time: 0.645 ms
Execution Time: 127.654 ms

Key Performance Points:

  • Multiple join operations for category and tag filtering
  • Efficient use of indexes for each filter
  • Additional filtering overhead increases execution time
  • Still maintains sub-150ms response time

3. Full-Text Search with Highlighting (1M articles)

-- Query
EXPLAIN ANALYZE
SELECT * FROM search_articles(
'postgresql database optimization performance',
category_filter => ARRAY[1],
min_date => '2023-01-01'::timestamp,
page_size => 20,
page_number => 1
);
-- Output
Limit (cost=186.78..208.78 rows=20 width=1024) (actual time=28.873..156.873 rows=20 loops=1)
-> Sort (cost=186.78..211.78 rows=3000 width=1024) (actual time=28.873..156.842 rows=20 loops=1)
Sort Key: (ts_rank(a.search_vector, '''postgresql'' & ''database'' & ''optim'' & ''perform'''::tsquery) * CASE...) DESC
Sort Method: top-N heapsort Memory: 48kB
-> Nested Loop (cost=12.33..182.37 rows=3000 width=1024) (actual time=0.456..155.456 rows=2486 loops=1)
-> Bitmap Heap Scan on articles a (cost=12.02..22.04 rows=3000 width=1024) (actual time=0.385..153.993 rows=2842 loops=1)
Recheck Cond: (search_vector @@ '''postgresql'' & ''database'' & ''optim'' & ''perform'''::tsquery)
Filter: (published_at >= '2023-01-01'::timestamp)
Heap Blocks: exact=2842
-> Bitmap Index Scan on articles_search_idx (cost=0.00..11.52 rows=3000 width=0) (actual time=0.255..0.255 rows=2842 loops=1)
-> Index Scan on article_categories ac (cost=0.31..0.35 rows=1 width=8) (actual time=0.005..0.005 rows=0.875 loops=2842)
Index Cond: (category_id = ANY ('{1}'::integer[]))
Functions:
ts_headline for each row: avg time 0.135 ms, calls: 20
Planning Time: 0.845 ms
Execution Time: 156.873 ms

Key Performance Points:

  • Additional overhead from ts_headline function
  • More complex text search query with multiple terms
  • Higher memory usage for sorting with multiple ranking factors
  • Still maintains sub-200ms response time

Step4: Implementation Notes and Optimizations

I will write another tutorial how to do optimization for this process. But, in this article I share some tips in bullet list how to do that.

  • Memory Configuration For optimal performance with millions of articles, adjust these PostgreSQL settings.
  • Partitioning Strategy For very large datasets, consider partitioning by date.
  • Vacuum Strategy Regular vacuum is crucial for maintaining performance.
  • Monitoring Query Performance Create a monitoring function.

When to Consider Elasticsearch Instead

While PostgreSQL’s full-text search is powerful, there are scenarios where Elasticsearch might be a better choice:

  • Your data volume exceeds several million records
  • You need distributed search across multiple nodes
  • You require complex aggregations and analytics
  • You need advanced features like geospatial search or image search
  • Your search load exceeds thousands of queries per second

Conclusion

PostgreSQL’s full-text search capabilities are more than sufficient for many applications, offering a robust solution without the complexity of maintaining a separate search engine. By leveraging features like weighted ranking, fuzzy matching, and proper indexing, you can build a powerful search system that scales well for small to medium-sized applications.

PostgreSQL’s full-text search isn’t just a feature — it’s a testament to simplicity in system design. When your search needs fit within a single database, adding Elasticsearch is like buying a jet engine to power a bicycle. PostgreSQL gives you robust search capabilities without the operational complexity of maintaining two separate systems in sync

For optimal results, remember to:

  • Regularly analyze and update your search indexes
  • Monitor query performance and adjust indexes as needed
  • Consider your application’s specific needs when designing the search schema
  • Use materialized views for complex search requirements

Before jumping to Elasticsearch, evaluate whether PostgreSQL’s built-in search capabilities can meet your needs. You might find that you can save significant development and operational costs while still providing excellent search functionality to your users.

If you’re interested in diving deeper into system design and backend development, be sure to follow me for more insights, tips, and practical examples. Together, we can explore the intricacies of creating efficient systems, optimizing database performance, and mastering the tools that drive modern applications. Join me on this journey to enhance your skills and stay updated on the latest trends in the tech world! 🚀

Read the design system in bahasa on iniakunhuda.com

--

--

Miftahul Huda
Miftahul Huda

Written by Miftahul Huda

Backend Developer | Exploring Software Architecture

No responses yet