Why I Vacuum My Tables — Part 3

Improve performance and prevent data clutter with proactive maintenance

BatCat
4 min readAug 11, 2024

In the previous articles, I described various VACUUM operations in PostgreSQL.

Part 1 focuses VACUUM FULL, VACUUM ANALYZE, and standard VACUUM, highlighting their use cases: VACUUM FULL reclaims disk space by creating a new table, making it suitable for frequently updated tables but requiring exclusive locks; VACUUM ANALYZE optimizes query execution by collecting statistics without locking the table; and standard VACUUM removes dead tuples with minimal locking, ideal for routine maintenance.

Part 2 focuses on the VACUUM FREEZE and VACUUM VERBOSE. VACUUM FREEZE is used to mark very old rows as frozen to prevent issues like XID wraparound, especially in tables with rarely modified data, reducing the workload of future vacuum operations. VACUUM VERBOSE provides detailed output during the vacuum process, aiding in troubleshooting and understanding the operation’s behavior.

I also provided examples of Python scripts to automate these VACUUM operations using the psycopg2 library.

Now it is time for a few lesser known VACUUM commands to shine!

At the moment Postgres 16 is active and the following options exist:

VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
where option can be one of:    FULL [ boolean ]
FREEZE [ boolean ]
VERBOSE [ boolean ]
ANALYZE [ boolean ]
DISABLE_PAGE_SKIPPING [ boolean ]
SKIP_LOCKED [ boolean ]
INDEX_CLEANUP { AUTO | ON | OFF }
PROCESS_MAIN [ boolean ]
PROCESS_TOAST [ boolean ]
TRUNCATE [ boolean ]
PARALLEL integer
SKIP_DATABASE_STATS [ boolean ]
ONLY_DATABASE_STATS [ boolean ]
BUFFER_USAGE_LIMIT size
and table_and_columns is: table_name [ ( column_name [, ...] ) ]

Vacuum Only Database Stats

VACUUM (ONLY_DATABASE_STATS) my_table;

When query performance degrades, one possible cause can be outdated statistics. This situation is often indicated when there are no dead tuples left to remove. Accurate and up-to-date statistics about data distribution are a must-have for the query planner to make optimal decisions. You can refresh the statistics to help the query planner generate better query execution plans.

How is that different from vacuum analyze?

VACUUM ANALYZE does the vacuuming first and then collects statistics on the contents of the vacuumed table.

VACUUM ONLY_DATABASE_STATS focuses only on updating the statistics, making it faster because it doesn’t spend time on removing dead tuples and reclaiming storage.

Vacuum Disable Page Skipping

VACUUM (DISABLE_PAGE_SKIPPING) my_table;

PostgreSQL Page

A page is a fundamental unit of storage used to manage and organize data on disk. The page size defaults to 8 KB, and it the smallest unit of data storage that PostgreSQL reads from or writes to disk. Each page contains a portion of the table’s data or index information. By organizing data into pages, I/O operations, caching, and transaction management can be more efficient.

A page consists of the header, data area, some free space and the footer. I won’t go into too much detail, but structure is the following:

+--------------------------+
| Header |
| - Page Number |
| - Page Flags |
| - Next/Previous Pointers |
+--------------------------+
| Data Area |
| - Tuple 1 |
| (Row Data) |
| - Tuple 2 |
| (Row Data) |
| - Tuple 3 |
| (Row Data) |
| ... |
| - Tuple N |
| (Row Data) |
+--------------------------+
| Free Space |
| - Available for New Tuples|
| - (Unused Space) |
+--------------------------+
| Footer |
| - Checksum |
| - Other Metadata |
+--------------------------+

The VACUUM DISABLE_PAGE_SKIPPING command in PostgreSQL forces a thorough scan of the table. It makes sure that every page is examined, even those that might be skipped.

Standards vacuuming can skip pages based on heuristics like the number of dead tuples or the age of the page.

Data integrity is essential in scientific research, space exploration, healthcare, financial services, retail, e-commerce, etc.VACUUM DISABLE_PAGE_SKIPPING can be useful in such environments because it guarantees that no data pages are overlooked. For example, if you suspect table corruption or have experienced data integrity issues, running this command will cause the data to be examined thoroughly.
It’s also beneficial when a significant number of updates or deletions have occurred in a table, especially if these changes are not evenly distributed across that table.

Example

Table state before vacuuming with uneven dead tuple distribution, some pages have a higher percentage of dead rows:

Table: sales_data

+---------+---------+---------+---------+
| Page 1 | Page 2 | Page 3 | Page 4 |
+---------+---------+---------+---------+
| Data A | Data B | Data C | Data D |
| Dead Tuples: 50% | Dead Tuples: 20% | Dead Tuples: 30% | Dead Tuples: 10% |
+---------+---------+---------+---------+

Standard vacuuming might skip Page 1 and 3 if they fall below the threshold for vacuuming, resulting with:

+---------+---------+---------+---------+
| Page 1 | Page 2 | Page 3 | Page 4 |
+---------+---------+---------+---------+
| Data A | Data B | Data C | Data D |
| Dead Tuples: 0% | Dead Tuples: 10% | Dead Tuples: 0% | Dead Tuples: 0% |
+---------+---------+---------+---------+

Vacuuming with page skipping disabled will ensure that every page is cleaned, regardless of their initial condition:

+---------+---------+---------+---------+
| Page 1 | Page 2 | Page 3 | Page 4 |
+---------+---------+---------+---------+
| Data A | Data B | Data C | Data D |
| Dead Tuples: 0% | Dead Tuples: 0% | Dead Tuples: 0% | Dead Tuples: 0% |
+---------+---------+---------+---------+

--

--

BatCat
BatCat

Written by BatCat

My mission is to share some of the solutions I find during my journey as a data engineer. I mostly write about PostgreSQL and Python.