Why I Vacuum My Tables Regularly

BatCat
4 min readJul 22, 2024

--

Improve performance and prevent data clutter with proactive maintenance.

The Postgres docs say that vacuuming will help “garbage-collect and optionally analyze a database”.

From the many options available, I tend to use VACUUM ANALYZE, FULL VACUUM AND VACUUM and will describe these options. So when to use which?

At the moment Postgres 16 is active and the following options exist:

VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]

where option can be one of:

FULL [ boolean ]
FREEZE [ boolean ]
VERBOSE [ boolean ]
ANALYZE [ boolean ]
DISABLE_PAGE_SKIPPING [ boolean ]
SKIP_LOCKED [ boolean ]
INDEX_CLEANUP { AUTO | ON | OFF }
PROCESS_MAIN [ boolean ]
PROCESS_TOAST [ boolean ]
TRUNCATE [ boolean ]
PARALLEL integer
SKIP_DATABASE_STATS [ boolean ]
ONLY_DATABASE_STATS [ boolean ]
BUFFER_USAGE_LIMIT size

and table_and_columns is:

table_name [ ( column_name [, ...] ) ]

Vacuum Full

Full vacuum will help you reclaim space. It seemingly reorganizes the table but under the hood, it creates a new table, into which copies all data. During the process we cannot use the table because this operation puts an exclusive lock on the objects inside of the table. This means that all other operations (reads and writes) are blocked while the full vacuum is in progress. This is why it is best to select a time when user activity is low. When the new table is ready, all operations are directed towards this new table.

Performing full vacuum is necessary when lots of write operations like delete and update happen.

Full vacuum will return the freed up space to the operating system.

When I used it: the ETL tool I built synced data from MongoDB to Postgres. Occasionally, all tables would be updated, with the freshest data. No downtime was allowed and gigabytes of data needed to be synced. Many rows were updated, however, I saw that the space the table claimed doubled. I ran VACUUM FULL once per day for the most important and most frequently used tables, resulting with the table size going back to normal.


import psycopg2

def vacuum_full(table_name):
conn = psycopg2.connect(dbname="my_db", user="my_pg_user", password="my_pass", host="127.0.0.1")
cur = conn.cursor()
cur.execute(f"VACUUM FULL {table_name};")
conn.commit()
cur.close()
conn.close()

if __name__ == "__main__":
vacuum_full_table("my_schema.my_table_name_to_vacuum")

CAUTION!
Running VACUUM FULL without specifying the table name will run a full database vacuum. Stopping it needs to be done carefully:

-- Find the process
SELECT pid, query, state
FROM pg_stat_activity
WHERE query LIKE 'VACUUM FULL%';

-- Cancel the process:
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'VACUUM FULL%';

-- Terminate the Process if cancel is not working:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query LIKE 'VACUUM FULL%';

-- Run Vacuum
VACUUM shema_name.table_name;

After stopping a full vacuum, the temporary space used for the copy of the table will be released, along with locks that were acquired. This should allow the table to be accessed normally, just like before the full vacuum operation was started.

It is possible that the table remains partially compacted or that some space was freed up if VACUUM FULL did not complete all of its tasks. If the operation completes fully, more space will be reclaimed. This additional space reclamation might only be visible after running another VACUUM FULL on the table.

If you don’t have enough disk space, the VACUUM FULL operation can still be started but it will result with a failure. Luckily, the original table is intact and unchanged. No changes will be committed and the original table remains as it was before the operation started.

Vacuum Analyze

Using this operation, the DBMS collects statistics and marks deleted rows that can be freed up and reused. We can say that by running this operation we can ensure efficient query execution. As part of regular maintenance, we should always run this operation, but also after write operations, especially after modifying a larger part of the database, like bulk inserts. In case a database crash happens or query execution becomes slow, this operation can help improve the query planner’s decisions and replace outdated statistics with new ones.

VACUUM ANALYZE does not reduce the table size and does not put an exclusive lock on the objects inside of the table.

When I used it: During the day, I would run VACUUM ANALYZE after the most frequently updated and most frequently used tables finished syncing like user, order, while country, user_type, address were usually skipped.

import psycopg2

def vacuum_analyze():
conn = psycopg2.connect(dbname="my_db", user="my_pg_user", password="my_pass", host="127.0.0.1")
cur = conn.cursor()
cur.execute("VACUUM ANALYZE;")
conn.commit()
cur.close()
conn.close()

if __name__ == "__main__":
vacuum_analyze()

Vacuum

Standard vacuuming is part of the routine maintenance. It scans through one or multiple tables to identify unwanted rows. Rows that were updated or deleted can accumulate over time and waste disk space. VACUUM updates statistics and removes these rows called “dead” tuples and lets you reclaim some of the space. There is minimal locking, it does not block other database operations and allows concurrent reads and writes.

Standard vacuum does not return the freed up space to the operating system.

It can be used on specific tables, but also it can be ran on the database level like this:

import psycopg2

def vacuum():
conn = psycopg2.connect(dbname="my_db", user="my_pg_user", password="my_pass", host="127.0.0.1")
cur = conn.cursor()
cur.execute("VACUUM;")
conn.commit()
cur.close()
conn.close()

if __name__ == "__main__":
vacuum()

When I used it: VACUUM is less efficient than VACUUM FULL and has little impact on the database performance. It was part of my regular housekeeping.

--

--

BatCat
BatCat

Written by BatCat

My mission is to share some of the solutions I find during my journey as a data engineer. I mostly write about PostgreSQL and Python.

No responses yet