Improve performance and prevent data clutter with proactive maintenance
In the previous article, I described various VACUUM operations in PostgreSQL, including VACUUM FULL, VACUUM ANALYZE, and standard VACUUM, highlighting their specific use cases and benefits. VACUUM FULL reclaims disk space by creating a new table and copying the data into it, making it suitable for tables with frequent updates and deletions, but it requires an exclusive lock. VACUUM ANALYZE collects statistics and marks deleted rows for reuse, optimizing query execution without locking the table. Standard VACUUM removes dead tuples with minimal locking, making it ideal for routine maintenance. I also provided examples of Python scripts that automate these VACUUM operations using the psycopg2 library.
Now it is time for a few more VACUUM commands to shine!
As of this writing, PostgreSQL 16 is current, and VACUUM supports the following options:
VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
where option can be one of:
FULL [ boolean ]
FREEZE [ boolean ]
VERBOSE [ boolean ]
ANALYZE [ boolean ]
DISABLE_PAGE_SKIPPING [ boolean ]
SKIP_LOCKED [ boolean ]
INDEX_CLEANUP { AUTO | ON | OFF }
PROCESS_MAIN [ boolean ]
PROCESS_TOAST [ boolean ]
TRUNCATE [ boolean ]
PARALLEL integer
SKIP_DATABASE_STATS [ boolean ]
ONLY_DATABASE_STATS [ boolean ]
BUFFER_USAGE_LIMIT size
and table_and_columns is:
table_name [ ( column_name [, ...] ) ]
Vacuum Verbose
If you want to understand the behavior of the VACUUM process, adding the VERBOSE option will print additional information about the operation as it progresses. This option can also help with troubleshooting.
import psycopg2

def vacuum_verbose(table_name):
    conn = psycopg2.connect(dbname="my_db", user="my_pg_user",
                            password="my_pass", host="127.0.0.1")
    # VACUUM cannot run inside a transaction block, so enable autocommit
    # instead of calling conn.commit().
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute(f"VACUUM VERBOSE {table_name};")
    # psycopg2 collects the server's INFO/NOTICE output in conn.notices.
    for line in conn.notices:
        print(line.strip())
    cur.close()
    conn.close()

if __name__ == "__main__":
    vacuum_verbose("my_schema.my_table_name_to_vacuum")
Vacuum Freeze
We all have data that is rarely modified, for example, historical data. We might not want to vacuum all the entries in those tables every time. The VACUUM FREEZE command marks such rows as frozen so that later vacuum runs can skip them.
import psycopg2

def vacuum_freeze(table_name):
    conn = psycopg2.connect(dbname="my_db", user="my_pg_user",
                            password="my_pass", host="127.0.0.1")
    # VACUUM cannot run inside a transaction block, so enable autocommit
    # instead of calling conn.commit().
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute(f"VACUUM FREEZE {table_name};")
    cur.close()
    conn.close()

if __name__ == "__main__":
    vacuum_freeze("my_schema.my_table_name_to_vacuum")
PostgreSQL tracks all changes that transactions make to rows. Each transaction is uniquely identified by a 32-bit integer called the transaction ID, or XID. Postgres saves each row version along with the XID of the transaction that created or last modified it. XIDs increment with each new transaction, which means that after about four billion transactions (2^32), the counter wraps back around to the beginning. This is called the XID wraparound.
In PostgreSQL, rows (or tuples) have an age. The age is determined by the number of transactions that have occurred since the most recent write operation (insert or update) on that row:
Row Age = Current XID − Latest Write XID
Where:
- Current XID is the transaction ID of the current transaction.
- Latest Write XID is the transaction ID of the most recent transaction that inserted or updated the row.

If old XIDs were reused, PostgreSQL would have difficulty distinguishing very old transactions from very new ones and might treat old data as if it were new. Queries could then return incorrect results.
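To see why reuse is dangerous, the age comparison can be sketched in plain Python. The function below is a toy illustration of arithmetic in a wrapping 32-bit space, not PostgreSQL's actual implementation:

```python
XID_SPACE = 2**32  # transaction IDs live in a 32-bit space and wrap around

def xid_age(current_xid, row_xid):
    """Age of a row version, computed modulo the 32-bit XID space."""
    return (current_xid - row_xid) % XID_SPACE

# A row written shortly before the current transaction has a small age:
print(xid_age(current_xid=1_000_000, row_xid=999_000))  # 1000

# But once the counter wraps past zero, an unfrozen old row's age computed
# this way becomes misleadingly small: the billions-of-transactions-old row
# below also looks only 1000 transactions old. This is why old tuples must
# be frozen before the counter catches up with them.
print(xid_age(current_xid=500, row_xid=4_294_966_796))  # 1000
```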
If you have never needed to run the VACUUM FREEZE command manually, it's because PostgreSQL manages XID wraparound effectively through its built-in processes. Autovacuum runs by default, so wraparound issues are rarely seen in practice.
Configuration parameters also help manage XID wraparound. For example, autovacuum_freeze_max_age specifies the maximum age a table's oldest unfrozen transaction ID may reach before an anti-wraparound autovacuum is forced on that table. This parameter is set to 200 million transactions by default.
This way very old tuples are marked as frozen, reducing the potential issues with XID reuse and maintaining data integrity.
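To see how close your tables are to that threshold, you can ask the catalog: age(relfrozenxid) reports how many transactions have passed since a table was last fully frozen. The query below is a common monitoring pattern; the helper function name is mine, not part of any library, and it expects an open psycopg2 connection.

```python
# Tables whose xid_age approaches autovacuum_freeze_max_age (200 million by
# default) are next in line for an anti-wraparound autovacuum.
FROZEN_AGE_QUERY = """
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
"""

def tables_closest_to_wraparound(conn):
    # Pass any open psycopg2 connection; returns (relname, xid_age) tuples.
    cur = conn.cursor()
    cur.execute(FROZEN_AGE_QUERY)
    rows = cur.fetchall()
    cur.close()
    return rows
```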
When you run VACUUM the next time, the frozen rows will not be vacuumed again. The process will focus on active data, and this way the amount of work the vacuum process needs to do can be reduced. Subsequent vacuum operations will be faster.
So why would you run VACUUM FREEZE if Postgres runs it for you?
PostgreSQL’s autovacuum process helps manage bloat by cleaning up dead rows, but it might not always run frequently enough in high-write environments, which can lead to excessive bloat. In such cases, manual vacuum runs help reclaim disk space and optimize performance. Autovacuum is there to help, but regular maintenance and knowing how to tune the autovacuum settings can save resources. By that I mean not only disk space and RAM but also CPU usage, network bandwidth, backup processes, maintenance windows, operational costs, and even developer time.
Or let’s just simply say: MONEY.
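If you do decide to tune, per-table storage parameters are one place to start. As a hedged sketch (the table name is the placeholder used throughout this article, and the helper function is mine): lowering autovacuum_freeze_max_age on a busy table makes anti-wraparound vacuums start earlier, when there is less work to do.

```python
# Per-table autovacuum settings override the global defaults.
# 100 million is the minimum allowed value for this storage parameter.
TUNE_SQL = """
ALTER TABLE my_schema.my_table_name_to_vacuum
    SET (autovacuum_freeze_max_age = 100000000);
"""

def tune_table_freeze_age(conn):
    # Requires a connection as the table's owner or a superuser.
    cur = conn.cursor()
    cur.execute(TUNE_SQL)
    conn.commit()
    cur.close()
```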