In PostgreSQL, we have 3 major maintenance task
- Vacuum
- Analyze
- Re-index
Vacuum:
- This process cleans the dead tuples in the files
- Due to MVCC architecture, each update and delete create internal duplicate record. This requires manual cleanup to avoid performance impact
- To delete the rows marked for delete from a physical file, we have to run the VACUUM command.
Example:
- To run on one table: Vacuum
- To run on all tables: Vacuum
To free the space at file level we have to run the "Vacuum Full" command. It Puts a full table lock on the table
Syntax:
- To free the space in one table: Vacuumfull
- To free the space in all tables: Vacuumfull
If you want the detailed output on executing vacuum command
- vacuum verbose
It is sometimes advisable to use the cost-based vacuum delay feature.
Analyze:
- It is done to collect the latest statistics on all tables
- Calculate stats on tables for a good query plan.
- stats are updated in pg_stats tables
Why we should run?
- Postgress follows a cost-based optimizer.
- The execution plan for the queries are prepared based on the statistics available on the tables
- Analyze commands gathers the statistics on the table.
Example:
- To run on one table in the database: analyze
- To run on all tables in the database: analyze
Instead, we do manual, it is automated using background process "autovaccum".
This process runs automatically and analyzes frequently on tables.
It decided on the basis of certain criteria.
Criteria:
- Autovaccum_naptime = 5min is default
- It checks which tables require to be vacuumed and clean those then validate which tables need to be analyzed and then analyze these tables.
- Autovacuum is having the least priority anywhere.
Which tables? How to decided
For Vacuum:
- AutoVaccum_Vaccum threshold:
- Specifies the minimum number of updated or deleted tuples needed to trigger a vacuum in any one table.
- Autovaccum_Vaccum_Scale_factor:
- Specifies a fraction of the table size to add to autovaccum_vaccum_threshold when deciding whether to trigger a VACCUM.
- The default is 0.2 (20% of table size)
For Analyze:
- Autovaccum_Analyzed_Threshold: Number of rows decided if it's ready for analysis
- Autovaccum_Analyze_Scalefactor: % of records. default is 01 .i.e 10%
How Many workers:
If one background process, it will take longer. we can give max_workers so the default is 3, We can increase workers to reduce duration but I/o and CPU load will increase.
Reindex:
- The index should be rebuilt if there is around 30% change in the parent table
- It helps in rebuilding indexes
- Re-index will reduce external fragmentation
- Syntax:
- Reindex (Index| Table|Database|SYSTEM} name [FORCE]
I feel there is a need to look for and find more about different aspects and many like Postgre SQL and other database complexities.
ReplyDeleteSSIS Postgresql Write