Maintenance (Vacuum, Analyze, Re-index) in PostgreSQL




In PostgreSQL, we have 3 major maintenance task 

  1. Vacuum
  2. Analyze
  3. Re-index

Vacuum: 
  1. This process cleans the dead tuples in the files
  2. Due to MVCC architecture, each update and delete create internal duplicate record.  This requires manual cleanup to avoid performance impact
  3. To delete the rows marked for delete from a physical file, we have to run the VACUUM command.


Example: 
  • To run on one table: Vacuum
  • To run on all tables: Vacuum 


To free the space at file level we have to run the "Vacuum Full" command. It Puts a full table lock on the table

Syntax:

  • To free the space in one table: Vacuumfull
  • To free the space in all tables: Vacuumfull


If you want the detailed output on executing vacuum command

  • vacuum verbose







It is sometimes advisable to use the cost-based vacuum delay feature.


Analyze: 


  • It is done to collect the latest statistics on all tables
  • Calculate stats on tables for a good query plan.
  • stats are updated in pg_stats tables


Why we should run?

  • Postgress follows a cost-based optimizer.
  • The execution plan for the queries are prepared based on the statistics available on the tables
  • Analyze commands gathers the statistics on the table.


Example:

  • To run on one table in the database: analyze
  • To run on all tables in the database: analyze



Instead, we do manual, it is automated using background process "autovaccum". 
This process runs automatically and analyzes frequently on tables.  
It decided on the basis of certain criteria.

Criteria:

  • Autovaccum_naptime = 5min  is default
  • It checks which tables require to be vacuumed and clean those then validate which tables need to be analyzed and then analyze these tables.
  • Autovacuum is having the least priority anywhere.


Which tables? How to decided

For Vacuum:

  • AutoVaccum_Vaccum threshold:  
    • Specifies the minimum number of updated or deleted tuples needed to trigger a vacuum in any one table.


  • Autovaccum_Vaccum_Scale_factor:  
    • Specifies a fraction of the table size to add to autovaccum_vaccum_threshold when deciding whether to trigger a VACCUM. 
    • The default is 0.2 (20% of table size)

For Analyze:


  • Autovaccum_Analyzed_Threshold: Number of rows decided if it's ready for analysis
  • Autovaccum_Analyze_Scalefactor:  % of records. default is 01 .i.e 10%



How Many workers: 
If one background process, it will take longer. we can give max_workers so the default is 3, We can increase workers to reduce duration but I/o and CPU load will increase.




Reindex: 


  • The index should be rebuilt if there is around 30% change in the parent table
  • It helps in rebuilding indexes
  • Re-index will reduce external fragmentation
  • Syntax:
    • Reindex (Index| Table|Database|SYSTEM} name [FORCE]

1 comment:

  1. I feel there is a need to look for and find more about different aspects and many like Postgre SQL and other database complexities.

    SSIS Postgresql Write


    ReplyDelete