Every SQL server DBA must have face this question in there career at least once. Either it was asked during interview or it came with curiosity. Many of them would have thought to find out, some of them have tried searching and few of them must know why. And I am from the last category.
So let find out here by understanding these
- What is a log file
- What is the significance of log file for SQL engine
- Why database need log file
- Architecture of log file
- Pros and cons of log file
What is a log file?
Log file is a physical file requires satisfying one of the properties of Relational database systems. This property is called as Durability (ACID Properties). As per durability any transaction occurring over RDBMS system should be durable .i.e. Either complete or should be able to rollback and to rollback any transaction sql server should know what is done till now.
So Log files contain sql at-least to rollback any transaction in case of system failure.
Every SQL Server database has a transaction log that records all transactions and the database modifications made by each transaction. The transaction log must be truncated on a regular basis to keep it from filling up. However, some factors can delay log truncation
What is the significance of log file for SQL engine?
Sql server is bound with rules of RDBMS and for durability sql server engine have implemented protocol “Write Ahead Log” also called as WAL. And this protocol makes sure that at-least undo part of every transaction should first go to log file and then to data cache. So to implement WAL sql server uses log buffer, physical log file and data cache.
The transaction log is a critical component of the database and, if there is a system failure, the transaction log might be required to bring your database back to a consistent state. The transaction log should never be deleted or moved unless you fully understand the behavior
Why database need log file?
As discussed Log file write every transaction detail in log file before moving to data pages but still why separate file, why can’t it adjust all logs in same data file?
So here we have 2 basic reasons
1. If I keep both log and data in same file and there is corruption occur then even though I have full back up I can do point in time recovery
2. The basic idea of data file is to hold data and keep reusing whereas log file to keep data only till transaction is running ( durability) and once transaction is completed or (kept in log backup) it’s no longer required.
Architecture of log file:
1. Log file is made of VLF’s also called as virtual log file, each "chunk" that is added, is divided into VLFs at the time the log growth.
2. Log file is a circular linked list of VLF’s
3. Sql server writes transaction in log file serially.
4. If any database is having more than 1 log file then also, Second can’t be used, until first file is full
5. No. of vlfs in a log file can be identified by using command “DBCC LOGINFO”
6. VLF allows log file to get reused once all transactions in particular log file is committed or rolled back.
No. of VLF added on every growth in log file depends on below formula:
Chunks less than 64MB = 4 VLFs
Chunks of 64MB and less than 1GB = 8 VLFs
Chunks of 1GB and larger = 16 VLFs
Cons of log file:
1. Transactions written in log file are serial in nature, adding one more file will not help to increase I/O performance.
2. While the log is ‘growing’ then it is essentially locked, any processes trying to do insert/update/delete activity will block until the growth has completed.
3. Too many VLF can cause fragmentation (slow down database startup and also log backup and restore operations)
4. Few VLF with large size of VLF can cause VLF will remain active and can take much time to clear VLF
Minimum LSN for database recovery and the Minimum LSN for log truncation are not the same thing
Minimum LSN for log truncation: dbcc log (dbbname ,3)
Minimum LSN for database recovery: Indicates the starting timestamp for the oldest transaction that has not yet been committed. The number of log records reaches the number the Database Engine estimates it can process during the time specified in the recovery interval option.