Factors Causing Latency in Replication

Latency in Replication depends on many factors I have mentioned the most common factors below

  1. Deadlocks
  2. Blocking
  3. Resource contention
  4. Location of distributor
  5. Extra indices or triggers on the subscription table
  6. Communication link between distributor and publisher(Network)
  7. CPU and memory contention at distributor
  8. CPU, memory, and disk contention at publisher
  9. Cleanup of replication tables
  10. Rebuild index and Update stats on Me_repl tables
  11. Bigger T-Log size of the publisher
  12. Huge batch jobs running

I have try to explain these in detail below for better understanding.

Deadlocks: Deadlock is something which can kill you replication from publisher to subscriber and sql server have to re-initiate transaction. Unless you find RCA and fix it , this issue will re-occur.

Blocking: Blocking is consider good, because it makes sure there is no deadlock but some times it becomes worse when replication is blocked by high priority transactions. Ideally you should have notifications in your server for blocking more than a min.

Location of distributor: If the publisher is an intensively busy server. We should consider using a remote distributor here we are using local distributor

Extra indices or triggers on the subscription table: Many people use the subscription database for other purposes, such as reporting or DSS type of activities. Sometimes extra indices or triggers are added to the subscription tables. This can slow down the distribution agent’s rate of applying commands at the subscriber, and can cause long latencies during peak hours.

Communication link between distributor and subscriber: When there are large number of commands to be applied from the distribution database to the subscriber, or if we need to have low latency, it is important that the communication link between the distributor and subscriber is reliable and high speed.

CPU and memory contention : latency will be more if any processor or memory contentions on the server

Cleaning up replicated transactions from the distribution database :It is advisable to test and cleanup the replicated commands as soon as they are replicated .Default is 72 hours but it can be decreased as per the tests on the production instance .

Rebuilding Indexes and updating stats on Msrepl_commands and msrepl_transactions : These two tables  are the most heavily used tables and thus we need to make sure that these tables are periodically reindexed and statistics are updated for these two tables in a timely fashion .

CPU, memory, and disk contention at subscriber: We need to watch for any processor, memory or disk contentions at subscriber. Customers should especially watch out the case that subscriber’s resources become the bottleneck and thus “throttling back” the data flow from distributor to subscriber.

Other activities at subscriber: Some times we need to do maintenance work at the subscriber, such as index rebuilding, de-fragmentation, or consistency check (DBCC). Such activities will demand system resources and may block or slow down command delivery to the subscription database.

No comments:

Post a Comment