Wednesday 6 August 2014

SQL Database Supect Mode due to MSDTC Transaction

A database may go into suspect mode for various reason most of them are related to corruption of disk or data but SQL server may also put a database in a suspect mode if it’s not able to recover the database properly on restart
We had one such scenario recently, On a restart of planned outage one of the database went in to Suspect mode and we saw the following error in the SQL error log,


2014-08-01 10:37:31.760 spid5s       Attempting to recover in-doubt distributed transactions involving Microsoft Distributed Transaction Coordinator (MS DTC). This is an informational message only. No user action is required.
2014-08-01 10:39:13.480 spid5s       SQL Server detected a DTC/KTM in-doubt transaction with UOW  {14EBFD4C-0955-4C25-91B1-C418EE182E64}.Please resolve it following the guideline for Troubleshooting DTC Transactions.
2014-08-01 10:39:13.480 spid5s       Error: 3437, Severity: 21, State: 3.
2014-08-01 10:39:13.480 spid5s       An error occurred while recovering database 'DBName'. Unable to connect to Microsoft Distributed Transaction Coordinator (MS DTC) to check the completion status of transaction (0:75299367). Fix MS DTC, and run recovery again.
2014-08-01 10:39:13.480 spid5s       Error: 3414, Severity: 21, State: 2.
2014-08-01 10:39:13.480 spid5s       An error occurred during recovery, preventing the database 'DBNAME' (6:0) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.

We already know the application using the database uses MSDTC for various sources and any unexpected restart put an In-doubt transaction on DTC. Basically In-Doubt transactions are transaction which the DTC services cannot decide to abort or resolve by itself but since the UOW for the transaction exists
in SQL, SQLOS does not differentiate it aa transaction in In-doubt status and kill it and the transaction will be marked as active till is resolved or aborted from DTC. This is similar to the topic which I wrote earlier in the blog regarding “-2 SPID Blocking”.

when the database tries to recover a DB with UOW of in-doubt transaction on restart it could not rollback or roll forward the transaction so SQL puts database eventually in suspect mode failing to recover completely. You go Component Services->Computers->My Computer->Distributed transaction Coordinator->Local DTC->Transaction List for standalone servers or  Component Services->Computers->My Computer->Distributed transaction Coordinator->Clustered DTC->Transaction List for a cluster instance, Right Click on all transaction marked In-Doubt either commit or abort the transaction depending on your requirements. And restart the SQL Services which should recover the database clean.

But the clean way of doing a restart of a SQL with DTC transactions is to switch of the application before the SQL restart which will avoid this issue in the first place. Hope this article helps!