Max connection pool size reached
Published: 11/19/2012 10:56:54 PM
Some time ago I was told to troubleshoot a problem in one application (ASP.NET 2.0). The problem seemed to occur randomly, and when it occurred the application would halt and begin generate the following error:
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
It seemed that the connection pool was maxed out, but I couldnt understand why. Considering the current load on the application it would have been impossible to reach the maximum limit on the connection pool, which is 100 by default. After some quick searches on the internet I found that one common cause could be that the database connections weren't closed properly, but in this case a framework was used to communicate with the database so we didn't actually open or close the connections ourself.
So what do you do in these situations? You turn to Windbg of course! Once the problem occurred the next time, a memory dump was taken before the application was restarted. The first thing I did was to confirm the problem by determining the amount of objects in the connection pool. Please note that the output in the Windbg command prompt have been abbreviated to focus on the relevant information.
0:000> !dumpheap -stat -type DbConnectionPool
MT Count TotalSize Class Name
000007fef2ceff20 1 176 System.Data.ProviderBase.DbConnectionPool
0:000> !dumpheap -mt 000007fef2ceff20
Address MT Size
000000013f448668 000007fef2ceff20 176
total 1 objects
0:000> !do 000000013f448668
Name: System.Data.ProviderBase.DbConnectionPool
Fields:
MT Field Offset Type VT Attr Value Name
000007fef9dfed78 4001551 98 System.Int32 1 instance 100 _totalObjects
By looking at the DbConnectionPool object in memory I could see that it contained 100 pooled objects (_totalObjects). 100 is the default maximum pool size in .NET, but to be sure that it hadn't been modified somewhere I decided to verify that as well. This information can be found in the DbConnectionPoolGroupOptions object.
0:000> !dumpheap -stat -type System.Data.ProviderBase.DbConnectionPoolGroupOptions
MT Count TotalSize Class Name
000007fef2cf0a58 1 40 System.Data.ProviderBase.DbConnectionPoolGroupOptions
0:000> !dumpheap -mt 000007fef2cf0a58
Address MT Size
000000013f52fd08 000007fef2cf0a58 40
total 1 objects
0:000> !do 000000013f52fd08
Name: System.Data.ProviderBase.DbConnectionPoolGroupOptions
Fields:
MT Field Offset Type VT Attr Value Name
000007fef9dfed78 400159b c System.Int32 1 instance 100 _maxPoolSize
_maxPoolSize is 100, which confirms that we actually have maxed out the connection pool. But why? To answer that question I decided to dig deeper into the memory dump and try to see the SQL queries that are being executed. It turns out that we can access the SQL query by looking at the _outBuff of the TdsParserStateObject. _outBuff is a byte array, so to be able to take a peek at it I will have to write it to a file using .writemem. I randomly picked one object, determined the address (000000014752a238) and size (0x1f58 bytes) of the array and wrote the content to a textfile:
0:000> !dumpheap -stat -type System.Data.SqlClient.TdsParserStateObject
MT Count TotalSize Class Name
000007fef2cf46a0 100 27200 System.Data.SqlClient.TdsParserStateObject
0:000> !dumpheap -mt 000007fef2cf46a0
Address MT Size
0000000147524cb8 000007fef2cf46a0 272
(...99 more...)
Total 100 objects
0:000> !do 0000000147524cb8
Name: System.Data.SqlClient.TdsParserStateObject
Fields:
MT Field Offset Type VT Attr Value Name
000007fef9dffb48 4001bdb 28 System.Byte[] 0 instance 000000014752a238 _outBuff
0:000> !do 000000014752a238
Name: System.Byte[]
Size: 8024(0x1f58) bytes
0:000> .writemem c:\temp\000000014752a238.txt 000000014752a238 L?0x1f58
Writing 1f58 bytes....
By repeating the steps above I could verify that all objects were executing the same query. Hmm, interesting...and suspicious. After scanning the code for occurrences that could have generated the query I ended up with two possible suspects: one normal method and one method that executed in a COM+ transaction. It was not obvious which one was causing the problems, but finally I decided to go for the transactional method. The main reason for my choice was the _isInStasis flag that I noticed in the System.Data.SqlClient.SqlInternalConnectionTds object during the dump analysis. I don't know for sure what that flag actually means, but from the little information I gathered from the internet I got the impression that it is related to transactions in some way (for example that the object is waiting for a transaction to complete).
So, transactions... The application were using COM+ transactions, which is done by having a transaction class derive from the ServicedComponent class. Transactions can be commited in three ways: by calling the SetComplete/SetAbort methods in the method, by decorating the method with the AutoComplete attribute, or by doing nothing at all (which is causing .NET to commit the trancation at some point). Normally the AutoComplete attribute would have been used but for some reason it wasn't being used on this particular method. Instead it wasn't doing anything to commit the transaction explicitly, so .NET had to do it implicitly.
Unfortunately, by leaving it to .NET to commit the transaction we are not guaranteed that it will be done immediately. Microsoft even warns that the default commit "can decrease the performance of your application" (see Voting in an Automatic Transaction). Upon discovering this I quickly added the AutoComplete attribute to the method, and the problem has not reoccured since then.
The importance of AutoComplete in COM+ transactions have become quite clear now. What AutoComplete does is that the method will automatically commit the transaction once it returns, unless an unhandled exception is thrown in which case the transaction is rollbacked. By using AutoComplete we get better control over the transaction and resources will be released quicker. I assume that calling SetComplete and SetAbort would yield the same result, but the neat thing about AutoComplete is that it will happen automatically.