Thursday, September 17, 2009

Windbg Not Getting the Attention it Needs: Doesn’t Work With 2.x

If you’ve never used Windbg, you can probably stop reading – this isn’t for you.

This is for the few, the proud and the brave, banging their heads trying to figure out why CPUs are suddenly spiking; it’s especially for those still supporting legacy apps with literally thousands of DataTables in memory.

With the help of SOS (aka “Son of Strike”), SOSEX and the inherent traps in large-scale .NET deployments (60% CPU is considered normal), what was once an esoteric tool is gaining more and more traction. And because it’s predicated on the fact that you will always have issues that only surface in production (with inadequate instrumentation to tell the whole story), its significance is unlikely to diminish anytime soon.

Windbg is the be-all and end-all in production troubleshooting and it’s surprising that it’s not receiving the attention that it deserves from the CLR team.

There are literally 2 posts describing the use of the !DumpDataTables command. (DataTables are particularly significant as they’re an easy spot to catch uncached data access or uncapped queries returning 10,000+ rows.) Alik and Tom, both presumably escalation engineers at Microsoft, detail case studies that demonstrate use of the command [1]. But what’s only mentioned in the comments of Alik’s post (and can also be inferred from Tom’s post through its use of !do -v, which is replaced by !da in 2.x) is that !DumpDataTables isn’t yet available in the publicly released SOS for 2.x assemblies.

(For those counting, yes, we’re months into the .NET 4 preview and only a few months away from the four-year anniversary of the 2.0 release – suffice it to say, it’s not coming.)

!DumpASPNETCache, !DumpDataTables, !DumpAllExceptions, !FindDebugTrue, !ASPXPages, !DumpConfig, !DumpHttpContext and a host of others [7] aren’t available for 2.0 assemblies:

[Screenshot: SOS responding “Doesn’t work with 2.x” to one of these commands]

“Doesn’t work with 2.x” is all you get. And to add insult to injury, the latest CLR10\SOS available keeps flashing the ‘tip’ banner telling you how wonderful these commands are.

You might be thinking: who cares, you can always script this [3][4][6]. But even if the output of !DumpAllExceptions is equivalent to Tess’ script below, that misses the point:

.foreach (ex {!dumpheap -type Exception -short}){.echo "********************************";!pe -nested ${ex} }

In my experience, add-on scripts take longer to execute, depend on offsets that have to be configured for each environment, produce more verbose and muddied output, and frequently error out when a command prints informational text along with its results. (Try Tess’ script for dumping out current and recent ASPX requests, for instance.)

The point is: if a SOS.dll with support for 2.x assemblies has been in use for years by the support team, why not release it on CodePlex without support or warranty and save users endless hours of frustration in having to sift through workarounds?

And why hasn’t the market filled the gap for GUI tools for Windbg, along the lines of SOS Assist, that simplify this [8] instead of leaving us with more of the same? Such a tool needs to be a first-class citizen of .NET, right alongside Visual Studio and Reflector.

But what about SOSEX?

Definitely, SOSEX is indispensable: its support for displaying value types and revealing the full stack is a basic prerequisite for any investigation. But it doesn’t provide the commands above, and modernizing Windbg is going to require more than one developer.

Some other notes that might help…

Start with !address open:

[Screenshot: !address output]

How to read this:

  • ~1.5GB (1578940 KB) of the 2GB address space (the 32-bit maximum) is in use;
  • ~500MB (518148 KB) is available (MEM_FREE);
  • The largest contiguous free block is 46,368 KB (~45MB) => memory is already fragmented!
  • RegionUsageIsVAD is where .NET allocates (~1.1GB here).
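The reading above is mostly arithmetic, so here is a minimal Python sketch of it. The figures are copied from the bullets; the 64 MB segment size comes from the Large Object Heap discussion in [9], and treating the largest free block as KB (like the other figures) is my assumption. The function and key names are made up for illustration.

```python
# Figures copied from the !address bullets above, all assumed to be in KB.
IN_USE_KB = 1_578_940
FREE_KB = 518_148
LARGEST_FREE_BLOCK_KB = 46_368

# Assumption taken from [9]: the CLR looks for 64 MB of contiguous free space.
SEGMENT_KB = 64 * 1024


def summarize(in_use_kb, free_kb, largest_free_kb, segment_kb=SEGMENT_KB):
    """Turn the raw KB figures into the numbers that matter for diagnosis."""
    total_kb = in_use_kb + free_kb
    return {
        "total_gb": round(total_kb / 1024 / 1024, 2),
        "in_use_pct": round(100 * in_use_kb / total_kb, 1),
        "largest_free_mb": round(largest_free_kb / 1024, 1),
        # Plenty of free memory overall is useless if no single hole is big enough.
        "can_reserve_segment": largest_free_kb >= segment_kb,
    }


report = summarize(IN_USE_KB, FREE_KB, LARGEST_FREE_BLOCK_KB)
print(report)
```

With roughly a quarter of the address space still free, the check that matters is the last one: the largest hole is smaller than a 64 MB segment, so the next segment reservation is at risk.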

More on the implications of the largest contiguous block being less than 64 MB, from [9]:

Large Object Heap

You should also pay attention to large objects, which lead to memory fragmentation. When .NET based application (as ASP.NET) needs to allocate new memory, it does so looking for (and reserving) chunks of 64 Mb free and contiguous memory: that is the key of this matter. When the Garbage Collector does its job and frees memory for unused (better say, unreachable) objects, it also tries to compact the chunks of memory to have it contiguous for future use, but this is not always possible: when the GC runs, it must temporarily stop all application threads, move around chunks of memory, update memory pointers and then the application can continue it’s job. But for performance reasons this can’t be done with objects larger than 85 Kb, which are seen as big objects from the CLR point of view and are allocated on a special heap, called the Large Object Heap. Objects in LOH are still collected and the memory freed, but since moving around such large chunks of memory is very expensive (and remember during this operation the application is frozen and can’t respond to client’s requests) and would require too much time and effort to the system, at the end the application would be too badly affected. The GC frees that memory but does not compact it, and this leaves some holes in our memory (like a Swiss cheese, if you like it).

Multiply this process during the life of the application and you could end up having lot of free memory but so fragmented (I’ve seen dumps where we still had 80% of available memory that the biggest contiguous free chunk of memory was just 50 Mb, too small for the 64 Mb needed), than the CLR could do nothing else but throw an OutOfMemoryException. Of course if you have lots of big objects you’ll have more chances to run into this problem.
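To make the “Swiss cheese” effect concrete, here is a toy Python model of a heap that frees memory but never compacts it. Everything in it is hypothetical: the 10 MB slot size, the 100-slot address space and the “every fifth object survives” pattern are illustrative assumptions, not how the CLR actually lays out the Large Object Heap.

```python
SLOT_MB = 10          # hypothetical size of one large object
NEEDED_MB = 64        # contiguous chunk size mentioned in the text above


def free_runs(slots):
    """Return the lengths of consecutive free (None) runs in the slot list."""
    runs, current = [], 0
    for slot in slots:
        if slot is None:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Start with a 1000 MB address space completely filled with large objects.
slots = ["live"] * 100

# Collect four out of every five objects; the survivors stay where they are
# because this toy heap, like the LOH described above, is never compacted.
for i in range(len(slots)):
    if i % 5 != 4:
        slots[i] = None

runs = free_runs(slots)
free_mb = sum(runs) * SLOT_MB
largest_free_mb = max(runs) * SLOT_MB
can_allocate = largest_free_mb >= NEEDED_MB

print(f"free: {free_mb} MB, largest contiguous: {largest_free_mb} MB, "
      f"fits {NEEDED_MB} MB: {can_allocate}")
```

In this model 80% of the space is free, yet no hole is larger than 40 MB, so a 64 MB request fails. That is the same shape of problem as the dump described above.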

This article has a quite detailed description of GC internals: “Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework” http://msdn.microsoft.com/msdnmag/issues/1100/gci/ and part2 http://msdn.microsoft.com/msdnmag/issues/1200/GCI2/default.aspx.

Update #1: It looks like at least a few others raised this issue in 2006 and 2007 to no avail.

Update #2: John Robbins, Wintellect, echoes the sentiment [10]: “This is especially important to .NET 2.0 because Microsoft dropped huge amounts of functionality in the SOS that ships with .NET 2.0 so digging through those production mini dumps was a complete carpel tunnel inducing exercise.”

References:

[1] – !DumpDataTables references from MS:
http://blogs.msdn.com/tom/archive/2007/12/26/high-memory-continued.aspx
http://blogs.msdn.com/alikl/archive/2009/03/09/windbg-walkthrough-dump-values-of-dataset-or-datatable.aspx

[2] – Sample case:
http://rasmuskl.dk/post/A-WinDbg-Debugging-Journey-NHibernate-Memory-Leak.aspx

[3] – Reading values from a DataTable: http://labs.episerver.com/en/Blogs/Johano/Dates/2008/3/WinDBGSOS-Getting-at-the-values-in-a-DataTable/

[4] – On deprecation of !do address -v in 2.x and automation in Windbg: http://blogs.msdn.com/vijaysk/archive/2008/01/15/windbg-scripting-dump-data-column-names-from-a-table.aspx

[5] – Evolution of !da command (some history): http://blogs.msdn.com/shawnfa/archive/2004/04/30/124218.aspx

[6] – Manual Exceptions Summary:
http://blogs.msdn.com/tess/archive/2009/04/16/net-exceptions-quick-windbg-sos-tip-on-how-to-dump-all-the-net-exceptions-on-the-heap.aspx

[7] – !help output (latest CLR10\SOS from WinDbg 6.11.0001.404 x86):

Object Inspection

DumpObj (do)
DumpAllExceptions (dae)
DumpStackObjects (dso)
DumpHeap (dh)
DumpVC
GCRoot
ObjSize
FinalizeQueue
DumpDynamicAssemblies (dda)
DumpField (df)
TraverseHeap (th)
GCRef

Examining code and stacks

Threads (t)
CLRStack
IP2MD
U
DumpStack
EEStack
GCInfo
COMState
X
SearchStack

Examining CLR data structures

DumpDomain
EEHeap
Name2EE
SyncBlk
DumpASPNETCache(dac)
DumpMT
DumpClass
DumpMD
Token2EE
EEVersion
DumpSig
DumpModule
ThreadPool(tp)
ConvertTicksToDate(ctd)
ConvertVTDateToDate(cvtdd)
RWLock
DumpConfig
DumpHttpRuntime
DumpSessionStateConfig
DumpBuckets
DumpHistoryTable
DumpRequestTable
DumpCollection(dc)
DumpDataTables
GetWorkItems
DumpLargeObjectSegments(dl)
DumpModule
DumpAssembly
DumpMethodSig
DumpRuntimeTypes
PrintIPAddress
DumpHttpContext
DumpXmlDocument(dxd)

Diagnostic Utilities

VerifyHeap(vh)
DumpLog
FindAppDomain
SaveModule
SaveAllModules(sam)
GCHandles
GCHandleLeaks
FindDebugTrue
FindDebugModules
Bp
ProcInfo
StopOnException(soe)
TD
Analysis
Bl
CheckCurrentException(cce)
CurrentExceptionName(cen)
ExceptionBp
FindTable
LoadCache
SaveCache
ASPXPages
DumpGCNotInProgress
CLRUsage

[8] – SOS Assist:
http://old.thinktecture.com/SOSAssist/Screenshots.htm

[9] – Large Object Heap:
http://blogs.msdn.com/carloc/archive/2006/09/25/770314.aspx

[10] – John Robbins On SOSEX:
http://www.wintellect.com/cs/blogs/jrobbins/archive/2007/06/19/great-sosex-a-phenomenal-net-debugging-extension-to-see-the-hard-stuff-steve-johnson-is-my-hero.aspx
http://msdn.microsoft.com/en-us/magazine/cc164138.aspx

3 comments:

kevin goff said...

I've seen log files from MS support where they were able to run !aspxpages and get the output from our memory dumps that were 2.x. Of course, internally, when I load the SOS from Windbg it doesn't work. So MS has it but they don't share :*(

Ben said...

Hello

Perhaps you know it already (this is an old article), but for completeness, for anybody who stumbles over this article, here is the solution:

google "psscor2" and throw sos.dll away.

Regards
Ben

Nariman Haghighi said...

Thanks Ben, yes, I should really update this post!