Wednesday, June 17, 2009

Disable TCP Chimney to Address Sporadic ViewState Exceptions (2009)

What might seem like an exotic issue has turned up twice for me over the last few years, in completely different environments – and both times, heavy troubleshooting finally demonstrated inconsistencies that had no other explanation.

If you’ve stumbled across this post, it’s likely too late.

If only we had the benefit of forewarnings [1] on disabling the TCP Offload Engine (TOE) in Windows 2003 as a precursor to troubleshooting connectivity issues:

“I've become quick to consider disabling the TCP off-loading engine features (or “TOE”) when trouble-shooting a problem involving IIS 6.0 and detecting the slightest hint of a "networking problem" or "communication problem." With the right combination of NIC, NIC driver level, and OS level, TOE is a good thing which significantly improves the performance of TCP processing. Windows can offload the TCP processing of some network streams to the network controller. But under some circumstances (especially with outdated NIC drivers) strange networking problems can result.”

What’s less documented (try a search for TCP Chimney + ViewState corruption, or variations thereof) is that a host of seemingly ViewState-related exceptions can surface as a result of TOE interference, including:

  • Invalid ViewState: Missing field: __VIEWSTATE? (post ViewState chunking)
  • Invalid character in a Base-64 string
  • Invalid length for a Base-64 char array
  • The serialized data is invalid
  • Invalid ViewState
  • The state information is invalid for this page and might be corrupted
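
To see which of these messages are actually turning up, it can help to tally them from whatever exception log you already keep. The following is a minimal Python sketch, assuming you can get the exception messages as plain strings; the names `VIEWSTATE_SIGNATURES`, `classify` and `tally` are hypothetical, and the fragments are simply copied from the list above.

```python
from collections import Counter

# Message fragments copied from the list above (matching is case-insensitive).
# More specific fragments come first so that the generic "Invalid ViewState"
# only catches what nothing else did.
VIEWSTATE_SIGNATURES = [
    "Invalid ViewState: Missing field",
    "Invalid character in a Base-64 string",
    "Invalid length for a Base-64 char array",
    "The serialized data is invalid",
    "The state information is invalid for this page",
    "Invalid ViewState",
]


def classify(message):
    """Return the first matching signature, or None if the message is unrelated."""
    lowered = message.lower()
    for signature in VIEWSTATE_SIGNATURES:
        if signature.lower() in lowered:
            return signature
    return None


def tally(messages):
    """Count messages per signature, ignoring unrelated ones."""
    counts = Counter()
    for message in messages:
        signature = classify(message)
        if signature is not None:
            counts[signature] += 1
    return counts
```

Calling `tally(...)` on a day's worth of messages gives a per-signature count, which is a convenient input for the rate check discussed further down.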

In some cases, obvious network/firewall restrictions might be at play [3] – the key to this breed is that symptoms are sporadic.

Now, before experimenting with ViewState chunking, enabling compression, questioning downstream devices, slaving away to reduce ViewState size or tinkering with load balancers, consider disabling TOE as a starting point to see if symptoms abate. (You will always have a small trickle of these exceptions on high-volume sites that feature large ViewStates, but it shouldn’t be considerably higher than an average of 1 in 13,000 page views.)
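
As a rough way to compare your own numbers against that trickle, here is a small Python sketch. The 1 in 13,000 figure is the one quoted above; the `factor` threshold standing in for "considerably higher" is purely an assumption, as are the function names, so adjust them to your own site.

```python
# Baseline taken from the figure quoted above: roughly one ViewState
# exception per 13,000 page views on a high-volume site.
BASELINE_RATE = 1 / 13_000


def exception_rate(exceptions, page_views):
    """ViewState exceptions per page view over some period."""
    if page_views <= 0:
        raise ValueError("page_views must be positive")
    return exceptions / page_views


def exceeds_baseline(exceptions, page_views, factor=2.0):
    """True if the observed rate is more than `factor` times the baseline.

    `factor` is an assumed cut-off for "considerably higher"; the post
    does not define one.
    """
    return exception_rate(exceptions, page_views) > factor * BASELINE_RATE
```

For example, at 260,000 page views the baseline works out to about 20 exceptions; `exceeds_baseline(25, 260_000)` is false, while `exceeds_baseline(120, 260_000)` is true.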

Other symptoms – if you:

  • Are experiencing sporadic ViewState corruption that can’t be reproduced consistently, and;
  • Are not seeing Application Pool restarts that may be to blame, and;
  • Can confirm that the issue is distributed randomly across site visitors (you do have this data, right? Site scraping is generally a large source of ViewState exceptions);

…then you may want to consider the remedy above.
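
For reference, on Windows Server 2003 SP2 with the Scalable Networking Pack, TCP Chimney is commonly disabled with `netsh int ip set chimney DISABLED`. The Python sketch below only wraps that command; it assumes an elevated prompt on the affected server, the helper names are hypothetical, and later Windows versions use a different netsh syntax, so check the Microsoft articles listed under [2] before running it for real.

```python
import subprocess


def chimney_disable_command():
    """Argument list for disabling TCP Chimney on Windows Server 2003 SP2.

    Assumption: the server has the Scalable Networking Pack netsh context.
    """
    return ["netsh", "int", "ip", "set", "chimney", "DISABLED"]


def disable_chimney(dry_run=True):
    """Return the command as text (dry run) or execute it and return its output."""
    command = chimney_disable_command()
    if dry_run:
        # Nothing is executed; useful for reviewing the change first.
        return " ".join(command)
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

The default is a dry run on purpose: `disable_chimney()` just returns the command string, and only `disable_chimney(dry_run=False)` touches the machine. Offload settings exposed by the NIC driver itself are a separate matter and are not covered by this sketch.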

In the most recent case we encountered, Broadcom Gigabit Ethernet drivers were at play as well [4] (they’re notorious for this). The problem may well be tied to the number of connections the NIC is servicing, but we never saw a clear correlation between traffic volume and the number of exceptions.

Not a ViewState Issue

ViewState corruption is simply a common symptom of partial-offload failures in ASP.NET. In practice, reports of TOE failures interfering with SQL Server and Exchange are rampant as well [5]. This is a web-server network interface issue: offloading ViewState to a device like StrangeLoop or ScaleOut State Server wouldn’t alleviate it; keeping ViewState in process, though chock-full of its own ramifications, would cause symptoms to abate.

How does it impact you?

I’m very curious to hear from others on this issue. If you found this post useful or want to help, please drop a line in the comments and tell us about your experience.

References:

[1] – Disabling the TCP Chimney During IIS 6.0 Troubleshooting:
http://blogs.msdn.com/chaun/archive/2008/02/20/disabling-the-tcpchimney-during-iis-6-0-troubleshooting.aspx

[2] – Additional Readings

An update to turn off default SNP features is available:
http://support.microsoft.com/default.aspx/kb/948496

The Microsoft Windows Server 2003 Scalable Networking Pack release:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;912222

Some problems occur after installing Windows Server 2003 SP2:
http://support.microsoft.com/?id=945977

Error message when an application connects to SQL Server on a server that is running Windows Server 2003: "General Network error," "Communication link failure," or "A transport-level error":
http://support.microsoft.com/?id=942861

Scalable Networking: Network Protocol Offload - Introducing TCP Chimney:
http://www.microsoft.com/whdc/device/network/TCP_Chimney.mspx

Having Network Problems on Win2003 SP2?
http://blogs.msdn.com/jamesche/

Windows 2003 Scalable Networking pack and its possible effects on Exchange (part 1):
http://msexchangeteam.com/archive/2007/07/18/446400.aspx

Windows 2003 Scalable Networking pack and its possible effects on Exchange - Part 2:
http://msexchangeteam.com/archive/2008/03/12/448421.aspx

Broadcom Demonstrates Industry's First Fully Integrated TCP/IP Offload Engine (TOE) that Supports Microsoft's TCP Chimney Architecture:
http://www.broadcom.com/press/release.php?id=522952

Broadcom to Deliver TCP/IP Offload Technology:
http://www.broadcom.com/press/release.php?id=409783

SP2 Scalable Networking Pack & Connectivity Issues:
http://blogs.technet.com/networking/archive/2007/11/04/sp2-scalable-networking-pack-connectivity-issues.aspx

[3] – ViewState Chunking
http://blogs.msdn.com/carloc/archive/2008/12/23/viewstate-validation-troubles.aspx

[4] – Dell PowerEdge & Broadcom Issues
http://mbccs.blogspot.com/2008/01/dell-poweredge-broadcom-issues.html

[5] – ServerFault:
http://serverfault.com/questions/9923/why-does-the-tcp-chimney-offload-feature-on-some-ethernet-cards-fail-to-pass-some
