The older I get, the more lessons I seem to learn (or not learn) over and over. Have you ever seen TCP offload work correctly? Of course not! I’ve been bitten by a TCP offload (aka TCP Offload Engine, or TOE) problem in just about every environment I’ve touched in the last 20 years, and sadly this week was no exception.
To make a long story short, we have a production VMware ESXi 4.1 host with both Linux (CentOS) and Windows Server 2008 guests. No problems were reported (or measured) with the Linux guests, but the Win 2008 guests suffered from extremely choppy network connections, including dropped connections, for common services like Remote Desktop and backups. As you probably know, I’m big into actually investigating the underlying cause of a problem rather than randomly throwing darts at it, so I grabbed some packet traces with Wireshark. Check this out:
Ouch! That is super ugly (this is across a LAN, btw)! How can you screw up a single TCP connection so badly in 6 feet of cable? Probably not the cable (or network), Sherlock. It appears this is a “known problem.” While this problem (described in the VMware article as “Network performance is very slow and connections drop intermittently”) seems constrained in the article to VMware guests running on a Windows host, I can attest to it occurring on both ESXi 4.0 Update 1 and ESXi 4.1 hosts with Windows guests. After I followed the instructions in this VMware article and disabled TCP offload on the Win 2008 guests, they exhibited downright snappy network performance. Check out the improved trace results:
Moral of the story: TCP offload always sucks. Turn it off on Windows Server 2008 VMware guests.
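For anyone who wants to script the "turn it off" part, here is a rough sketch in Python. It is not the VMware article's procedure (those steps aren't reproduced in this post, and one commenter below describes a registry fix instead). It assumes the global `netsh` switches for TCP Chimney and task offload that ship with Windows Server 2008, and the function name `disable_tcp_offload` and its `dry_run` flag are made up for illustration. By default it only prints the commands so you can eyeball them first.

```python
import subprocess

# Assumption: these global netsh switches are one way to disable offload on a
# Windows Server 2008 guest. The VMware article's exact steps may differ.
OFFLOAD_COMMANDS = [
    ["netsh", "int", "tcp", "set", "global", "chimney=disabled"],
    ["netsh", "int", "ip", "set", "global", "taskoffload=disabled"],
]


def disable_tcp_offload(dry_run=True):
    """Print the netsh commands, or run them when dry_run is False.

    Returns the command lines as strings so the caller can log them.
    """
    for cmd in OFFLOAD_COMMANDS:
        if dry_run:
            # Only show what would be executed; nothing on the host changes.
            print(" ".join(cmd))
        else:
            # Needs an elevated prompt on the guest; raises if netsh fails.
            subprocess.run(cmd, check=True)
    return [" ".join(cmd) for cmd in OFFLOAD_COMMANDS]
```

Calling `disable_tcp_offload()` just lists the two commands. Running `disable_tcp_offload(dry_run=False)` from an elevated prompt on the guest applies them, and `netsh int tcp show global` should then let you confirm the Chimney state. Per-adapter offload settings in the NIC driver's advanced properties are a separate knob this sketch doesn't touch.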



4 Responses
July 29th, 2010 at 12:04 am
Is this TOE based on software or hardware?
Have you worked with hardware TOEs such as broadcom or lewiz chips?
July 29th, 2010 at 5:24 am
This particular host was using hardware-based TOE from Broadcom.
July 29th, 2010 at 1:30 pm
I thought so. It’s a shame that TOE gets a bad rap due to poor implementations.
I’ve heard of lots of problems with Broadcom-based TOE.
They looked into using Lewiz chips, since Lewiz has a much better implementation, but they opted to try to make their own TOE chips and it’s failed miserably.
If you can get a system using Lewiz chips, you’ll see how TOE actually is supposed to work.
August 24th, 2010 at 7:45 pm
Thank you!! I just started using ESXi, had a Windows 2008 VM freshly installed and had nothing but trouble trying to connect to another physical Windows 2003 server. Trying to open network shares or attempting to join the domain would always result in “The specified network name is no longer available.” Applying this registry fix and rebooting solved it. Thanks!