Wednesday 30 September 2009

Error E000FE30 every day on one server...

...for months. For months I've struggled with a problem on ONE server, which happens to be at a remote site on a different subnet, connected via a WAN VPN link.

Every day, one or more jobs would fail with Backup Exec errors, mainly E000FE30 - with the useful and generic messages about a "communications failure has occurred" and sometimes the "connection lost to the remote agent".

Needless to say, I've spent some time working on this, and tried all sorts. I reconfigured the system to use a different WAN link to rule out a fault with the WAN. Nothing. I checked that the issue wasn't with the server itself and reinstalled the agents. Still nothing.

I've updated network drivers, checked all sorts of patches etc - but nothing. Still this error - consistently failing jobs.

I even got a colleague to look at it with a fresh pair of eyes, and he too tried all sorts. Given the error, we suspected "something" to do with comms, but never found any issue, and in hundreds of tests we could never replicate the problem - transferring large files to and from the server worked fine etc.

Today I found the answer: the "Large TCP Offload" feature on the network card. I've seen plenty of issues with this feature before, but it normally shows up as terrible throughput on the system in general and so on - and this machine is solid as a rock for everything else.

Still, with the setting turned off, I got the first complete, full backups in a few weeks... voila!

Top tip for anyone else facing this problem - don't just check the network drivers, but try turning off these offload features too, even if you cannot see the issue at any other time on the machine.
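If you want to check quickly which offload features are switched on, something like the sketch below may help. It is only an illustration and rests on an assumption that isn't part of my setup above: a Linux box where `ethtool -k <interface>` prints one `feature: on/off` line per feature. On Windows the equivalent setting lives in the network adapter's advanced driver properties instead. The function name `enabled_offloads` and the sample text are made up for the example.

```python
def enabled_offloads(ethtool_output: str) -> list[str]:
    """Return the names of offload features reported as 'on'.

    Expects text in the style printed by `ethtool -k <interface>` (assumed),
    i.e. lines such as 'tcp-segmentation-offload: on'.
    """
    enabled = []
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a 'name: value' line
        state = value.split()  # e.g. ['on'] or ['off', '[fixed]']
        if state and state[0] == "on" and "offload" in name:
            enabled.append(name)
    return enabled


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Features for eth0:
rx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
"""

if __name__ == "__main__":
    for feature in enabled_offloads(SAMPLE):
        print(feature)
```

On such a Linux host, a feature listed here can then be switched off with `ethtool -K`, for example `ethtool -K eth0 tso off` for TCP segmentation offload (the interface name `eth0` is just a placeholder).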

Is this a Backup Exec issue? I'm not sure, but I'm happy to blame it since everything else works just fine.