Monday 14 January 2008

Reliability? Can we say we're onto Stage 2?

I'm rather scared, and pleased, to say that we appear to have genuinely made it work. In my last few posts I still suspected that Backup Exec was just being nice to us, but it now genuinely appears to be acting like a competent backup product.

One server has now been up for 27 days and is still running backups quite happily, with 2,000-odd jobs run in that time (66 a day or so), and the CASO server is just getting on with its jobs and delegation. No more tears.

Our CPS system is still working - that's the most flawless part of Backup Exec I've seen. It's absolutely awesome - it says continuous protection, and it offers just that. We installed it, got over one hurdle of making it listen on a specific IP, and that was that. Our main file servers have just-below-real-time backups 24/7.

Our next challenge (Stage 2, which took 18 months to get to!) is to build a comprehensive reporting and restore testing infrastructure around the software. We want to know everything possible about what it does, so we can provide internal quality reports, ensure we meet SLAs and, last but not least, assure customers of regular, reliable backups.

Restore testing is often overlooked, but certainly not here! Running regular (hopefully scheduled) restore tests is an essential, core part of our plan, so we can be sure those backups actually work. As someone else we know found out at great cost, backing up isn't enough!
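
To give a rough idea of what a restore test could check, here is a minimal sketch in Python. It is not part of Backup Exec and doesn't call any Backup Exec API: it assumes a restore job has already written files to a scratch directory, and it simply compares SHA-256 checksums of the original tree against the restored one. The function names (tree_digests, compare_restore) are made up for illustration.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            # Reads the whole file into memory; fine for a sketch,
            # large files would want chunked hashing instead.
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(source_dir, restored_dir):
    """Report files that are missing from, or differ in, the restored copy."""
    src = tree_digests(source_dir)
    dst = tree_digests(restored_dir)
    missing = sorted(set(src) - set(dst))
    changed = sorted(n for n in src.keys() & dst.keys() if src[n] != dst[n])
    return {"missing": missing, "changed": changed,
            "ok": not missing and not changed}
```

Bear in mind that on a live file server, anything modified since the backup ran will show up as "changed", so a check like this is most meaningful against a snapshot or a set of files known to be static.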