Friday, June 13, 2003

Avoid 5.0.10

In case some of you were starting to wonder if I'd ever get back around to talking about Domino, I present this little tale. I said quite a few words about Domino yesterday - none I'd want my kids, or you, gentle reader, to hear.

Most of yesterday was spent trying to keep our mail servers from spontaneously combusting. Three weeks ago we took them to 5.0.10 (we run both QuickPlace 3.0 and Sametime 3.0 and Lotus tell us they both need the NAB to be a 5.0.10 design - so no 5.0.12 until these two are sorted). We have a few other servers at 5.0.10 and were starting to get them all to the same version. Everything was fine after the upgrade.

Midmorning yesterday both the primary mail servers in the cluster crashed the nserver.exe task within 30 minutes of each other. The third (failover) server in the cluster went on happily and we went looking for a dubious message as a probable cause. Happily we had our Lotus Support Manager here for the day - so logged a call and sent off the RIPs straight away. A bit later on, they both crashed again within half an hour of each other, the third still unaffected.

There was lots of running around trying to figure out what was happening - dodgy message? corrupt NAB? The KBase was scoured, RIP files were examined and it seemed to be pointing to a MIME conversion bug. Why had it taken 3 weeks to appear? - we average 35,000 messages per day through these boxes.

By this time Lotus Support (after we had first convinced them that our support agreement was still valid - we do this every 2 months or so!) had interpreted the RIPs, found the problem and given us the relevant SPR. There was a hotfix, which was also incorporated into 5.0.11 and above. Joy!

Lotus support provided excellent, responsive service to us yesterday - thanks!

Then the third server ripped - all by itself.

We then had a long and circuitous discussion with many management folk about whether the hotfix was the way to go, or whether we should just jump to 5.0.11/12 (along with fiddling ACLs to stop the design getting to Sametime and QuickPlace). Threats were made and termination of employment was canvassed if we made too courageous a decision. In the end, we applied the hotfix. It was only a DLL that had to be replaced - easy to back out of if it all turned the shape of a pear.

All three servers have been up since the hotfix - 18 hours so far.

Our Migration Plan for 6 is well underway - can't wait for that one!

A Sign from God - this morning's mail delivered QuickPlace 3.0.1 and Domino 5.0.12 from Lotus. It must be time for 5.0.12.
