3 steps you can take to avoid a server meltdown
Thursday, August 16, 2012 • 7:01am
Expecting the unexpected
Last Wednesday evening we received an alert from one of our managed care customer sites that their small server room - more of a closet, really - was heating up. The alert showed temps of 75 degrees and steadily climbing. After calling the customer we agreed that the best thing to do would be to remotely shut down the servers and meet them there the next morning to see what was going on.
On arrival the temperature in their server room had fallen, due to the three servers being shut down, but it was still pretty warm. Just a few minutes later we saw the AC repairman show up. Well, at least the mystery was solved.
The roof unit had failed.
Until the AC unit was repaired, turning on the servers wasn’t a very good idea. Temps would have quickly risen again, putting them at risk of failure. The same was true of the networking and phone equipment in that room.
We had three options: wait for the repairs to be completed, go out and buy a portable floor AC or, because they had a Backup and Disaster Recovery (BDR) appliance in place, spin up the backups as virtual machines in the cloud. Now I know you may be thinking that we went with the BDR, right? Why else would I even write an article unless our BDR solution came to the rescue? But you’d be wrong.
After discussing the options with the customer we actually went out and bought a portable floor AC unit from Home Depot. The thinking was that we could set the temp on the unit slightly higher than the normal temp in the office so that it would only kick on if the primary AC ever failed again. Redundancy… my favorite word.
After a trip to Home Depot and some finagling to get the vent to fit properly into the server room window we were up and running. Servers on, we closed the door just in time for the main AC to come back on.
Such is life.
But the important thing is that no damage was done to the servers, there was minimal down time and, in the event of another AC failure, the customer now has backup.
Is it overkill?
This customer didn’t think so. If we hadn’t been monitoring the room temperature, who knows what they might have walked in on the next morning? Worst case would have been a fire or the remnants of one.
While their small six-person office was prepared for disaster, avoiding one was far better than recovering from one. And even though they are a pretty small company, IT is the grease that keeps all their parts moving. Without it they are dead in the water.
For less than an additional $10 per month (we monitor all their systems), adding the room monitor was, for them, a no-brainer.
What you can do to avoid a meltdown
1 - Monitor server room temperature
As illustrated by this article, monitoring the temps in the server room can help prevent serious server damage by alerting you about increasing temps.
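To give a feel for what a room monitor is doing behind the scenes, here is a minimal sketch in Python. It is not the product we use; the function name, the 75 degree threshold and the "three rising readings in a row" rule are all assumptions made for illustration. It takes a list of recent temperature readings in Fahrenheit and decides whether to raise an alert.

```python
from typing import List, Optional

# Assumed threshold for illustration; pick one that suits your own room.
ALERT_THRESHOLD_F = 75.0


def evaluate_readings(readings_f: List[float],
                      threshold_f: float = ALERT_THRESHOLD_F) -> Optional[str]:
    """Return an alert message for the most recent reading, or None if all is well."""
    if not readings_f:
        return None

    latest = readings_f[-1]
    if latest < threshold_f:
        return None

    # "Climbing" here means each of the last three readings is higher
    # than the one before it (a hypothetical rule, not a standard).
    recent = readings_f[-3:]
    climbing = len(recent) == 3 and recent[0] < recent[1] < recent[2]
    trend = "and climbing" if climbing else "but not steadily climbing"
    return (f"ALERT: server room at {latest:.1f} F "
            f"(threshold {threshold_f:.1f} F) {trend}")


if __name__ == "__main__":
    # Sample readings taken a few minutes apart (made-up numbers).
    print(evaluate_readings([72.0, 74.0, 76.0]))
```

In practice the readings would come from a sensor in the room and the message would go out by email or text, but the decision itself can be this simple.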
2 - Monitor the server
Most servers today are capable of providing information about the status of many components, including fans and CPUs. Monitoring this information in real time can prevent a lot of small problems from becoming big ones. Fan speed, for instance, has an upper and lower RPM threshold. Exceeding either limit is an indication of a problem. Knowing about it early gives you the opportunity to address the issue before any damage occurs.
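The fan speed check described above can be sketched in a few lines of Python. Treat this as an illustration only: the fan names, the RPM limits and the function name are hypothetical, and real limits come from your server vendor. How you read the fan speeds depends on your hardware, so the sketch simply takes them as a dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical limits for illustration; real values come from the server vendor.
FAN_LIMITS_RPM: Tuple[int, int] = (1500, 12000)


def find_fan_problems(fan_rpm: Dict[str, int],
                      limits: Tuple[int, int] = FAN_LIMITS_RPM) -> List[str]:
    """Return one message for every fan running outside the allowed RPM range."""
    low, high = limits
    problems = []
    for name, rpm in sorted(fan_rpm.items()):
        if rpm < low:
            # A slow or stopped fan usually means a failing fan or a blockage.
            problems.append(f"{name}: {rpm} RPM is below the lower limit of {low}")
        elif rpm > high:
            # A fan pinned at high speed often means the server is running hot.
            problems.append(f"{name}: {rpm} RPM is above the upper limit of {high}")
    return problems


if __name__ == "__main__":
    # Made-up readings: one healthy fan, one stalled, one racing.
    readings = {"fan1": 4200, "fan2": 0, "fan3": 13500}
    for problem in find_fan_problems(readings):
        print(problem)
```

Checking both limits matters: a fan that has stopped and a fan that is running flat out are both telling you something is wrong.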
3 - Scheduled cleaning
It's also not a bad idea to crack the server case open now and then and give the interior a good cleaning. Dust buildup can act like a blanket, causing components to overheat. Depending on your environment, you may want to schedule this once a year or once a quarter.
About the author
During work hours David is the President of Plenary Technology, an IT Services company in New Jersey that helps small businesses save money by reducing down time. Off hours he spends as much time as possible romping thru the woods with his dog, Maggie.