Thursday, January 26, 2006

A....P....C....DEAD!

I tell you, I've got to stop writing this damn thing. I've had more serious troubles with my server since I started writing about how to do stuff. It's almost like every time I make some progress, something else comes along to force my hand...

The other night, we came home from work to find the house dark. My house is automated with HomeSeer. I don't have a shitload of tasks set up, but the most important one (turn on the outside and living room lights just before sunset) apparently hadn't run. There's usually only one reason for that: loss of power to the server. The server's plugged into a huge surge protector* along with the TV and other electronic equipment in the living room, and a few weeks back I moved it from the floor to an inaccessible spot behind the equipment rack. Beeper, the cat, likes to sleep back there 'cause of the heater and it's pretty isolated, and the huge switch on the protector was too easily hit by the fat cat. Before the move, if I couldn't get to my mail at any point in the day, I knew it was due to her. :)

But, as we moved closer to the house, we could hear the alarm beeping. Uh-oh. I pulled out my handy Husky pocket flashlight, and took a tour around the house, peeking in the windows and such. The house was secure, and I could see the clock flashing on the oven. Power outage. Fucking RG&E. Well, at least it wasn't anything serious.

Now, as the three people who read this blog know, I've got my drives set up in RAID arrays. But, that don't help much when your drives have become corrupted, or you corrupt the array yourself. I'll spare you the details because, frankly, I'm not 100% sure what I did, or why I had to do it. Suffice it to say, two hours later, I'd pretty much had enough with computers for life!

The next night, I ran to CompUOverpay and grabbed a 350VA APC Back-UPS ES. I made sure it was supported under Linux before buying, of course. :) It's not a bad little UPS for $40. Considering you're slightly better protected from power surges, I'd recommend it as a good investment.

Anywho, fortunately setting it up is easy as pie. The first thing you need to do is install apcupsd. This is a reasonably simple install on pretty much any distro. On Fedora, it's as easy as "yum install apcupsd". Typically, you'd take the time to verify your UPS was being recognized by hotplug before bothering to set up the daemon, but I figured "fuck it". So far, Fedora's been pretty good at that stuff, so let's barrel on!
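
If you'd rather not skip the "is the UPS even seen?" check, something like the following might do. It's only a sketch: has_apc_ups is a made-up helper name, and it assumes your distro ships lsusb (usbutils). The one hard fact in it is that 051d is the USB vendor ID assigned to American Power Conversion.

```shell
#!/bin/sh
# Sketch: decide whether an APC device shows up in lsusb-style output.
# has_apc_ups is a hypothetical helper name, not part of apcupsd.
# 051d is the USB vendor ID registered to American Power Conversion.
has_apc_ups() {
    # Reads a device listing on stdin; succeeds if any line has "ID 051d:".
    grep -qi 'ID 051d:'
}

# Typical use on a live box (assumes usbutils is installed):
#   lsusb | has_apc_ups && echo "UPS detected"
```

If that comes back empty, there's not much point fiddling with the daemon until the cable or the USB port is sorted out.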

Even better than expected, the rpm containing apcupsd was already pre-configured for a USB UPS (prolly 'cause that's the most common kind now. Ya think?). So, for shits and giggles I typed "apcaccess" and was rewarded with tons of useful info!

APC : 001,034,0884
DATE : Thu Jan 26 16:10:35 EST 2006
HOSTNAME : someplace.oranother.com
RELEASE : 3.12.1
VERSION : 3.12.1 (06 January 2006) redhat
UPSNAME : someplace.oranother.com
CABLE : USB Cable
MODEL : Back-UPS ES 350
UPSMODE : Stand Alone
STARTTIME: Wed Jan 25 20:44:09 EST 2006
STATUS : ONLINE
LINEV : 120.0 Volts
LOADPCT : 68.0 Percent Load Capacity
BCHARGE : 100.0 Percent
TIMELEFT : 3.9 Minutes
MBATTCHG : 5 Percent
MINTIMEL : 3 Minutes
MAXTIME : 0 Seconds
LOTRANS : 088.0 Volts
HITRANS : 138.0 Volts
ALARMDEL : Always
BATTV : 13.5 Volts
LASTXFER : No transfers since turnon
NUMXFERS : 0
TONBATT : 0 seconds
CUMONBATT: 0 seconds
XOFFBATT : N/A
STATFLAG : 0x07000008 Status Flag
MANDATE : 2005-02-16
SERIALNO : XXXXXXXXXX
BATTDATE : 2000-00-00
NOMBATTV : 12.0
FIRMWARE : 00.e5.D USB FW:e5
APCMODEL : Back-UPS ES 350
END APC : Thu Jan 26 16:11:29 EST 2006

Yaay! (I took this at 4PM the next day, so that's why the battery's so well charged). I see I don't get a whole lot of time before I die, though. The drawbacks of using a dual-proc server. But, 4 minutes is more than enough time to gracefully shut down the server and hopefully protect my data and such.
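
If you want to pull a single value out of that wall of text (say, for a script or a cron job), a small filter along these lines could work. This is a sketch, not anything apcupsd ships: apc_field is a hypothetical helper name, and it assumes the output keeps the "NAME : value" layout shown above.

```shell
#!/bin/sh
# apc_field is a hypothetical helper: print the value of one field from
# apcaccess-style "NAME : value" output read on stdin.
apc_field() {
    awk -v key="$1" '
        {
            # Split on the first colon only, so values that contain
            # colons themselves (dates, times) survive intact.
            i = index($0, ":")
            if (i == 0) next
            name = substr($0, 1, i - 1)
            value = substr($0, i + 1)
            # Trim surrounding whitespace from both halves.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }'
}

# Possible use on the live box:
#   apcaccess | apc_field TIMELEFT
```

Fed the output above, asking for TIMELEFT would give back "3.9 Minutes" and STATUS would give "ONLINE".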

The first thing I need to address is the fact that I've got a W2K3/Exchange 2003 virtual machine running. That needs to be shut down gracefully first to minimize damage to the database. I have a copy of GSX server, but unfortunately, the newest version of GSX doesn't support machines built with the newest version of Workstation. I've tried a couple of times to wedge it in there, but finally decided to wait for a new GSX. (I know, there are plenty of ways to do it, and I've tried a few with no success for various reasons. Don't bother, it's not that important at the moment). Well, here's the problem: once VMware came out with their "server" products, they removed the ability to shut down machines gracefully at shutdown (you used to be able to put a line in the VMX file telling it to hibernate the machine on SIGHUP). Since it's a GUI app, I can't just script it, so I'd need a tool to do so, and I looked at a couple. None really did easily what I needed it to do (essentially: bring focus to that window, hit ctrl-Z).

Then, I remembered an easier solution: telnet. W2K3 includes a telnet server, and while I have it disabled by default, that's easy enough to change! So, I enabled and started the service and ran this on the Linux host:

autoexpect -f serversdn.exp telnet hostname

Expect is a nifty little scripting language with a specific purpose: automate other console apps. It's perfect for scripting a telnet session because you can tell it "wait for 'ogin:' and then send the username". Autoexpect simplifies this further. You tell it the name of the file to save your tasks to, and then the command you want it to run. When you're done, you have an expect script that needs no more than a tiny bit o' tweaking to get you up and running.

So, I scripted it to telnet into the server, shut down the Exchange services** and then shut down the machine:


set force_conservative 0  ;# set to 1 to force conservative mode
if {$force_conservative} {
	set send_slow {1 .1}
	proc send {ignore arg} {
		sleep .1
		exp_send -s -- $arg
	}
}

set timeout -1
spawn telnet server
match_max 100000
expect "login: "
send -- "administrator\r"
expect "password: "
send -- "easypass\r"

expect "Administrator>"
send "net stop MSExchangeIS /y\r"

expect "Administrator>"
send -- "net stop MSExchangeMTA /y \r"

expect "Administrator>"
send -- "net stop MSExchangeSA /y \r"

expect "Administrator>"
send -- "net stop WinHttpAutoProxySvc /y\r"

expect "Administrator>"
send -- "net stop HomeSeerService /y\r"

expect "Administrator>"
send -- "tsshutdn 0 /powerdown /delay:0\r"

interact


Does it work? Oh, hell yeah it works! I had to do a little tweaking of the server first, though. On the first few passes, it took two minutes and forty-five seconds to shut down. Since I've got just under four minutes of battery power, that might not leave enough time to shut the box down. Fortunately, I've got a little experience with Winders, too...

Open regedit, and change the following:


"HKCU\Control Panel\Desktop\AutoEndTasks" change from "0" to "1"

"HKCU\Control Panel\Desktop\WaitToKillAppTimeout" This one defaults to 20000 milliseconds, I believe. Change it to 2000.

"HKCU\Control Panel\Desktop\HungAppTimeout" Same as above.

Duplicate the above two entries for HKEY_USERS\.DEFAULT so it'll apply to new users as well.

Finally, change "HKLM\System\CurrentControlSet\Control\WaitToKillServiceTimeout" to 2000 as well.


The difference? The Exchange VM now shuts down in one minute and ten seconds. That's a whole lot better, huh?
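
To put numbers on it, here's the back-of-the-envelope math as a quick sketch. It takes the 3.9 minutes reported as TIMELEFT as the best case; margin_seconds is just a made-up helper name, and the real margin will be smaller if apcupsd waits a while before kicking off the shutdown.

```shell
#!/bin/sh
# Figures from the post: 3.9 minutes of runtime (TIMELEFT), and a VM
# shutdown that took 2m45s before the registry tweaks and 1m10s after.
# margin_seconds is a hypothetical helper name.
margin_seconds() {
    # $1 = battery runtime in seconds, $2 = VM shutdown time in seconds
    echo $(( $1 - $2 ))
}

battery=234                      # 3.9 min * 60 s
margin_seconds "$battery" 165    # before: prints 69
margin_seconds "$battery" 70     # after: prints 164
```

So the tweaks take the time left over for the Linux host itself from about a minute to well over two and a half.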

Now, all I need to do is tell apcupsd what to do when the power goes out, and BOOM! everything shuts down easy as pie. This part's easy enough to figure out. Edit /etc/apcupsd/apccontrol and put your shutdown commands in the various case blocks.
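
For what it's worth, here's a sketch of the kind of thing that could go in one of those case blocks. It is not my actual apccontrol: run_with_limit is a hypothetical helper, the path to the expect script is an assumption, and it leans on timeout(1) from GNU coreutils, which current distros ship. The idea is simply that a hung telnet session shouldn't be allowed to eat the whole battery.

```shell
#!/bin/sh
# run_with_limit is a hypothetical helper: run a command, but give up
# after a fixed number of seconds.
run_with_limit() {
    limit="$1"; shift
    # timeout(1) from GNU coreutils exits with status 124 when the
    # limit is reached and it had to kill the command.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Possible use inside a case block (the script path is an assumption):
#   run_with_limit 120 expect -f /root/serversdn.exp
#   shutdown -h now
```

Whether the VM step finishes or times out, the host shutdown still runs afterwards, which is the part that protects the RAID arrays.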

I did a test run by pulling the cord on the UPS. Within a couple of seconds, I watched the VM shut down and turn itself off. The Linux box then followed soon after without too much issue. I had to tweak the timings as the VM didn't entirely shut down fast enough, but I think I've got it all set now.

Oh, one final step: go into your BIOS and look for a setting called "Restore on AC/Power Loss". Change it to "Full On" or "Power On". ATX-based machines don't automatically power back on, but changing this setting will make it happen. That way, if the power's only out for a short time, your machine'll be back up and running when you come back!


* I don't put a lot of stock in surge protectors. Even the best triacs used to clamp the circuit are generally not fast enough to stop a lightning bolt from killing Stevie and his siblings. However, I WILL generally spend the extra $10-20 and get a good one 'cause they usually come with guarantees that cover zapped equipment. :)

** This is a single machine acting as domain controller and Exchange server. In that combo, it's best to shut down your Exchange services before you shut down the machine. If you take the machine down without doing that, it'll enter a race condition where it tries to shut the services down, but it can't query the domain controller properly because that's going down...the short of it is, in this condition, it can take 30-40 minutes for the box to shut itself down. I don't got that kind of time. Oh, and to prevent accidentally doing it when I'm in the machine, I've removed the Shutdown command from the start menu via a policy and replaced it with a batch file that does it right. Where possible, always put a cover over the power switch. ;-)
