Friday, December 30, 2005

How to get yourself canned

No, I didn't get canned again, but the info in this article can get your ass in serious hot water if you use it and get caught. You've been warned! Before attempting to use any of these tools, you should also invest in an SSL certificate for your webserver. Can't afford one? Have you not heard of CAcert? You can get a free cert for your machine there. They also provide their root certificate in an installable form for your browser, so you won't get those annoying "this site is encrypted, but we don't trust it" messages.

My POB has blocked pretty much all Internet access. They've closed down ALL ports except 80 & 443, and those go through a highly restrictive proxy which, oddly enough, blocks www.espn.com, but allows access to Yahoo and Google mail (where you could bring in a virus or trojan). Oh, well, I don't set the policy, I just need to circumvent it.

The first tool in my arsenal is Anyterm. Anyterm is an Apache module which presents a terminal window in a webpage. This isn't one of those PHP-based "type a command, see the output" things; you get a full, true terminal in which you can do pretty much anything. Edit a file with vi, play games, whatever, it's possible. Phil Endecott, the author, has done a great job on this. It only came out a short while ago, but it quickly reached a highly stable state, and it just continues to improve. But don't take my word for it: hit the site, check out the demo. Just make sure you put a pillow on your desk edge so when your jaw hits it, you don't get hurt. :)

The install of the above is well-documented; it's here mostly to let you know it's there. Anyterm has been a great assist in getting this new machine set up, as I can now do it at work when I have more time. :) However, it is limited to a terminal session, and you can't cut and paste into it either. For more advanced things, it would be nice if I could do remote X or even VNC.

Well, you can! With GNU httptunnel, you can create TCP-over-HTTP tunnels that allow you to use any port you want by tunneling it through your home web server. httptunnel consists of two parts: a server which you need to install on an unrestricted machine on the Internet and a client which goes on the machine behind the restrictive firewall.

Installation is as simple as doing "configure && make && make install". To start the server, I use this command:

hts --forward-port localhost:22 80

This tells it to forward incoming httptunnel connections on port 80 to port 22 (ssh, for the real newbs). Now, on the client side, start the tunnel:

htc --forward-port 900 --proxy proxy.mycompany.com:80 --proxyauthorization myname:mypass www.myhomemachine.com:80

For simplicity and security, I put the above line in an htstart script and replaced "mypass" with "$1". That way, I can start it without needing to put my password in a text file. For some reason, GNU decided no output was useful output. So, if you run the above command and nothing appears to have happened, it probably did. To test it, just fire up your favorite ssh client and make a connection to localhost on port 900. If all went well, you should get a login prompt from your home machine. Yaay!
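For the curious, the wrapper amounts to something like this (a sketch only; the hostnames, port, and "myname" are the placeholder values from above, and the HTC variable exists purely so the command can be dry-run with HTC=echo):

```shell
# htstart, as a shell function: pass the proxy password as the
# argument so it never sits in a file on disk.
htstart() {
    ${HTC:-htc} --forward-port 900 \
        --proxy proxy.mycompany.com:80 \
        --proxyauthorization "myname:$1" \
        www.myhomemachine.com:80
}
```

Paste the body into a script or source the function; either way, the password only ever lives on the command line.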

Now, you can tunnel any app you want through httptunnel, but I recommend keeping it limited to ssh. First, httptunnel doesn't support SSL, so you need to provide some kind of encryption. Second, httptunnel only supports one port per instance, while ssh will let you forward as many as you like. This way, the only unencrypted part of your "conversation" with the outside world is that initial connection. As long as you don't press your luck and use the tunnel too often for large amounts of data, your network team shouldn't notice and you can do what you need.
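To make that concrete, here's the shape of the ssh invocation that rides the tunnel. The forwarded ports are just examples of mine, and the SSH variable is only there so you can dry-run the line with SSH=echo:

```shell
# One ssh session through the tunnel carries everything else:
#   -p 900  connects to the local htc listener from above
#   -L 5901 pulls VNC display :1 back from the home machine
#   -D 1080 opens a dynamic SOCKS proxy for anything else
tunnel_ssh() {
    ${SSH:-ssh} -p 900 \
        -L 5901:localhost:5901 \
        -D 1080 \
        me@localhost
}
```

Point a VNC client at localhost:5901 and a browser's SOCKS setting at localhost:1080, and everything rides inside the one encrypted session.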

Good luck, be careful and stay employed!

Tuesday, December 27, 2005

RAIDing the data

Having worked in the IT field for pretty much all of my adult life, and some of my non-adult life, there's one lesson I've learned the hard way: hard drives die. If you store your important data on a hard drive, chances are you're going to lose that data somehow, someday. Backups are a great way to preserve your data from these inevitable failures (they'll happen, it's just a matter of when!). The problem is, backing up and protecting data needs to be easy enough that you'll remember to do it, otherwise it just doesn't happen. I've been doing this for a very long time, and even I don't back up my data as often as I should. I've lost enough data in my life you'd think I'd learn, but...

So, one of my primary goals with this new server was to have it protect my data without needing my intervention. I'm going to do this in a multi-layered approach, and using RAID arrays is the first layer. For those not familiar, RAID stands for Redundant Array of Inexpensive Disks. Hard drives are cheap these days. It's almost impossible to find a drive that doesn't come with triple-digit gigabytes anymore. In fact, the most recent addition to the server, a 200G 7200 RPM monster, cost me only $30 after rebate. The basic idea behind RAID is to spread your data across multiple cheap disks in such a way that if one fails, you don't lose everything. You can Google for more info. The two levels of RAID I'll be using are RAID-1 and RAID-5.

RAID-1 is commonly known as "drive mirroring". I set up two drives of equal size, and every time I write data to one, it's written to the other. If one drive fails, I have a duplicate of the data on the other one. The big drawback to this setup is that writing to two drives is typically slower than writing to one. The other is that you "lose" a whole drive: if you take two 200G drives and mirror them, you only get to store 200G of data. I offset the first drawback by putting the two drives on different controllers in the system (also known as drive duplexing, since I'm protected by redundant controllers as well). Performance is then not affected as much. The second is offset by the fact that drives are cheap. For $30, I can't afford to not protect my data.

RAID-5 is also known as "striping with parity". RAID-5 requires at least three drives. In a nutshell, let's say you wanted to store the following sequence of numbers:

1 2 3 4 5 6

With a single drive, all numbers are written to the single drive, obviously. In a mirror, all six are written to each drive. In a RAID-5, the numbers are spread out across multiple drives, with one of the drives in each row storing "parity data". Parity data is essentially a mathematical formula that describes the data in such a way that if you lose one of the pieces, you can reconstruct it from the ones that remain. So, here's how the data would look on a R5 array:

Drive 1   Drive 2   Drive 3
   1         2        P3
  P7         3         4
   5        P11        6

For simplicity's sake, I used a simple algorithm to calculate the parity: I added the data written to the other two drives together. Let's say we lose Drive 1. Well, we can figure out that the data missing from row 1 is the number 1, since we know X + 2 = 3. The parity missing from row 2 is 7, because 3 + 4 = 7, and so on. In most implementations, if a drive fails, the system will stay up and running until you replace it, since it can figure out what's missing. Drawbacks: performance is similar to RAID-1 in that you now have to write to three drives. Also, you lose one drive's worth of space, but not as much as in a mirror.
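The recovery math is easy enough to play with right in the shell. This toy uses my addition scheme from above; real RAID-5 implementations use XOR, but the principle is identical:

```shell
# Row 1 of the table: two data blocks and their parity.
d1=1; d2=2
parity=$((d1 + d2))       # P3, stored on Drive 3

# Drive 1 dies. Rebuild its block from the survivors:
rebuilt=$((parity - d2))  # 3 - 2 = 1, the lost block

# The same trick with XOR, which is what real arrays use:
p=$((d1 ^ d2))
r=$((p ^ d2))             # XORing out d2 leaves d1 again
```

Either way, any single missing piece falls out of the two that remain.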

So, how am I using this? Well, in this machine I have the following drives: 1x120G (HDA) and 2x200G (HDE and HDG). The 120 is where the OS is stored, and the 200s typically store my data. I'm going to carve them up like this:

HDA (120G): the OS, two 10G partitions for the RAID-5s, and the rest for temporary MythTV recordings
HDE (200G): two 10G partitions for the RAID-5s, and the rest for the RAID-1
HDG (200G): two 10G partitions for the RAID-5s, and the rest for the RAID-1
My "I can't possibly live without this, so it needs maximum protection" data is stored on the RAID-5 array listed as "personal". My Exchange virtual machine is stored on the one listed as "exchange". I have them separated to minimize corruption issues. My "I'd like to make sure I don't lose this since it's a PITA to replace, but I CAN replace it if necessary" data is stored on the RAID-1 (MP3s, videos, etc). Here's how I did it:

The first problem is the fact that HDE was already set up with a single 200G partition, filled with about 140G of data. I didn't have enough drive space to store it elsewhere, but fortunately, the mdadm tool in Linux gives us a simple workaround.

Firstly, I needed to very carefully document what it was I wanted to do. I got ADD, so I have to make absolutely sure I've got a detailed plan of attack or I'll forget stuff. :)

Create an /etc/mdadm.conf file

echo 'DEVICE /dev/hd* /dev/sd*' > /etc/mdadm.conf

This tells mdadm that any hard drive in the machine could be considered a candidate for membership in an array, and that it should scan them at boot time.

Partition the disks

HDA already has some partitions on it, some of which I didn't need anymore. I created two 10G partitions, and one big one with the remainder. Since this is going to be a MythTV box, I'll use the free space on the drive for scheduled, temporary recordings. If I want to save something, I can re-encode it and put it into the store. On HDG, I also created two 10G partitions and one remaining bigity-big one. Remember also, when in fdisk, to set the partition type to "fd" (they're created as "83" by default). "fd" is the type for Linux RAID Autodetect.

After a quick reboot into single-user mode (the safest way to do this stuff), I created my arrays. Three simple commands:

mdadm --create /dev/md0 --level 5 --raid-devices=3 /dev/hda6 /dev/hdg1 missing
mdadm --create /dev/md1 --level 5 --raid-devices=3 /dev/hda7 /dev/hdg2 missing
mdadm --create /dev/md2 --level 1 --raid-devices=2 /dev/hdg3 missing

The "missing" directive is what allows me to keep my data intact until the last drive is ready to add to the array. It allows me to create the array without all of the partitions. Now, all we do is format the filesystems:

mkfs.jfs /dev/md0
mkfs.jfs /dev/md1
mkfs.jfs /dev/md2

I chose JFS based on recommendations on the MythTV board. I knew I wanted a journaling filesystem for the arrays, primarily due to their size, and JFS seems to be the most "stable" in this configuration.

As a final step in this section, I mounted the arrays (treat them as a regular drive, i.e. "mount /dev/md0 /mnt/personal") and copied all of the data from the HDE partition over to the new arrays.

Once all the data was copied over, I simply fdisked HDE so that its partition table was similar to HDG's. A note: Linux's RAID support is pretty flexible. You don't have to worry about getting EXACTLY the same number of blocks in each partition. When I created the personal and exchange partitions, I used "+10000M" in each of the "end block" prompts. That got them close enough in size.
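Incidentally, instead of eyeballing fdisk, sfdisk can clone a partition table outright. A sketch (be VERY sure which device is the source and which is the target; the SFDISK variable is only there so the pipeline can be dry-run with SFDISK=echo):

```shell
# Dump the source drive's partition table and write it to the target.
clone_table() {
    ${SFDISK:-sfdisk} -d "$1" | ${SFDISK:-sfdisk} "$2"
}
# For real: clone_table /dev/hdg /dev/hde
```

One caveat: this copies the table exactly, so it generally only fits when the two drives are the same size, which happens to be the case here.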

To finalize the arrays and get them to sync, simply issue the following:

mdadm --add /dev/md0 /dev/hde1
mdadm --add /dev/md1 /dev/hde2
mdadm --add /dev/md2 /dev/hde3

To check the sync status, do "cat /proc/mdstat". You'll see them synchronizing. Go do something for a half hour or so; the system will sync each array in turn. When everything's up and running, you'll see "[UUU]" or "[UU]" at the end of each status line (indicating all three or two drives are Up). It's probably not terribly dangerous to use them early, but you should wait until the arrays are fully synced before putting them to work.

Some final config file changes

Obviously, you'll need to add the arrays to your /etc/fstab. Again, treat them just like any other type of drive. Mine now looks like this:

/dev/hda3 / ext2 defaults 1 1
/dev/md0 /mnt/personal jfs defaults 0 0

And, so on. You should also do the following:

mdadm --detail --scan >> /etc/mdadm.conf

This puts the information on the arrays in the conf file. mdadm doesn't strictly need this, but it's good for you to have the info on hand in the future.

As a final step, let's make sure we know when our arrays have degraded. Edit your rc.local (or its equivalent in your distro) and include the following line for each of your arrays:

nohup mdadm --monitor --mail=root@localhost --delay=300 /dev/md0 &

Don't forget the ampersand. This line will send an e-mail to you anytime mdadm detects a degraded array. The --mail directive simply acts as a "frontend" to sendmail; mdadm doesn't actually send the mail itself. So, if sendmail isn't set up properly, you won't get the mail. Which is why I have it going to the root mailbox... ;-)
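Before trusting that setup, it's worth firing a test alert. The mdadm builds I've used have a --test flag for exactly this (check your man page to be sure); combined with --oneshot, it mails one TestMessage per array and exits. The MDADM variable is only there so the line can be dry-run with MDADM=echo:

```shell
# Prove the sendmail path works before a real drive failure does.
test_alert() {
    ${MDADM:-mdadm} --monitor --oneshot --test \
        --mail=root@localhost "$@"
}
# For real: test_alert /dev/md0 /dev/md1 /dev/md2
```

If the TestMessage lands in root's mailbox, the real alerts will too.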

I think that's about it. Sorry it was a bit long, but there was a lot of ground to cover. Drop me a comment if you found this info useful!

Wednesday, December 21, 2005

Tweaking the penguin's nipples

hdparm: nipple clip of yesteryear

The first tweak I'd recommend starting with is hdparm. It can't hurt, and might help. This article is the one I learned hdparm from a long, long time ago. It's a little dated, but the commands are still the same, so you can at least learn how it works. More than likely, it won't make any difference, as the features it's supposed to enable are typically enabled by default. My most current machine did not need hdparm.

Too many consoles spoil the soup

Next, remove some extra virtual consoles. On a Linux box, when you're at the console, you can hit Alt-F1 through Alt-F6 to switch between virtual consoles. That's kinda useless if you're booting into a GUI, so let's disable most of them:

vi /etc/inittab

You'll see a bunch of lines like:

1:2345:respawn:/sbin/mingetty tty1

Comment out the lines that begin with 3 through 6 (keep two, just in case). When you next reboot, you'll only have two gettys running and will have freed up a tiny bit of RAM. Hey, it's tiny, but you weren't using it, right?
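If you'd rather not hand-edit, sed can comment those lines in one shot. Here's the idea, demoed against a scratch copy first, since inittab lines vary slightly between distros; point the same sed command at the real /etc/inittab once you like what it does:

```shell
# Build a scratch inittab to demo against.
cat > /tmp/inittab.demo <<'EOF'
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
EOF

# Comment out any getty line whose number is 3 through 6.
sed -i 's/^\([3-6]:\)/#\1/' /tmp/inittab.demo

grep '^#' /tmp/inittab.demo   # shows the now-disabled gettys
```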

Clear out those services

Yes, I'll tell you to do it, too. But, I'll tell you which you can prolly lose:

apmd - The Advanced Power Management daemon. If you don't have a laptop, you can more than likely kill this one.

gpm - Essentially the console mouse driver. In a GUI all the time? Kill it, you won't be using it.

identd - Used for IRC to identify you. Some IRC servers require you to run one. If you don't IRC, say buh-bye!

ip6tables & iptables - Ip6tables can go away if you're not using IPv6. Iptables should only be disabled if your machine is behind a firewall. Even then, you should consider keeping it, but that's your call. This is your machine's firewall.

isdn - Useless if you don't have an ISDN line.

nfs & portmapper - if you're not connecting to NFS file shares somewhere, this can go.
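On Fedora and its cousins, chkconfig is the knob for all of these. A sketch of how I'd batch it (the service names are the ones above, so drop any you actually use; the CHKCONFIG variable is just there so the loop can be dry-run with CHKCONFIG=echo):

```shell
# Keep each named service from starting at boot. A
# "service <name> stop" (not shown) kills the running copy too.
disable_services() {
    for svc in "$@"; do
        ${CHKCONFIG:-chkconfig} "$svc" off
    done
}
# For real: disable_services apmd gpm identd isdn
```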

Sloppy with your swappi

The 2.6 kernel gave us the ability to tune how likely things are to be swapped to disk, rather than kept in main memory. The kernel does these calculations constantly, and there are no cut-and-dried guidelines for setting this. The default is 60, but you can set any value you want between 0 and 100. I use 30 and find that works fine for me on the few occasions I might need to swap (my machine has a gig of RAM, and rarely uses more than that). 20 is a good number if you've got a laptop and want to force the kernel to swap only when absolutely necessary. This is good for those slow-ass laptop hard drives.

Two ways you can set this:

echo 30 >/proc/sys/vm/swappiness

This is a temporary method and lasts until your next reboot. But you can do a lot of testing by modifying the swappiness on the fly and then determining what works best for you. You can use "free -m" to view your RAM and swap usage at any time. At that point you can...

Most distros use /etc/sysctl.conf to control things like this. Set vm.swappiness = 30 in that file and it'll follow with each reboot.
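Concretely, that's one line in the file. I'm demoing against a scratch copy here so you can see the effect; on the real system, append the same line to /etc/sysctl.conf and run "sysctl -p" to load it without waiting for a reboot:

```shell
# Scratch copy standing in for /etc/sysctl.conf.
CONF=/tmp/sysctl.conf.demo
echo 'vm.swappiness = 30' >> "$CONF"
cat "$CONF"
# On the real box:
#   echo 'vm.swappiness = 30' >> /etc/sysctl.conf
#   sysctl -p
```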

More to come.

Penguin Performance

If you're not that familiar with Linux, I'm going to break the silence and let you in on a little secret: all those claims of higher performance are bunk. They ARE true...under certain circumstances, such as when you're not running a GUI, you don't run any services, and you pretty much don't log in. Yeah, I'm exaggerating a little bit, but only a little. :)

Here's the top three responses you'll get if you ask how to tweak the performance on a Linux box so that it's close to the speed of a Windows machine:

1. Disable any unnecessary services. Thanks, that's probably the most useless suggestion ever. Here's the deal: if a person doesn't know enough to do this first, they're probably not going to know enough to know which services are unnecessary. Hell, I've been working with Linux for 12 years, and when I use a mainstream distro, I sometimes come across things whose purpose is unknown to me.

2. Switch to a "lighter" GUI (or, my favorite: don't use X at all. Neandertals). For those coming from the Windows world, let me explain how GUIs work on the Unix side of the world:

At the most basic level, you have the underlying OS which is made up of the kernel and some supporting utilities. You're going to run into morons who tell you "Linux isn't an OS, it's a kernel". Well, a kernel's pretty useless on its own, so we can glob those necessary things together and make us up an OS. Linux is now known to be the whole OS, not just the kernel. It's called the evolution of language. Evolve or die.

Sorry, I like to rant. Okay, on top of the base OS, you have the X Window System. The version formerly used by most Linux distros was XFree86. In recent years, most distros have switched over to X.org's version. Some did it for licensing reasons, others did it because XFree's development team is made up of an arrogant bunch of bastards who'd rather spend more time complaining that no one follows their arbitrary rules than actually producing code. In the Windows world, X doesn't have an immediate twin aside from probably the GDI subsystems. Basically, X is what puts the GUI on your screen. You know how you log into your Windows box, and for half a second you only see a green screen and a mouse cursor? That's the Windows subsystem that corresponds to X.

On top of X, you need a window manager. This is the subsystem that actually draws windows on the screen and all their widgets and stuff. It's responsible for how things LOOK to the end user. X buys the canvas, the wm draws on it. In the Windows world, despite what most *nix people think, there IS a corresponding component: Explorer (not INTERNET Explorer, basic Explorer, the one that shows you where your files are). It's not a 100% equal comparison, but Explorer does handle a very large portion of what a wm does. Like *nix, you can replace your Explorer with another shell (like LiteStep) and change the way your interface works. One of the big problems with Linux is the hugemongous number of wms out there. I think the last count had it at just under 200. Yes, I said it's a problem. Too much choice makes decisions impossible.

Finally, in some cases, you have a "desktop environment". These include KDE & Gnome. DEs extend beyond the window manager to include things like: how drag & drop works between programs, how menus are created and maintained, interprogram communications and compatibility and so on. Without a DE, programs work about as well together as most groups of people...sometimes they'll be compatible enough, but most days you'll wonder why you bothered coming in.

It's these DEs that most *nixers have a problem with. Because they've lived without a modern OS for so long, they don't realize what they're missing out on. The console was good enough for their grandfathers, it's good enough for them. Anyone who complains about anything being "bloated" probably drives around in a 1965 Dodge Dart 'cause "all that technology makes the car too complex. Who needs anti-lock brakes, fuel efficiency, safety or FM radios? That's all useless fluff!" So, they'll tell you to at least switch to a lighter wm like Afterstep or WindowMaker or RatPoison. These have low memory footprints, and without the GUI taking up as much RAM, your computer seems to be faster. Kinda like stripping off the outer shell of a car to speed it up. It'll be faster, sure, but not much fun to drive in the rain.

Besides, I want my computer to work at least as well as it does in Windows, so that means I need all of the same kinds of features AND I want it to be as fast! Suggestion #2 is useless in those circumstances. (Keeping in mind that I like to use WindowMaker anyway, but that's not the point).

3. Use hdparm and tweak your hard drive settings. Almost a good suggestion. The problem again comes when you're using a modern distro: the kernel will generally set up your drives and controllers for maximum performance. In the last 3-4 years, I can't remember a single machine I've installed Linux on where "tweaking" with hdparm made any difference. The tweaks are implemented already.

So, that being said, I'm going to give you some real tweaks that should help some and make a difference for you. They'll be found in this article because this one's getting a bit long. :)

Tuesday, December 20, 2005

FC4 almost makes itself an enemy!

Installed Fedora Core 4, and it made my system unbootable. Turns out it's an issue that's been around since at least FC2, where the CD does some funky stuff on drives of 120+ gigs. It seems the installer doesn't understand LBA, and it writes the partition table with the wrong geometry, screwing everything up. Unfortunately, the only way to fix it was to wipe the MBR and remove grub. Even more unfortunately, the old way of fixing the MBR (fdisk /mbr) has been removed in XP, and you have to boot to the recovery console. Since I was too stupid to install it on this one machine (as opposed to EVERY OTHER XP/2003 MACHINE I'VE EVER BUILT), I had to scramble to dig up a bootable one. Hint: this is really the only way to fix it. I tried numerous utils I found that claimed to repair MBRs, but they all required you to have backed them up in the past...

After getting my system working again, I repartitioned it, but I used XP's Disk Management to do so. I created blank partitions, and then let FC4 format and install onto them. This time it worked right. I also installed Grub into the boot sector of the FC4 partition, rather than the MBR of the drive (I wasn't going through THAT again!). I used bootpart to add it to the XP boot menu. This is a lot cleaner solution than it was in the past with Lilo: Lilo wrote a new sector with each kernel change, while Grub writes it once. Booting is now perfect.

The next step was to start following the Fedora Core-Myth Howto, which included updating the system to the latest. The first time I ran yum update, I had about 1700 packages that needed updating, but it kept failing transaction tests due to some weird conflict with KDE's Polish translations (I'll leave the jokes to others). Finally, I got around my issues by doing yum --exclude=kde-i18n* update. Everything necessary updated, and when I ran it again, it updated any remainders. Yay.

My goal from that point was to get this system up to the point that it was doing everything that I used it for on a daily basis. It only had to operate, for now, at the same level of functionality at least. New features could be added later.

So, to that end, I needed at least:

VMWare to install my Windows 2003/Exchange 2003 environment into.
Apache acting as a reverse proxy. This is so I can use OWA and a few other web-based apps from work.
HomeSeer, my home automation system. This has to go into the VM with Exchange, as it's Windows-based.
Some kind of TV-viewing app. I never got around to doing the PVR thing on the Windows side, so I just need something that can show the output of my cable box. I set up MythTV and got it mostly working, but I'm having some issues with audio sync. For the time being, I just use XawTV. Close enough.

Aside from HomeSeer, I'm good to go. I set up VNC as a second session on the server, so I can do my work and not disturb LMC while she watches TV. Last night, I was using her laptop to work while sitting on the couch. She looked over, saw the VNC session I had running full screen, and said, "WHAT DID YOU DO?!" hehehe, it's not your screen, Babe. :)

LFS is dead to me

Boy, been a long time since I posted here, too! Followers from my old site know about my personal side taking up so much time recently, but on the technology side, I've been keeping myself busy, too.

Since October-ish, I've been working to build a new system based on LFS 6.1. I decided to use nALFS to automate the task, and built new profiles with each build. Of course, the fact that it took me a month of fiddling to get glibc to even build doesn't count, right? :)

I finally decided I'd had enough and that I was going to convert my primary media server over from XP to Linux, so I went with Fedora Core 4. I have to say, I've been very impressed so far. Linux has easily come a long way toward being a viable desktop OS. In fact, I'd even go so far as to say I think I could use this on a daily basis. Time will tell on that one, for sure. hehehe

I guess from this point on I'll document how things are progressing through the build, as well as any fixes and solutions I found to problems encountered. Hopefully, they'll be useful to other folks, too.