Friday, December 30, 2005

How to get yourself canned

No, I didn't get canned again, but the info in this article can get your ass in serious hot water if you use it and get caught. You've been warned! Before attempting to use any of these tools, you should also invest in an SSL certificate for your webserver. Can't afford one? Have you not heard of CAcert? You can get a free cert for your machine from there. They also provide their root certificate in an installable form for your browser, so you won't get those annoying "this site is encrypted, but we don't trust it" messages.

My POB has blocked pretty much all Internet access. They've closed down ALL ports except 80 & 443, and those go through a highly restrictive proxy which, oddly enough, blocks www.espn.com, but allows access to Yahoo and Google mail (where you could bring in a virus or trojan). Oh, well, I don't set the policy, I just need to circumvent it.

The first tool in my arsenal is Anyterm. Anyterm is an Apache module that presents you with a terminal window in a webpage. This isn't one of those PHP-based "type a command, see the output" things; you get a full and true terminal window in which you can do pretty much anything. Edit a file with vi, play games, whatever, it's all possible. Phil Endecott, the author, has done a great job on this. It only came out a short while ago, but it quickly reached a highly stable state, and it just continues to improve. But don't take my word for it, hit the site and check out the demo. Just make sure you put a pillow on your desk edge so when your jaw hits it, you don't get hurt. :)

The install of the above is well-documented, so it's here mostly to let you know it exists. Anyterm has been a great assist in getting this new machine set up, as I can now do it at work when I have more time. :) However, it is limited to a terminal session, and you can't cut and paste into it either. For more advanced things, it would be nice if I could do remote X or even VNC.

Well, you can! With GNU httptunnel, you can create TCP-over-HTTP tunnels that allow you to use any port you want by tunneling it through your home web server. httptunnel consists of two parts: a server which you need to install on an unrestricted machine on the Internet and a client which goes on the machine behind the restrictive firewall.

Installation is as simple as doing "configure && make && make install". To start the server, I use this command:

hts --forward-port localhost:22 80

This tells it to forward incoming httptunnel connections on port 80 to port 22 (ssh, for the real newbs). Now, on the client side, start the tunnel:

htc --forward-port 900 --proxy proxy.mycompany.com:80 --proxyauthorization myname:mypass www.myhomemachine.com:80

For simplicity and security, I put the above line in an htstart script and replaced "mypass" with "$1". That way, I can start it without needing to put my password in a text file. For some reason, GNU decided no output was useful output, so if you run the above command and nothing appears to have happened, it probably worked. To test it, just fire up your favorite ssh client and make a connection to localhost on port 900. If all went well, you should get a login prompt from your home machine. Yaay!
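
For the curious, htstart is nothing fancy. A minimal sketch (the flags come straight from the htc line above; the names are obviously mine):

#!/bin/sh
# Usage: ./htstart mypass -- the proxy password gets passed in, never stored
htc --forward-port 900 --proxy proxy.mycompany.com:80 \
    --proxyauthorization "myname:$1" www.myhomemachine.com:80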

Now, you can tunnel any app you want through httptunnel, but I recommend keeping it limited to ssh. First, httpt doesn't support SSL, so you need to provide some kind of encryption yourself. Second, httpt only supports one port per instance, and ssh will let you forward as many ports as you like over that single connection. This way, the only unencrypted part of your "conversation" with the outside world is that initial connection. As long as you don't press your luck and use the tunnel too often for large amounts of data, your network team shouldn't notice and you can do what you need.
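
For example, that remote X and VNC I was pining for earlier rides along nicely on the ssh session. A rough sketch, assuming the tunnel from above is up on local port 900 (user name and VNC display are made up, adjust to your own setup):

ssh -p 900 -C -L 5901:localhost:5901 me@localhost    # forward VNC display :1 on the home box
ssh -p 900 -C -X me@localhost                        # or run remote X apps straight over the tunnel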

Good luck, be careful and stay employed!

Tuesday, December 27, 2005

RAIDing the data

Having worked in the IT field for pretty much all of my adult life, and some of my non-adult life, there's one lesson I've learned the hard way: hard drives die. If you store your important data on a hard drive, chances are you're going to lose that data somehow, someday. Backups are a great way to preserve your data from these inevitable failures (they'll happen, it's just a matter of when!) The problem is, backing up and protecting data needs to be easy enough that you'll remember to do it, otherwise it just don't happen. I've been doing this for a very long time, and even I don't back up my data as often as I should. I've lost enough data in my life you'd think I'd learn, but...

So, one of my primary goals with this new server was to have it protect my data without needing my intervention. I'm going to do this with a multi-layer approach, and using RAID arrays is the first layer. For those not familiar, RAID stands for Redundant Array of Inexpensive Disks. Hard drives are cheap these days. It's almost impossible to find a drive that doesn't come with triple-digit gigabytes anymore. In fact, the most recent addition to the server, a 200G 7200 RPM monster, cost me only $30 after rebate. The basic idea behind RAID is to spread your data across multiple cheap disks in such a way that if one fails, you don't lose everything. You can Google for more info. The two levels of RAID I'll be using are RAID-1 and RAID-5.

RAID-1 is commonly known as "drive mirroring". I set up two drives of equal size, and every time I write data to one, it's written to the other. If one drive fails, I have a duplicate of the data on the other one. The big drawback to this setup is that writing to two drives is typically slower than writing to one. The other is that you "lose" a whole drive: if you take two 200G drives and mirror them, you only get to store 200G of data. I offset the first drawback by putting the two drives on different controllers in the system (also known as drive duplexing, since I'm protected by redundant controllers as well). Performance is then not affected as much. The second is offset by the fact that drives are cheap. For $30, I can't afford not to protect my data.

RAID-5 is also known as "striping with parity". RAID-5 requires at least three drives. In a nutshell, let's say you wanted to store the following sequence of numbers:

1 2 3 4 5 6

With a single drive, all numbers are written to the single drive, obviously. In a mirror, all 6 are written to each drive. In a RAID-5, the numbers are spread out across multiple drives, with one of the drives in each row storing "parity data". Parity data is essentially the result of a mathematical formula that describes the data, such that if you lose one of the three pieces, you can reconstruct the missing one from the other two. So, here's how the data would look on a R5 array:

Drive 1   Drive 2   Drive 3
   1         2        P3
  P7         3         4
   5       P11         6

For simplicity's sake, I used a simple algorithm to calculate the parity: I added the data written to the other two drives together. Let's say we lose Drive 1. Well, we can figure out that the data missing from row 1 is the number 1, since we know X + 2 = 3. The missing parity for row 2 is 7, because 3 + 4 = 7, and so on. In most implementations, if a drive fails, the system will stay up and running until you replace it, since it can figure out what's missing. Drawbacks: performance is similar to RAID-1 in that you now have to write to three drives. Also, you lose one drive's worth of space, but not as much as in a mirror.
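
Real RAID-5 implementations use XOR rather than straight addition, but the recovery trick is exactly the same. A two-line demo you can paste into a shell, just to convince yourself:

a=1; b=2
p=$(( a ^ b ))       # the parity block: XOR of the two data blocks
echo $(( p ^ b ))    # "lose" the first drive, rebuild its block from parity + b: prints 1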

So, how am I using this? Well, in this machine I have the following drives: 1x120G, 2x200G. The 120 is where the OS is stored, and the 200s typically store my data. I'm going to carve them up like so:

hda6 (10G) + hdg1 (10G) + hde1 (10G)  ->  md0, RAID-5, "personal"
hda7 (10G) + hdg2 (10G) + hde2 (10G)  ->  md1, RAID-5, "exchange"
hdg3 (rest) + hde3 (rest)             ->  md2, RAID-1, the bulk store
the rest of the 120G drive            ->  OS, plus scratch space for MythTV recordings

My "I can't possibly live without this, so it needs maximum protection" data is stored on the RAID-5 array listed as "personal". My Exchange virtual machine is stored on the one listed as "exchange". I have them separated to minimize corruption issues. My "I'd like to make sure I don't lose this since it's a PITA to replace, but I CAN replace it if necessary" data is stored on the RAID-1 (MP3s, videos, etc). Here's how I did it:

The first problem is the fact that HDE was already set up with a single 200G partition, filled with about 140G of data. I didn't have enough drive space to store it elsewhere, but fortunately, the mdadm tool in Linux gives us a simple workaround.

Firstly, I needed to very carefully document what it was I wanted to do. I got ADD, so I have to make absolutely sure I've got a detailed plan of attack or I'll forget stuff. :)

Create an /etc/mdadm.conf file

echo 'DEVICE /dev/hd* /dev/sd*' > /etc/mdadm.conf

This tells mdadm that any hard drive in the machine could be considered a candidate for an array, so it knows where to look for them at boot time.

Partition the disks

HDA already has some partitions on it, some of which I didn't need anymore. I created two 10G partitions, and one big one with the rest. Since this is going to be a MythTV box, I'll use the free space on the drive for scheduled, temporary recordings. If I want to save something, I can reencode it and put it into the store. On HDG, I also created two 10G partitions and one remaining bigity-big one. Remember also, when in fdisk, to set the partition type to "fd" (they're created as "83" by default). "fd" is the type for Linux RAID Autodetect.
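
If you've never done the type change before, the fdisk session goes roughly like this (device name is just an example; the sizes match what I used):

fdisk /dev/hdg
#  n  -  new partition, size +10000M            (do this twice for the 10G pair)
#  n  -  new partition, take the rest of the disk
#  t  -  change a partition's type, enter "fd"  (repeat for each new partition)
#  p  -  print the table and eyeball it
#  w  -  write it out and quit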

A quick reboot later into single user mode (safest way to do this stuff), I created my arrays. Three simple commands:

mdadm --create /dev/md0 --level 5 --raid-devices=3 /dev/hda6 /dev/hdg1 missing
mdadm --create /dev/md1 --level 5 --raid-devices=3 /dev/hda7 /dev/hdg2 missing
mdadm --create /dev/md2 --level 1 --raid-devices=2 /dev/hdg3 missing

The "missing" directive is what allows me to keep my data intact until the last drive is ready to add to the array. It allows me to create the array without all of the partitions. Now, all we do is format the filesystems:

mkfs.jfs /dev/md0
mkfs.jfs /dev/md1
mkfs.jfs /dev/md2

I chose JFS based on recommendations on the MythTV board. I knew I wanted a journaling filesystem for the arrays, primarily due to their size, and JFS seems to be the most "stable" in this configuration.

As a final step in this section, I mounted the arrays (treat them as a regular drive, i.e. "mount /dev/md0 /mnt/personal") and copied all of the data from the HDE partition over to the new arrays.

Once all the data was copied over, I simply fdisked HDE so that its partition table was similar to HDG's. A note: Linux's RAID support is pretty flexible. You don't have to worry about getting EXACTLY the same number of blocks in each partition. When I created the personal and exchange partitions, I used "+10000M" in each of the "end block" sections. That got them close enough in size.
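
I did mine by hand, but if you'd rather not retype the whole table, something like this should clone the layout from HDG onto HDE in one shot. Triple-check which drive is which before you press enter, because there's no undo:

sfdisk -d /dev/hdg | sfdisk /dev/hde    # dump hdg's partition table and write it to hde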

To finalize the arrays and get them to sync, simply issue the following:

mdadm --add /dev/md0 /dev/hde1
mdadm --add /dev/md1 /dev/hde2
mdadm --add /dev/md2 /dev/hde3

To check the sync status, do "cat /proc/mdstat". You'll see them synchronizing. Go do something for a half hour or so. The system will sync each array in turn. When everything's up and running, you'll see "[UUU]" or "[UU]" at the end of each status line (indicating all three or two drives are Up). Using them before the sync finishes probably isn't dangerous, but I'd wait until the arrays are fully up anyway.

Some final config file changes

Obviously, you'll need to add the arrays to your /etc/fstab. Again, treat them just like any other type of drive. Mine now looks like this:

/dev/hda3 / ext2 defaults 1 1
/dev/md0 /mnt/personal jfs defaults 0 0

And, so on. You should also do the following:

mdadm --detail --scan >> /etc/mdadm.conf

This puts the information on the arrays in the conf file. Mdadm doesn't really need this, but it's good for you to have the info in the future.

As a final step, let's make sure we know when our arrays have degraded. Edit your rc.local (or its equivalent in your distro) and include the following line for each of your arrays:

nohup mdadm --monitor --mail=root@localhost --delay=300 /dev/md0 &

Don't forget the ampersand. This line will send an e-mail to you anytime mdadm detects a degraded array. The mail directive simply acts as a "frontend" to sendmail; mdadm doesn't actually send the mail itself. So, if sendmail isn't set up properly, you won't get the mail. Which is why I have it going to the root mailbox. ;-)

I think that's about it. Sorry it was a bit long, but there was a lot of ground to cover. Drop me a comment if you found this info useful!

Wednesday, December 21, 2005

Tweaking the penguin's nipples

hdparm: nipple clip of yesteryear

The first tweak I'd recommend is starting with hdparm. It can't hurt, and might help. This article is the one I learned how to use hdparm from a long, long time ago. It's a little dated, but the commands are still the same, so you can at least learn how it works. More than likely, it won't make any difference, as the features it's supposed to enable are typically enabled by default. My most current machine did not need hdparm.
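
For reference, the usual incantations look something like this. Run the benchmark before and after your changes so you actually know whether they did anything:

hdparm -tT /dev/hda      # quick-and-dirty read benchmark (cache and device)
hdparm -d1 -c1 /dev/hda  # the classic "tweaks": turn on DMA and 32-bit I/O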

Too many consoles spoil the soup

Next, remove some extra virtual consoles. On a Linux box, when you're at the console, you can hit Alt-F1 through Alt-F6 and switch between consoles. Kinda useless if you're booting into a GUI, so let's disable most of them:

vi /etc/inittab

You'll see a bunch of lines like:

1:2345:respawn:/sbin/mingetty tty1

Comment out the lines that begin with 3 through 6 (keep two, just in case). When you next reboot, you'll only have two gettys running and will have freed up a tiny bit of RAM. Hey, tiny, but you weren't using it, right?
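
After the edit, that block of inittab ends up looking something like this:

1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
#3:2345:respawn:/sbin/mingetty tty3
#4:2345:respawn:/sbin/mingetty tty4
#5:2345:respawn:/sbin/mingetty tty5
#6:2345:respawn:/sbin/mingetty tty6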

Clear out those services

Yes, I'll tell you to do it, too. But, I'll tell you which you can prolly lose:

apmd - The Advanced Power Management daemon. If you don't have a laptop, you can more than likely kill this one.

gpm - Essentially the console mouse driver. In a GUI all the time? Kill it, you won't be using it.

identd - Used for IRC to identify you. Some IRC servers require you to run one. If you don't IRC, say buh-bye!

ip6tables & iptables - Ip6tables can go away if you're not using IPv6. Iptables should only be disabled if your machine is behind a firewall. Even then, you should consider keeping it, but that's your call. This is your machine's firewall.

isdn - Useless if you don't have an ISDN line.

nfs & portmapper - If you're not connecting to NFS file shares somewhere, these can go.
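
On Fedora and friends, killing one of these off for good (and right now) looks something like this; swap in whichever service you're losing:

chkconfig isdn off    # don't start it at boot anymore
service isdn stop     # and stop the copy that's already running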

Sloppy with your swappi

The 2.6 kernel gave us the ability to determine how likely things are to be swapped to disk, rather than kept in main memory. The kernel does these calculations constantly, and there are no cut-and-dried guidelines for setting this. The default is 60, but you can set any value you want between 0 and 100. I use 30 and find that works fine for me on the few occasions I might need to swap (my machine has a gig of RAM, and rarely uses more than that). 20 is a good number if you've got a laptop and want to force the kernel to swap only when absolutely necessary. This is good for those slow-ass laptop hard drives.

Two ways you can set this:

echo 30 >/proc/sys/vm/swappiness

This is a temporary method and lasts until your next reboot. But you can do a lot of testing by modifying the swappiness on the fly and figuring out what works best for you. You can use "free -m" to view your RAM and swap usage at any time. At that point you can....

Most distros use /etc/sysctl.conf to control things like this. Set vm.swappiness = 30 in that file and it'll follow with each reboot.
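
In other words, something along these lines (the sysctl -p just re-reads the file now instead of waiting for a reboot):

echo 'vm.swappiness = 30' >> /etc/sysctl.conf
sysctl -p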

More to come.

Penguin Performance

If you're not that familiar with Linux, I'm going to break the silence and let you in on a little secret: all those claims of higher performance are bunk. They ARE true...under certain circumstances, such as when you're not running a GUI, you don't run any services and you pretty much don't log in. Yeah, I'm exaggerating a little bit, but only a little. :)

Here's the top three responses you'll get if you ask how to tweak the performance on a Linux box so that it's close to the speed of a Windows machine:

1. Disable any unnecessary services. Thanks, that's probably the most useless suggestion ever. Here's the deal: if a person doesn't know enough to do this first, they're probably not going to know enough to know which services are unnecessary. Hell, I've been working with Linux for 12 years, and when I use a mainstream distro, I sometimes come across things whose purpose is unknown to me.

2. Switch to a "lighter" GUI (or, my favorite: don't use X at all. Neandertals). For those coming from the Windows world, let me explain how GUIs work on the Unix side of the world:

At the most basic level, you have the underlying OS which is made up of the kernel and some supporting utilities. You're going to run into morons who tell you "Linux isn't an OS, it's a kernel". Well, a kernel's pretty useless on its own, so we can glob those necessary things together and make us up an OS. Linux is now known to be the whole OS, not just the kernel. It's called the evolution of language. Evolve or die.

Sorry, I like to rant. Okay, on top of the base OS, you have the X Window System. The implementation used mostly on Linux used to be XFree86. In recent years, most distros have switched over to X.org's version. Some did it for licensing reasons, others did it because XFree's development team is made up of an arrogant bunch of bastards who'd rather spend more time complaining that no one follows their arbitrary rules than actually producing code. In the Windows world, X doesn't have an immediate twin aside from probably the GDI subsystems. Basically, X is what puts the GUI on your screen. You know how you log into your Windows box, and for half a second you only see a green screen and a mouse cursor? That's the Windows subsystem that corresponds to X.

On top of X, you need a window manager. This is the subsystem that actually draws windows on the screen and all their widgets and stuff. It's responsible for how things LOOK to the end user. X buys the canvas, the wm draws on it. In the Windows world, despite what most *nix people think, there IS a corresponding component: Explorer (not INTERNET Explorer, basic Explorer, the one that shows you where your files are). It's not a 100% equal comparison, but Explorer does handle a very large portion of what a wm does. Like on *nix, you can replace your Explorer with another shell (like LiteStep) and change the way your interface works. One of the big problems with Linux is the hugemongous number of wms out there. I think the last count had it at just under 200. Yes, I said it's a problem. Too much choice makes decisions impossible.

Finally, in some cases, you have a "desktop environment". These include KDE & Gnome. DEs extend beyond the window manager to include things like: how drag & drop works between programs, how menus are created and maintained, interprogram communications and compatibility and so on. Without a DE, programs work about as well together as most groups of people...sometimes they'll be compatible enough, but most days you'll wonder why you bothered coming in.

It's these DEs that most *nixers have a problem with. Because they've lived without a modern OS for so long, they don't realize what they're missing out on. The console was good enough for their grandfathers, it's good enough for them. Anyone who complains about anything being "bloated" probably drives around in a 1965 Dodge Dart 'cause "all that technology makes the car too complex. Who needs anti-lock brakes, fuel efficiency, safety or FM radios? That's all useless fluff!" So, they'll tell you to at least switch to a lighter wm like Afterstep or WindowMaker or RatPoison. These have low memory footprints, and without the GUI taking up as much RAM, your computer seems to be faster. Kinda like stripping off the outer shell of a car to speed it up. It'll be faster, sure, but not much fun to drive in the rain.

Besides, I want my computer to work at least as well as it does in Windows, so that means I need all of the same kinds of features AND I want it to be as fast! Suggestion #2 is useless in those circumstances. (Keeping in mind that I like to use WindowMaker anyway, but that's not the point).

3. Use hdparm and tweak your harddrive settings. Almost a good suggestion. The problem again comes when you're using a modern distro. The kernel will generally setup your drives and controllers for maximum performance. In the last 3-4 years, I can't remember a single machine I've installed Linux on that "tweaking" with hdparm made any difference. The tweaks are implemented already.

So, that being said, I'm going to give you some real tweaks that should help some and make a difference for you. They'll be found in this article because this one's getting a bit long. :)

Tuesday, December 20, 2005

FC4 almost makes itself an enemy!

Installed Fedora Core 4, and it made my system unbootable. Turns out it's an issue that's been around since at least FC2, where the CD does some funky stuff on drives of 120+ gigs. It seems the installer doesn't understand LBA, and it writes the partition table with the wrong geometry, screwing everything up. Unfortunately, the only way to fix it was to wipe the MBR and remove grub. Even more unfortunately, the old way of fixing the MBR (fdisk /mbr) has been removed in XP, and you have to boot to the recovery console. Since I was too stupid to install it on this one machine (as opposed to EVERY OTHER XP/2003 MACHINE I'VE EVER BUILT), I had to scramble to dig up a bootable one. Hint: this is really the only way to fix it. I tried numerous utils I found that claimed to repair MBRs, but they all required you to have backed them up in the past...

After getting my system working again, I repartitioned it, but I used XP's Disk Management to do so. I created blank partitions, and then let FC4 format and install onto them. This time it worked right. I also installed Grub into the boot sector of the FC4 partition, rather than the MBR of the drive (I wasn't going through THAT again!). I used bootpart to add it to the XP boot menu. This is a much cleaner solution than it was in the past with Lilo: Lilo had to rewrite its boot sector with each kernel change, Grub writes it once. Booting is now perfect.
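
For anyone doing the same dance: the relevant chunk of boot.ini ends up looking roughly like this (the .bin filename is just whatever you told bootpart to write; mine here is hypothetical):

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect
C:\fc4.bin="Fedora Core 4"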

The next step was to start following the Fedora Core-Myth Howto, which included updating the system to the latest. The first time I ran yum update, I had about 1700 packages that needed updating, but it kept failing transaction tests due to some weird conflict with KDE's Polish translations (I'll leave the jokes to others). Finally, I got around my issues by doing yum --exclude=kde-i18n* update. Everything necessary updated, and when I ran it again, it updated any remainders. Yay.

My goal from that point was to get this system up to the point that it was doing everything that I used it for on a daily basis. It only had to operate, for now, at the same level of functionality at least. New features could be added later.

So, to that end, I needed at least:

VMWare to install my Windows 2003/Exchange 2003 environment into.
Apache acting as a reverse proxy. This is so I can use OWA and a few other web-based apps from work.
HomeSeer, my home automation system. This has to go into the VM with Exchange, as it's Windows-based.
Some kind of TV-viewing app. I never got around to doing the PVR thing on the Windows side, so I just need something that can show the output of my cable box. I setup MythTV, and got it mostly working, but I'm having some issues with audio sync. For the time being, I just use XawTV. Close enough.

Aside from HomeSeer, I'm good to go. I set up VNC as a second session on the server, so I can do my work and not disturb LMC while she watches TV. Last night, I was using her laptop to work while sitting on the couch. She looked over, saw the VNC session I had running full screen and said, "WHAT DID YOU DO?!" hehehe, it's not your screen, Babe. :)

LFS is dead to me

Boy, been a long time since I posted here, too! Followers from my old site know about my personal side taking up so much time recently, but on the technology side, I've been keeping myself busy, too.

Since October-ish, I've been working to build a new system based on LFS 6.1. I decided to use nALFS to automate the task, and built new profiles with each build. Of course, the fact that it took me a month of fiddling to get glibc to even build doesn't count, right? :)

I finally decided I'd had enough and that I was going to convert my primary media server over from XP to Linux, so I went with Fedora Core 4. I have to say, I've been very impressed so far. Linux has clearly come a long way toward being a viable desktop OS. In fact, I'd even go so far as to say I think I could use this on a daily basis. Time will tell on that one, for sure. hehehe

I guess from this point on I'll document how things are progressing through the build, as well as any fixes and solutions I found to problems encountered. Hopefully, they'll be useful to other folks, too.

Monday, March 7, 2005

Freshmeat for LFS

As you'll probably learn from reading this blog, I don't use commercial Linux distributions like RedHat or Suse. About 4-5 years ago, I ran across Linux From Scratch, and have been doing it that way ever since. The drawback to using LFS is that it only follows the few standards Linux has, such as the Linux Standard Base and the Filesystem Hierarchy Standard. Now, some distros follow these, and some don't. Some follow them a little, some follow them a little less. Thus, my problem: I never know where anything is! On top of that, these standards are pretty limited once you've gone past the basic level that you get once you've finished an LFS build. Where does your distro store Apache's httpd.conf? Look at 10 different distros, and I'm sure you'll get 10 different answers.

Anyway, this isn't a bash against the big boys, this is a solution to one problem LFS does suffer from: package management. When there's a security problem with zlib, for example, how do you know to upgrade it? With the big boys, they usually have some kind of auto-update feature, but LFS does not. You've compiled it all from scratch, how do you keep track of these packages?

Well, firstly, I use Checkinstall for maintaining some level of control over packages. Checkinstall basically watches when I run the final "make install" from my source-tarball-based installation. It gives me a nice report, and I can use that to delete entire packages if I need to. Very useful.
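
Usage is about as painless as it gets: you just swap it in for the install step, roughly like so:

./configure && make
checkinstall      # runs "make install" for you and keeps track of every file it installs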

But, how to keep up with packages that need updating? Simple: when you add it, subscribe to it at Freshmeat! With Freshmeat's subscription feature, you can get alerts when packages are updated. Just go in, subscribe to all of the base packages from LFS, and then subscribe to all those you add afterwards. You can even set up categories to organize them a little better (although FM's category management for end users is kinda limited). I have an "LFS" category for the base packages, a "MyLFS" category for all of the packages that I install regardless of what else is going on the machine (IPTables, XFS tools, SSH, syslog-ng, etc), and a "BLFS" category for the BLFS book. I also have one for each purpose; for example, I have a "Multimedia" category to keep track of the stuff for my MythTV machine, and a "Firewall" category for things like ulogd, which replaces syslogd on the firewalls. Now, all I need to do is check my mail each day, which I'm kinda in the habit of doing already, and if there's an update, I'll know about it!

Yes, FM usually lags a little behind, so relying on it alone for security isn't perfect, but I can live with this solution.

Let my data come!

Data, data, data...so very much data. All of it stored in incompatible formats, all of it accessible only to the software that originally created the file that contains the data. Unless you spend a lot of time creating some kind of converter script or program, your data is trapped.
The open source community for years has been telling us this. They've said things like "Why do you continue to use Microsoft Office? Your data is trapped in proprietary formats that nothing else can read 100%!" Too right! But, when you ask "Where are the alternatives?" you get replies about OpenOffice or AbiWord or a few other F/OSS office suites.
Alrighty, we've got those covered, but what about all the other data? Uh, other data? Yeah, for example, I'm playing around with a couple of PHP apps on my Apache server. They could both be considered "address books". One is actually an address book/contact manager app, the other a customer database which is part of another app suite. Both of these apps would be good together, but in order to use them, I have to create two separate databases because app one names its fields "fname, lname, street, etc" and app2 uses "first, last, homeaddress, etc". Both contain the exact same data, but the formats are different enough. Yes, I could theoretically write a conversion script, or change the way one of the apps works so that I can access one database. But, why bother?
"What is it you propose, then?" Glad you asked...enter libDBT (a working name, stands for Database Template). libDBT is a general purpose library that interfaces to general purpose databases. What I propose is a set of specifications for various types of databases which define element names, database names, etc so that these islands of data become more standardized. Let's face it, aside from the occasional proprietary additions, an address book is going to contain 95% of the same data as any other address book. An address book (AB from here on in, 'cause I'm lazy) database will typically have fields such as:
First Name
Last Name
Home Street Address
Home City
Home Zip
Work Street Address
Work City
Work Zip


The list could go on, but you get the idea. Okay, so as part of the libDBT specification, we capture as many of the "standard" fields as possible. These are the fields that, no matter what you're using an AB for, it's going to need. In this way, a developer who wants to create an AB application can just check during install, either through the configure script or installer input, whether there's an AB already in the default database for the system (more on that later). If there is one, the installer can ask "Use this one or create new?" This is useful if, for example, you want to keep employee information separate from customer data.
"But, if you're going to separate data in that way, what's the point?" The point is, your customer data is universally accessible to any and all AB-like applications on your system. Your ERP system, your PIM, your e-mail program, whatever! Your libDBT-compatible app can recognize that there's more than one libDBT-compatible database available and ask if you want to use them all, some or none.
Let's extend further. The one problem some people are thinking about right now is that of extension. "My AB needs more than just some basic fields!" Too right, and I'm not surprised. But, you're going to use the same ones that are already in there, right? So, just extend the database schema to include your data as well. See, the specification I have in mind says, "Ignore any data or records or information that is not yours! Treat any fields you're not using, regardless of whether they're part of the spec or not, as NULL values."
Now the hard part comes in when we have collisions...my app needs to add a "Mother's Maiden Name" field and I want to call it "MMN", but that's some other app's "My Mammy's Nanny". What do you do, what DOO you do? Simple, don't name it that. There are two ways to get around this...as part of the whole libDBT project, we can have a registry of additional fields which tells which apps use the fields, and what they're used for. This is probably the best route to go, as it means that if I want to add a "Mother's Maiden Name" field, I don't have to duplicate the effort; I can just use the registered name and it's available to other apps. The other is to come up with some standard for naming, such as "MyApp_MMN". I'm not a big fan of this method, though, because it locks that data into "MyApp".
Now, as with anything this powerful, we need a way to configure it. Let's say I need the AB capability, but don't want the ability to organize MP3s (see later). Well, each template (which is really just the schema) will need to also be accompanied by a plugin for libDBT describing calls.
Calls? Oh, didn't I mention? LibDBT is more than just a bunch of templates to make your life and data access easier! It's a system library that gives you full access to those databases without writing a ton of code to do so! For example, let's say you want to create a new record in the default AB. In the old paradigm, you'd have to code up how to interact with that database, where it's stored, what kind of info goes into which record, etc. Wouldn't this be nicer?
libDBT_Create_New(AB,, The Right Reverend, J.R., Dobbs,,,,,"Bob")
But, what is all that? Well, it's stupid pseudocode representing how to create a new record. It's telling libDBT "create a new record in the default system address book. The name is J.R. Dobbs, his title is The Right Reverend and nickname is "Bob"." Simple? Of course, why make things harder on ourselves than we need to?
Okay, but what's all this stuff about "default database"? Well, once we do away with the myriad databases on a system each holding the same data, we'll only need one database on a machine to hold it! For example, I've got OpenLDAP running on my machine, and it's holding some AB-esque data in a GDBM database. I've also got MySQL running, also storing some AB data. God, I wish I could have both of those available in the same database! With libDBT, you could. You could use MySQL, PostgreSQL, MSSQL, Oracle, GDBM, XML or whatever! LibDBT just defines the schema the data is stored under; it doesn't specify anything else! So, we just need a single conf file, say /etc/libdbt.conf. Something like:

# Begin /etc/libdbt.conf

[databases]
default = mysql-master
customers = mysql-cust

[mysql-master]
type=mysql
host=localhost
port=3386
user=root
password=easilycracked

[mysql-cust]
type=mysql
host=localhost
port=3386
user=root
password=easilycracked
table=customers

# End /etc/libdbt.conf

So, let's say you wanted to add The Right Reverend J.R. "Bob" Dobbs as a customer; then you only need to change the call to, say:

libDBT_Create_New(AB,customers, The Right Reverend, J.R., Dobbs,,,,,"Bob")

Like I said, this would have to be extensible with plugins, and I can picture a few simple databases that I've seen that could benefit from this:

MP3 organizers: they typically store Songname, artistname, albumname, bitrate, etc. There's got to be hundreds of these out there, all with incompatible data!

Photo Organizers: Store EXIF data, if available. Also fields for captions, place taken, people in the picture. Wouldn't these be nice?

LDAP: an extension of the AB concept. Let's face it, at its heart, LDAP is an address book. In fact, that's what it's normally used for. In a lot of cases, it's an authentication mechanism, but that's just an address book that stores a password, too. Why not a simple daemon, ldapdb, that acts as a translator between LDAP-using apps and a database backend? Send it a query to find "uid=tkarakashian, ou=people, c=us" and it sends a SQL query to the database to return the info that's in the employee database. Wouldn't that be nice? Seems to me it would be a relatively simple system to implement.
Okay, that's my brain dump of my idea. Below this you'll find a Comments section that uses a non-standard database to store them. Please ignore that in light of this new information and let me know if I'm completely insane or not.

Thursday, February 17, 2005

Big freaking surprise

Big news over at Linux-land...er, Slashdot today. "Study Finds Windows More Secure Than Linux". The summaries of the results are pretty good, as these two appear to have done a fairer comparison than I've seen in the past. For example, IIS isn't just a webserver, it's an application server and does more than serve static webpages. To compare IIS to Apache side-by-side is like comparing a Hummer to a Yugo. One will get you back and forth to work; the other will get you back and forth to work if you have to pass over the Rockies, through some rivers and mow down any deer on the way. To make a fair and reasonable comparison, you need to add a couple of scripting languages to Apache, as well as enable a lot of extra modules. You then need to take into account the security holes in those as well!

Well, anyway, this is news to me...not. For some reason, Apache's been getting a lot of abuse on this blog this week. Not my intention, but it's just worked out that way. Let's be clear, this study shows that IIS is more secure than Apache, and isn't a Linux vs. Windows article. Since so many people have enough trouble with facts, I like to clear up the easy ones in advance. :)

People have been comparing Apache to IIS for ages. For ages, I've been saying IIS is as secure as, if not more secure than, Apache, if configured by a competent administrator! The problem is, IIS "out of the box" is nowhere near as secure as Apache is out of the box. In fact, even I wouldn't presume to call an out-of-box IIS secure in any way.

And this is where the confusion sets in, because *nix guys don't know how to secure a Windows box. They just assume it's not, and don't even try. Don't believe me? Ask a *nix guy "How do you secure a Windows box?" They'll always give you an answer similar to "Unplug it from the power outlet" or "Throw it over a cliff". When you press them for a real answer, they'll always say it's not possible. Press them further with "Can you do ANYTHING to secure the box at all?" and they'll usually tell you no. Oddly enough, they DO know how to secure a Windows box, they just don't know the exact procedures. Securing any system includes some basics that any competent admin should know. Security "Best Practices". Some of the basics...

Rule #1: Don't run services you don't need. Every extra service you have installed on a box above and beyond what's necessary for that box to perform its function is a point of failure. Turn 'em off.

Rule #2: Don't use known default configurations. By default, IIS is in C:\Inetpub, Apache in /var/www. Move them.

Rule #3: Secure the filesystems. Don't allow a service to write to your hard drive, unless it absolutely has to.

Rule #4: Use non-privileged accounts for services. On my box, Apache runs as the user apache, and it has write access to one folder on the entire hard drive, a folder required for one of the PHP scripts I use. Unfortunately, this isn't as easy on IIS as it is on Apache, but if it's the only practice you miss, it's not as bad as it could be.
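
On the Apache side, rule #4 boils down to a couple of httpd.conf lines plus some deliberate stinginess with permissions. A sketch (the writable path is made up for the example):

# in httpd.conf: run the server as a throwaway account
User apache
Group apache

# on the filesystem: exactly one directory that account can write to
chown apache:apache /var/www/myapp/uploads
chmod 750 /var/www/myapp/uploads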

I've paraphrased these for the non-technical, and this sure isn't all of them, but they're the most essential, so we'll start with them. We'll use CodeRed as an example (forgetting for the moment that CodeRed also affected some Unix machines...) CodeRed was a worm that utilized an exploit in the Internet Printing Protocol in IIS. Practically no one uses IPP, but it's installed by default in IIS. A bad thing, right? Nope, let's look at how a machine could get infected with CodeRed...

First, you have to leave the IPP installed. Didn't we just talk about running services you don't need? If you have IPP installed, you're breaking the first cardinal rule of security! No one uses it, so why do you have it? The obvious retort is always, "Well, they don't make it easy to know what's necessary and what's not!" Who's they? Microsoft? Isn't it your job to know, regardless of availability of information? Yeah, I thought it was.

Next, when CodeRed infects a machine, it stores itself in the Scripts virtual directory on an IIS machine. Um, Scripts? You mean that well-known virtual directory that's installed by default and no one uses? Why was it there in the first place? Doesn't that violate Rule #2? Tsk, tsk!

Oh, wait! "It stores itself in the"? So, you mean your webserver got hit with an exploit, and it wrote an infected file to the filesystem? If we followed Rule #3, this wouldn't have happened, would it? It's a worm, not a hacker. It knows to try a couple of default things, and then just fails if they don't exist.

And, with rules 1-3 in place, rule #4 isn't an issue...

Whose fault is it if you got infected now? A little more humble, aren't we, loyal penguinhead? You violated three of the top four security practices, and it's someone else's fault that you got infected. I know, Microsoft should secure these things out of the box. After all, Apache does, right? But why? At the end of the day, it's not Microsoft that is setting up these services, it's me. I'm the only one who knows what things I'm going to need, and how I'm going to use the software. That means it's up to ME as administrator to make sure the machine is secure, and no one else.

Anyone who tells you differently is bullshitting you to get you to believe it's not their fault.

Wednesday, February 16, 2005

Soli-calendar

So, I'm walking past someone's cube just now and I happened to glance at their screen. They were looking at their calendar, but with just a quick glance, it almost looked like they were playing solitaire. That gave me an idea: a solitaire game, but instead of cards, you would have what looked like calendar entries. You could shuffle them around, and if someone looked, they'd think you were busy working (instead of posting to your blog!) You could put the red staff meetings on the black status meetings. All hands meetings are the aces.

Man, I really need a life....

Friday, February 11, 2005

Apache docs suck

I've said it before, and received a ton of crap for it....Apache's docs suck.

Consider this...you go to buy a new car. You find the one you want, and proceed to make the purchase. The salesman then explains that what you're going to do next is head to law school so that you can pass the bar. You see, the sales contract for a new car is very complex, and not something to be taken lightly. So, when you're done with law school in four years, come back and we'll do the deal. So, you do it. You pass the bar and you come back to buy your car. Of course, it's been four years, so you have to buy a different car. You manage, however, to get through the sale and are happy with your shiny new car.

A few months down the line your "Check Engine" light comes on. You read through the manual and it tells you that this could be anything! It's the ultimate dummy light, as it literally represents hundreds of possibilities of what could be wrong. You call the dealer and ask for service. A heavy sigh comes from the other side of the phone as the mechanic gives the stage whisper, "Newbies". He explains that, yes, the "Check Engine" light could be indicative of anything, and there's no way he can tell you what's wrong without some information first. So, he asks you to disassemble your car, and give him the serial and part numbers engraved on every component that makes it up. When you explain that this is too much work, and ask if he'll please just tell you how to fix it, again comes the heavy sigh, followed by, "If you wanted other people to fix all of your problems for you, why don't you just take the bus?!" So, it's off to school again for you as you begin your studies as a mechanic. After a couple of years, you feel confident enough to disassemble your car, but now you also know how to fix the problem, too! Yaay, no more dealing with that gruff mechanic! A feeling of accomplishment flows over you as you realize he wasn't just being gruff. He wanted you to get to that point of satisfaction of doing the job yourself. What a wonderful fellow he was!

Bullshit.

Apache docs pretty much tell you, "Apache is a webserver" followed by "here's all of the hundreds of possible commands you could put in your config file to make it work. We've specifically avoided telling you which ones are the barest essentials to get the system up and running if all you want is a small webserver to play around with something. Good luck."

Oh, yes, you're right, Apache DOES come with a default configuration file that provides you with a minimalistic server that works. However, that file is 1086 lines long! Yes, there are comments above each set of commands that make the file that long, and if you remove them, the file is still over 300 lines long. My point? When I set up my first Apache server, after scouring the web for weeks for hints and such, my configuration file, for a basic setup, was 15 lines long. I had the barest of essentials: I could point a browser at it, and it would show web pages. Again, yes, there are HOWTOs. They suck, too. Don't say it, I'm already writing a good one.
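
Those original 15 lines are long gone, but a bare-bones Apache 2.0 config is in this neighborhood; paths and names here are examples, not gospel:

ServerRoot "/usr/local/apache2"
Listen 80
User apache
Group apache
ServerName www.example.com
DocumentRoot "/usr/local/apache2/htdocs"
DirectoryIndex index.html
ErrorLog logs/error_log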

But, what's even better is they're wrong in spots. For example, when compiling Apache from scratch, you use a configure script like most GNU software. Let's say you want to build all of the DSOs that come with the standard distribution, just so you have them. I hate having to recompile because I forgot to turn on some function of the software. To do this, the docs say to use something like "./configure --enable-mods-shared=all". Makes sense: you want all the modules, you should prolly say "all". Problem is, "all" isn't "all". If you want, for example, proxy support, you have to tell it that specifically. Apparently "all" means "all that we think you need, not all there is".

It gets better: using "--enable-mods-shared=all --enable-mods-shared=proxy" doesn't work, either. Here's the undocumented command: ./configure --enable-mods-shared="all proxy proxy_http etcDSO". Note the quotes, which aren't noted in the docs. In all fairness, it's been a while since I graduated from Apache mechanic school, but it wasn't that long ago that this stuff wasn't well documented. I don't know if they've fixed it, and I don't care. Apache's been around for a long time, and I'm sure I'm not the first one who encountered this. In fact, when I asked, I was told, "here comes this old chestnut again!"

Here's a thought, boys: FIX YOUR DOCS!!!

You're not going to pay a lot for this auction!

I like eBay. I've actually been using it since its original URL was something like ebay.surf.net. Anyone else remember that? I don't use it that often, but when I do, I like to get a good deal. One way to do this is use auction sniping software. The basic idea is to put your only bid in the last 15 seconds of an auction, thus removing anyone's chance of upping the bid. A lot of times, people only increment the bid $1 or two to see how high it'll go. If you can get in before anyone else has a chance to respond, you can swoop in and pick up stuff really cheaply.

My favorite sniper is still the oldest: jBidWatcher. It's a Java app, so that means it runs on pretty much every OS (and it also means it's really SLOW! LOL!). There's even an app bundle for OSX so it looks and runs just like a native app. jBW can sync with your My eBay list and watch all of your auctions. My favorite feature, and one I haven't seen in any of the other free watchers (did I mention all of this functionality is FREE?!), is the multi-snipe. To multi-snipe, you select a bunch of auctions in your watch window, right-click and choose "snipe". The software sees you selected multiple items, so it enables the multi-snipe. You put in the maximum bid you want to pay for these items, and jBW does its thing. It sits there and waits. Patiently. When the first of these auctions is close to the end, it snipes! Ahhh! Scary! If your snipe bid was successful and you won, jBW forgets about the other items in the multi-snipe. If you lost, it'll wait for the next auction to come to an end and try again.

I've been looking to get a second processor for my desktop, and they've been running about $42 before shipping. The other day before leaving for work, I started jBW and selected 4 auctions for the processor. I put my max bid at $45 (some were going as high as $60-70) and left. Later in the day, I got an e-mail that I'd won the auction at $34! Not too shabby!

This really is a great piece of software, and if you buy a lot on eBay, you owe it to yourself and your wallet to just try it a few times.

Replacing it all

As mentioned in an earlier post, my current project has me revamping the home network. Everything that's in place has served me well, but it's time for me to get some more functionality out of everything, as well as cut down on the number of computers I have running 24/7. Considering my previous employer is also my electric company, I want them to get as little of my money as I can! Maybe not so much out of spite, but mostly 'cause I know pretty much how much money they waste every month, and dammit, I KNOW my bills could be HALF what they are now.

But, I digress...

Let's get into it. Currently, I have four computers running pretty much 24/7 at home. "Four computers for one guy, you say? Are you mad?!" Nope, just a dork. :) But, as you'll see, they each have a purpose. Well, most do:

Twoface
This is my firewall/router. (As you'll see, my network has a Batman theme. Twoface seemed appropriate for a name since it's dual-homed on my network and the Internet. A good face and a bad face. :) It's a Celeron 600 w/ 128M of RAM and a pair of hard drives, each 4G. Yup, 4G. It runs Linux. Specifically, it runs Linux From Scratch version 5.1. LFS is, by its nature, a small "distribution" as the only things installed are what you compile from scratch. I believe the total size on disk for this machine is less than 500M. I have two disks in there 'cause I like to try newer versions, and when I first set it up, I only gave it one partition of the 4G, and, really, what am I going to do with the second 4G drive?

It does a little more than just act as firewall/router, though:

Apache 2.0.48: This is set up as a reverse proxy only. My primary server, Alfred, runs IIS and has a couple of web-enabled apps running. But they use their own servers, rather than the IIS on the machine. Using Apache this way, I have one external URL, but it delivers content from multiple machines. For example, if you go to www.mydomain.com, you'll get the default website on IIS running on Alfred. If you go to www.mydomain.com/homeseer, you'll get the HomeSeer web interface running on port 8000 on Alfred. www.mydomain.com/wireless gives you the web interface from my wireless router, which is obviously a totally different machine. All URLs have been changed to protect the innocent. Me. :)
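
The guts of that setup are just a handful of mod_proxy lines. Something along these lines (internal hostnames and addresses are stand-ins, and mod_proxy/mod_proxy_http need to be loaded):

ProxyRequests Off
ProxyPass        /homeseer/ http://alfred:8000/
ProxyPassReverse /homeseer/ http://alfred:8000/
ProxyPass        /wireless/ http://192.168.1.1/
ProxyPassReverse /wireless/ http://192.168.1.1/
ProxyPass        /          http://alfred/
ProxyPassReverse /          http://alfred/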

Samba 2.0.something: acts as the PDC for my "domain". That's pretty much the only reason I run it.

ASSP & PopFile: provide spam filtering. I don't give out my "real" e-mail address liberally, so I really don't get a lot of spam, but I try to cut down on what little I do get. ASSP provides spam protection on my incoming SMTP (I use no-ip.com for dynamic DNS services; more on that in another article). PopFile acts as a proxy between internal POP3 clients and outside mailboxes. I've got POP3 boxes for other things, and this "protects" them, too.

There's also a couple other things I can't think of at the moment, such as Bind for DNS.

Alfred
Alfred was originally supposed to be my "media server" (It's a server. Alfred. Server. Get it?) It's grown into being a bit more, but its original purpose still remains. It's a Dual PII-333 w/ 192M of RAM. Keep that in mind when you read what's running on it...

Windows 2000 Advanced Server: I got this soon after W2K was RTMed back in 1999. I could only get a copy of AS, so that's what I use. And, yes, it was installed over 5 years ago and hasn't been reinstalled. I hate doing that.

ShowShifter: The primary purpose of the machine. This all-in-one interface allows me to watch TV, video files, DVDs and play MP3s and audio CDs. It also acts like a PVR, but I'm on digital cable, so it hasn't seen much use in that regard. Development of SS has been exceedingly slow, and replacing it is one of the main reasons for the revamp.

HomeSeer: This is the best home automation software on the market. It's fairly feature-complete, and highly extensible via scripts and plug-ins, making it amazingly flexible and infinitely useful. Despite the fact that this machine will be converted to Linux, I have plans to keep this around.

IIS: My primary webserver. Pretty much all of the apps I have on it, though, are written in PHP. Not really much reason to use IIS, then.

MySQL: Pretty good F/OSS database software. Great for home use like I use it, but its lack of mature transaction support and stored procedures make it a poor choice for the enterprise. I'll be keeping it around, though.

Subversion: For version control. I've begun playing around with it for maintaining my data on multiple machines. That's a story for another article, though.

Batman
My desktop. Already pretty much replaced, but here for historical reference. It's a PIII-1G w/ 256M of RAM running XP Pro. The guts of this machine will be used to build my new server. The mobo supports dual CPUs, but I never got around to getting one, until this week. I've also got another 512M of RAM to go into it. It's going to need it!

Batmobile
My new machine. It's a 12" Apple Powerbook G4 1.3Ghz with 512M of RAM, 80G hard drive and SuperDrive. Runs OSX 10.3.8. I bought it about six months ago...actually, to be more precise, I bought it the day before I was let go from my previous job. Fortunately, I had 6 months no payments no interest! :) I've finally gotten down to using it as my primary machine. It ain't been easy going doing the switch, but I've gotten used to some of the quirks and have plans for workaround for some of the remaining. On the whole, though, I've been happy with it. I like OSX a lot, but pretty much because the FreeBSD-based undersystem allows me to run pretty much all of the F/OSS software available without having to deal with Linux as a desktop. Tried it, hated it.

So, that tells you where I am today. Some time soon, I'll be telling you where I'm going, and why. I've got some wicked weird ideas for the future! ;-)


Apache a day....

I've always been a fan of IIS....no, it's true. It has a nasty reputation for being insecure, and in some ways, it's not unearned. Problem is, it's easy to secure, it just don't come that way out of the box! Any admin that tells you IIS ain't secure has no idea what they're doing...especially if they went and put an IIS box on the web! If you don't think it's secure, why are you using it??

Anywho, I'll get to the details later, but I'm in the process of revamping the makeup of my home network. One of the phases of the project has me kind of replacing IIS with Apache. Mainly because most of the apps I run on my IIS server are PHP, so it doesn't make a whole lot of sense to not use Apache...

Anyway, I found this site with some fellas who've written some really cool mods for Apache, over at Tangent.

Immediately on reading their site, I saw so many uses:

mod_mp3: turns the Apache Web server into an MP3 or Ogg streaming server. Fantastic! This server's going to be my "media server" and house all of my MP3s anyway! This'll give me the ability to listen to my collection at work where I only have outgoing access to port 80! Nifty!

mod_layout: provides both a Footer and Header directive to automagically include output from other URIs at the beginning and ending of a Web page. Brilliant! The main reason I run a webserver is to give me access to some of my data that I'd like available everywhere, such as phprecipebook and SiteBar. Now, I can create a "master" page with an iframe generated automatically with links to all of these tools so I can switch back and forth with ease. I had planned on doing this manually, but this'll make life easier!

mod_trigger: gives you hooks into each Apache request to launch triggers if certain actions occur. Great! I run a webmail app, I can have Apache send me a mail or some other alert if someone tries to access it. I can write a script that'll exclude domains I normally come from (like work) to cut down on alerts.

MyXML: a UDF extension to the MySQL database. I can't think of a specific use for this at the moment, but I've had a couple of incidents where it would have been nice to be able to generate XML from a database without all the typical work involved.

Very nice stuff, really. Anyone got any other favorite Apache mods they wanna share?