Monday, March 7, 2005

Let my data come!

Data, data, data...so very much data. All of it stored in incompatible formats, all of it accessible only to the software that originally created the file that contains the data. Unless you spend a lot of time creating some kind of converter script or program, your data is trapped.
The open source community has been telling us this for years. They've said things like "Why do you continue to use Microsoft Office? Your data is trapped in proprietary formats that nothing else can read 100%!" Too right! But when you ask "Where are the alternatives?" you get replies about OpenOffice, AbiWord, or a few other F/OSS office suites.
Alrighty, we've got those covered, but what about all the other data? Uh, other data? Yeah. For example, I'm playing around with a couple of PHP apps on my Apache server. Both could be considered "address books": one is an actual address book/contact manager, the other a customer database that's part of a larger app suite. The two would work well together, but to use them I have to create two separate databases, because app1 names its fields "fname, lname, street, etc." and app2 uses "first, last, homeaddress, etc." Both hold exactly the same data, but the formats are different enough that neither app can read the other's. Yes, I could theoretically write a conversion script, or change one of the apps so that both use a single database. But why bother?
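To make the mismatch concrete, here's roughly what such a per-pair converter looks like, as a minimal Python sketch. The shared names (`first_name`, `home_street`, and so on) and the `to_common` helper are hypothetical illustrations, not taken from either app. Every new pair of apps needs yet another map like this, which is the chore a shared schema would remove.

```python
# Hypothetical column names for the two apps, each mapped onto one shared
# (equally hypothetical) set of address-book field names.
APP1_FIELDS = {"fname": "first_name", "lname": "last_name", "street": "home_street"}
APP2_FIELDS = {"first": "first_name", "last": "last_name", "homeaddress": "home_street"}


def to_common(record: dict, field_map: dict) -> dict:
    """Rename one app's record keys to the shared names, dropping unmapped keys."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


app1_row = {"fname": "J.R.", "lname": "Dobbs", "street": "123 Main St"}
app2_row = {"first": "J.R.", "last": "Dobbs", "homeaddress": "123 Main St"}

# Same person, same data: only the column names differed.
assert to_common(app1_row, APP1_FIELDS) == to_common(app2_row, APP2_FIELDS)
```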
"What is it you propose, then?" Glad you asked... enter libDBT (a working name; it stands for Database Template). libDBT is a general-purpose library that interfaces to general-purpose databases. What I propose is a set of specifications for various types of databases that define element names, database names, etc., so that these islands of data become more standardized. Let's face it: aside from the occasional proprietary addition, one address book is going to contain 95% of the same data as any other. An address book (AB from here on in, 'cause I'm lazy) database will typically have fields such as:
First Name
Last Name
Home Street Address
Home City
Home Zip
Work Street Address
Work City
Work Zip
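In practice, a template could be little more than an agreed list of column names. Here's a sketch using Python's built-in `sqlite3`; the column and table names (`address_book`, `home_zip`, ...) are assumptions for illustration, not a published spec. Zip codes are stored as text so leading zeros survive.

```python
import sqlite3

# Hypothetical "AB" template: the standard fields above, one column each.
AB_TEMPLATE = (
    "first_name", "last_name",
    "home_street", "home_city", "home_zip",
    "work_street", "work_city", "work_zip",
)


def create_ab_table(conn: sqlite3.Connection, table: str = "address_book") -> None:
    """Create an address-book table laid out according to the template.

    `table` is interpolated into the SQL, so it must come from trusted code.
    """
    columns = ", ".join(f"{name} TEXT" for name in AB_TEMPLATE)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, {columns})")


conn = sqlite3.connect(":memory:")
create_ab_table(conn)
# PRAGMA table_info returns one row per column; index 1 is the column name.
print([row[1] for row in conn.execute("PRAGMA table_info(address_book)")])
```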


The list could go on, but you get the idea. So, as part of the libDBT specification, we capture as many of the "standard" fields as possible: the basics that any AB needs, no matter what you're using it for. That way, a developer creating an AB application can check at install time, either in the configure script or through installer input, whether there's already an AB in the system's default database (more on that later). If there is one, the installer can ask "Use this one or create new?" Creating a new one is useful if, for example, you want to keep employee information separate from customer data.
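The install-time check might look something like this, assuming for simplicity that the default database is SQLite (where the `sqlite_master` table lists existing tables). The table names and the `reuse` flag standing in for the user's answer are hypothetical.

```python
import sqlite3


def ab_exists(conn: sqlite3.Connection, table: str = "address_book") -> bool:
    """True if the named table already exists in this SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


def pick_ab_table(conn: sqlite3.Connection, reuse: bool) -> str:
    """Installer logic: share the system AB, or set up a separate one.

    `reuse` stands in for the answer to "Use this one or create new?".
    The table names are hypothetical.
    """
    if not ab_exists(conn):
        table = "address_book"      # nothing to reuse yet: create the system AB
    elif reuse:
        return "address_book"       # share the existing AB
    else:
        table = "address_book_2"    # e.g. keep employees apart from customers
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    return table


conn = sqlite3.connect(":memory:")
print(pick_ab_table(conn, reuse=True))   # no AB yet, so one gets created
```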
"But, if you're going to separate data in that way, what's the point?" The point is, your customer data is universally accessible to any and all AB-like applications on your system. Your ERP system, your PIM, your e-mail program, whatever! Your libDBT-compatible app can recognize that there's more than one libDBT-compatible database available and ask if you want to use them all, some or none.
Let's take it further. The one problem some of you are thinking about right now is extension. "My AB needs more than just some basic fields!" Too right, and I'm not surprised. But you're still going to use the ones that are already in there, right? So just extend the database schema to include your data as well. See, the specification I have in mind says, "Ignore any data, records, or information that isn't yours! Treat any fields you're not using, whether or not they're part of the spec, as NULL values."
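Here's a minimal sketch of that "ignore what isn't yours, NULL what's missing" rule, again with `sqlite3`. The `pager_number` extension column and the `read_for_app` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_book (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO address_book (first_name, last_name) VALUES ('J.R.', 'Dobbs')")

# One app extends the shared schema with a field of its own (hypothetical name).
conn.execute("ALTER TABLE address_book ADD COLUMN pager_number TEXT")


def read_for_app(conn: sqlite3.Connection, wanted: list) -> list:
    """Fetch AB rows holding only the fields this app cares about.

    Columns the app doesn't list (like pager_number) are ignored, and fields
    the app wants but the table lacks come back as None, i.e. NULL.
    """
    cur = conn.execute("SELECT * FROM address_book")
    present = [col[0] for col in cur.description]
    rows = []
    for row in cur:
        record = dict(zip(present, row))
        rows.append({name: record.get(name) for name in wanted})
    return rows


print(read_for_app(conn, ["first_name", "last_name", "nickname"]))
```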
Now, the hard part comes when we have collisions. My app needs to add a "Mother's Maiden Name" field and I want to call it "MMN", but that's already some other app's "My Mammy's Nanny". What do you do, what DOO you do? Simple: don't name it that. There are two ways around this. The first: as part of the libDBT project, we keep a registry of additional fields that records which apps use each field and what it's used for. This is probably the best route, because if I want to add a "Mother's Maiden Name" field, I don't have to duplicate the effort; I just use the registered name, and the data is available to other apps too. The other is to adopt a naming convention such as "MyApp_MMN". I'm not a big fan of this method, though, because it locks that data into "MyApp".
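The registry route could be as simple as a shared list of field names, their meanings, and the apps using them. This in-memory `FieldRegistry` class is a hypothetical sketch; matching meanings by exact string is a simplification that a real registry would need to improve on.

```python
from typing import Optional


class FieldRegistry:
    """Hypothetical shared registry of extension fields: name -> meaning, apps."""

    def __init__(self) -> None:
        self._fields: dict = {}

    def lookup(self, meaning: str) -> Optional[str]:
        """Return the registered name for a meaning, or None if nobody has one."""
        for name, entry in self._fields.items():
            if entry["meaning"] == meaning:
                return name
        return None

    def register(self, name: str, meaning: str, app: str) -> str:
        """Claim `name` for `meaning`, or join an existing identical claim.

        Raises ValueError if `name` is already registered with another meaning.
        """
        entry = self._fields.get(name)
        if entry is None:
            self._fields[name] = {"meaning": meaning, "apps": {app}}
        elif entry["meaning"] == meaning:
            entry["apps"].add(app)
        else:
            raise ValueError(f"{name!r} is already registered as {entry['meaning']!r}")
        return name

    def apps_using(self, name: str) -> set:
        """Which apps have registered this field."""
        return set(self._fields[name]["apps"])


registry = FieldRegistry()
registry.register("mmn", "My Mammy's Nanny", "SomeOtherApp")
registry.register("mothers_maiden_name", "Mother's maiden name", "MyApp")

# A second app reuses the existing name instead of inventing its own.
existing = registry.lookup("Mother's maiden name")
registry.register(existing, "Mother's maiden name", "YetAnotherApp")
```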
Now, as with anything this powerful, we need a way to configure it. Let's say I need the AB capability but don't want the ability to organize MP3s (see later). Well, each template (which is really just a schema) will also need to come with a libDBT plugin describing its calls.
Calls? Oh, didn't I mention? libDBT is more than just a bunch of templates to make your life and data access easier! It's a system library that gives you full access to those databases without writing a ton of code. For example, let's say you want to create a new record in the default AB. In the old paradigm, you'd have to code up how to interact with that database, where it's stored, what kind of info goes into which field, etc. Wouldn't this be nicer?
libDBT_Create_New(AB,, The Right Reverend, J.R., Dobbs,,,,,"Bob")
What is all that? Well, it's stupid pseudocode representing how to create a new record. It tells libDBT: "Create a new record in the default system address book. The name is J.R. Dobbs, his title is The Right Reverend, and his nickname is 'Bob'." The empty second argument means "use the default database," and the other empty slots are fields we're leaving blank. Simple? Of course. Why make things harder on ourselves than we need to?
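Here's how that call might look as a real function, in a Python sketch backed by an in-memory SQLite database. `create_new`, `DATABASES`, `TEMPLATES`, and the column list are all hypothetical; `TEMPLATES` plays the role of the per-template plugins. This sketch uses keyword arguments instead of the pseudocode's positional slots, so nobody has to count empty commas to reach the nickname field; unset fields are stored as NULL.

```python
import sqlite3

# Hypothetical registry of open databases; "default" is the system-wide one.
DATABASES = {"default": sqlite3.connect(":memory:")}

# Hypothetical template table: template name -> (table, columns).
TEMPLATES = {
    "AB": ("address_book", ("title", "first_name", "last_name", "home_street",
                            "home_city", "home_zip", "nickname")),
}


def create_new(template, database=None, **fields):
    """Insert one record using a template; return the new row id.

    `database=None` means the default database. Fields that aren't supplied
    are stored as NULL; names outside the template raise ValueError.
    """
    table, columns = TEMPLATES[template]
    unknown = set(fields) - set(columns)
    if unknown:
        raise ValueError(f"not in the {template} template: {sorted(unknown)}")
    conn = DATABASES[database or "default"]
    column_defs = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, {column_defs})")
    placeholders = ", ".join("?" for _ in columns)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        [fields.get(c) for c in columns],
    )
    return cur.lastrowid


new_id = create_new("AB", title="The Right Reverend", first_name="J.R.",
                    last_name="Dobbs", nickname="Bob")
```

Targeting another database would just mean passing `database="customers"`, given a matching entry in `DATABASES`.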
Okay, but what's all this about a "default database"? Well, once we do away with the myriad databases on a system all holding the same data, we'll only need one database on a machine to hold it! For example, I've got OpenLDAP running on my machine, holding some AB-esque data in a GDBM database. I've also got MySQL running, also storing some AB data. God, I wish I could have both of those in the same database! With libDBT, you could. You could use MySQL, PostgreSQL, MSSQL, Oracle, GDBM, XML, or whatever! libDBT just defines the schema the data is stored under; it doesn't specify anything else. So we just need a single conf file, say /etc/libdbt.conf, something like:

# Begin /etc/libdbt.conf

[databases]
default = mysql-master
customers = mysql-cust

[mysql-master]
type=mysql
host=localhost
port=3306
user=root
password=easilycracked

[mysql-cust]
type=mysql
host=localhost
port=3306
user=root
password=easilycracked
table=customers

# End /etc/libdbt.conf
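Reading that file needs nothing exotic: Python's standard `configparser` handles this INI-style layout, `#` comments included. The `resolve` helper below is a hypothetical sketch that maps a logical name like `customers` to its connection settings; actually opening a MySQL connection is left out, and the sample is a trimmed copy of the file above.

```python
import configparser

# Trimmed copy of the example file; a real install would call
# config.read("/etc/libdbt.conf") instead of read_string().
SAMPLE = """\
# Begin /etc/libdbt.conf
[databases]
default = mysql-master
customers = mysql-cust

[mysql-master]
type=mysql
host=localhost

[mysql-cust]
type=mysql
host=localhost
table=customers
# End /etc/libdbt.conf
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)


def resolve(cfg: configparser.ConfigParser, alias: str = "default") -> dict:
    """Map a logical name like "default" or "customers" to its connection settings."""
    section = cfg["databases"][alias]
    return dict(cfg[section])


print(resolve(config))
print(resolve(config, "customers"))
```

Both aliases can point at the same server; the `table=customers` line is what keeps customer records in their own table.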

So, let's say you wanted to add The Right Reverend J.R. "Bob" Dobbs as a customer instead; then you'd only need to change the call to something like:

libDBT_Create_New(AB,customers, The Right Reverend, J.R., Dobbs,,,,,"Bob")

Like I said, this would have to be extensible with plugins, and I can picture a few simple kinds of databases I've seen that could benefit from it:

MP3 organizers: they typically store song name, artist name, album name, bitrate, etc. There have got to be hundreds of these out there, all with incompatible data!

Photo organizers: these store EXIF data, if available, plus fields for captions, the place a photo was taken, and the people in the picture. Wouldn't these be nice?

LDAP: an extension of the AB concept. Let's face it, at its heart LDAP is an address book; in fact, that's what it's normally used for. In a lot of cases it's an authentication mechanism, but that's just an address book that stores a password, too. Why not a simple daemon, ldapdb, that acts as a translator between LDAP-using apps and a database backend? Send it a query to find "uid=tkarakashian, ou=people, c=us" and it sends an SQL query to the database and returns the matching info from the employee database. Wouldn't that be nice? Seems to me it would be a relatively simple system to implement.
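The translation half of such an ldapdb daemon could start out like this hypothetical Python sketch, which turns a DN lookup into a parameterized SQL query. It doesn't speak the LDAP protocol at all, and the OU-to-table and attribute-to-column mappings are made up for illustration.

```python
# Hypothetical mappings from LDAP naming to libDBT tables and columns.
OU_TO_TABLE = {"people": "address_book"}
ATTR_TO_COLUMN = {"uid": "user_id", "cn": "full_name"}


def parse_dn(dn: str) -> list:
    """Split a DN into (attribute, value) pairs, leftmost RDN first.

    Deliberately naive: real DNs can contain escaped commas and multi-valued
    RDNs (see RFC 4514), which this ignores.
    """
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def dn_to_sql(dn: str) -> tuple:
    """Turn a DN lookup into a parameterized SELECT: (query, params)."""
    pairs = parse_dn(dn)
    attr, value = pairs[0]          # the leftmost RDN names the entry
    ou = dict(pairs).get("ou")      # the OU picks which table to search
    table = OU_TO_TABLE[ou]
    column = ATTR_TO_COLUMN[attr]
    return f"SELECT * FROM {table} WHERE {column} = ?", (value,)


print(dn_to_sql("uid=tkarakashian, ou=people, c=us"))
```

Keeping the looked-up value as a bound parameter, rather than pasting it into the SQL string, means an odd-looking uid can't rewrite the query.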
Okay, that's my brain dump. Below this you'll find a Comments section that uses a non-standard database to store its comments. Please ignore that in light of this new information, and let me know whether or not I'm completely insane.
