One of the biggest mistakes I run into every day is databases and spreadsheets which weren't designed for the exceptions. I get it, most of your data fits in your model. But then you run into data that doesn't fit your world view and you call me. And that's when things get ugly.
We need an example.
Let's use ZipCodes. We all know how these work right? Each city has a zip code. Here's some examples from here in Southwest Idaho:
City | ZipCode |
---|---|
Eagle | 83616 |
Kuna | 83634 |
Idaho City | 83631 |
That wasn't so hard was it? Our spreadsheet is done!
Not so fast cowboy. What do we do with Meridian? It has three Zips: 83642, 83646, 83680.
You've seen this before though and you got it taken care of don't you? That's hierarchical!
Cities | |
---|---|
CityID | City |
1 | Eagle |
2 | Kuna |
3 | Idaho City |
4 | Meridian |
(I'll add the City in parentheses for clarity.)
ZipCodes | ||
---|---|---|
ZipCodeID | ZipCode | CityID |
1 | 83616 | 1 (Eagle) |
2 | 83634 | 2 (Kuna) |
3 | 83631 | 3 (Idaho City) |
4 | 83642 | 4 (Meridian) |
5 | 83646 | 4 (Meridian) |
6 | 83680 | 4 (Meridian) |
You mad dawg! You got it. But wait, the USPS is kinkier than that. 83714 is valid in Garden City, Boise and Hidden Springs! USPS Postal Code Lookup
Now what?! Welcome to the real world where everything is miscellaneous. Hierarchies only exist in human relationships and these days with a 50%+ divorce rate that isn't even true. Nice family bush you got there.
So what to do? You got a many-to-many on your hands. It's funny how common this pattern is... I'd call it more the rule than the default, yet somehow we always want to split things into distinct categories. I think it is a relic of the filing cabinet age that'll disappear in the few years. Google won, Yahoo lost the hierarchy war. Ontology is overrated and this is the pattern behind the free tagging web and real relationships in the real world. It's time for desktop databases to do a better job of handling these relationships. But I'll leave that for another post. For now you need a new pattern. Three tables:
Cities | |
---|---|
CityID | City |
1 | Eagle |
2 | Kuna |
3 | Idaho City |
4 | Meridian |
5 | Boise |
6 | Garden City |
7 | Hidden Springs |
ZipCodes | |
---|---|
ZipCodeID | ZipCode |
1 | 83616 |
2 | 83634 |
3 | 83631 |
4 | 83642 |
5 | 83646 |
6 | 83680 |
7 | 83714 |
(I've added the City name/Zip Code in parenthesis for clarity.)
Cities_By_ZipCode | |||
---|---|---|---|
ID | CityID | ZipCode | () |
1 | 1 | 1 | (Eagle-83616) |
2 | 2 | 2 | (Kuna-83634) |
3 | 3 | 3 | (Idaho City-83631) |
4 | 4 | 4 | (Meridian-83642) |
5 | 4 | 5 | (Meridian-83646) |
6 | 4 | 6 | (Meridian-83680) |
7 | 5 | 7 | (Boise-83714) |
8 | 6 | 7 | (Garden City-83714) |
9 | 7 | 7 | (Hidden Springs-83714) |
(In reality Boise has 29 Zip Codes and Garden City has two, but let's keep it a little simpler for this explanation)
I picked Zip Codes because they are a human constructed order that's outgrown it's own hierarchy. We could have just of easily used the classic teachers to students model and then dropped in co-teachers (my Dad had that job once), or tried to decide if a tomato is a fruit or veggie, or whether hot burns like fire or jalapeño... the world is full of them. If you look deep enough it's hierarchical relationships that are really uncommon.
In my next post, Spreading the Sheets and Letting Relations In, I'll give some examples of how to work with many-to-many relationships in MS Access.
UPDATE: Above I tangentially mentioned Clay Shirky's talk Ontology is Overrated and it's worth more than that. Like all of Shirky's talks it's brilliant, but this one is one of his best. Five years old and still a great read.
Comments