On Thu, Jan 19, 2006 at 11:22:24AM -0800, Dann Corbit wrote: > > > In any of these either misspellings, changes of names, ownership or > > > even structure over time render the obvious useless as keys. There > are > > > techniques for detecting and reducing duplication but the point is > that > > > for any of these duplicates *can* be valid data. > > > > Please point me out where, in the writings of E.F. Codd or in the SQL > > Standard, it says that keys have to be immutable for the life of the > row. > > Only do that for data that you care about. If you think that the data > has no value, there is no need to have a way to identify a row.
Erm... if you don't care, why are you storing it? :) > I was a DBA for a database for a company with many millions of customers > worldwide (e.g. the product registration table was 24 GB). > > Their design had natural keys in it. It caused dozens of problems, > every single day. > > I content that most people are not smart enough to decide when a natural > key is a good idea. The engineers that designed the database were > probably pretty smart, since it sort of worked and had thousands of > tables and hundreds of millions of rows in it. But one bad decision on > a natural key will cause literally millions of dollars of damage. > > The primary defense I have heard so far is that the Oids are hard to > understand. They are nothing in comparison with understanding what to > do when you have 25 changes to primary keys on various tables every > single day. > > Once you get used to Oids, I find it hard to believe that any > intelligent person finds them confusing. Confusion resulting from > having primary keys that are a moving target? Now that's confusion for > you. Well, I wouldn't use OIDs as in the PostgreSQL OID, but I agree. If nothing else an ID gives you a fallback... if you absolutely can't find a customer (or whatever else) through natural keys, you ask them for their customer ID/number, which has no reason to ever change. BTW, if you want to see a mess*, take a look at the distributed.net stats code, which unfortunately uses email as the means to identify participants. It made perfect sense originally, anyone running the client was bound to have an email address, and they all had to be unique, right? Worked great until the first person contacted us wondering how to change his email address in stats because he'd changed ISPs. If you look at todays statscode (at least the database portion of it) approximately 50% of it is there to deal with people retiring one email address into another, and I'd say that 90%+ of the bugs are in this code. Had we just required new users to register to get a nice shiny unique numeric ID (or a unique username...), none of that code would exist. * note that I'm not trying to rag on any of the numerous people who've been involved in the stats code over the years, but it is insightful to look at some of the 'dumb mistakes' that have been made and the large amount of pain that it's caused. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly