> However any time you have a primary key with meaning is probably
> a mistake in database design. You are best trying to avoid this
> at all costs unless you are constrained by a legacy system in
> some way.

This is common advice, but the topic is controversial.
Even in non-legacy databases, having natural PKs can have advantages:
- A synthetic key often means an additional join since an internal ID number is 
incomprehensible to a human so you need to get the database record that 
contains the printable text for the ID. This affects a large percentage of 
queries that return displayable data, and almost all ad-hoc queries that your 
team leader will use to check database sanity (this means that your team leader 
will usually shoot down the idea of using synthetic PKs exclusively, with good 
reasons, and that's that).
- Even with synthetic keys, in a hierarchy of parent-child relationships, it is 
useful to construct the PK of a child from the ID of the parent plus a unique 
synthetic detail id. That way, if you have a grandchild record, you can 
immediately access its grandparent and don't need to load the parent record. 
This isn't a very common scenario and mostly restricted to ad-hoc queries, but 
it does occasionally help.
- A synthetic key is redundant if the data does contain a natural key, so we're 
violating a normalization property and (more importantly) spreading out the 
data over more disk sectors, which has a performance impact (this starts to get 
noticeable at six-digit record count and can become paramount at millions of 
records; below that, performance is usually a non-issue anyway and you need to 
question yourself what you actually need a database for when a text editor 
search will work just as well - okay, I'm exaggerating a bit :-) ).

There is one restriction: You never ever change a primary key. It's technically 
possible, but in practice, you need to find each and every place where that key 
value was ever stored: not just associated tables but also any database dumps 
on backup media, or if that PK value ever went to somebody else's computer 
(say, via a webservice), any copy of the value on their disks, which is far 
more effort that it's worth even if you manage to control all the places where 
the value has gone.
So, if you use natural keys, if there is *any* chance that *anybody* will 
*ever* want to change a PK candidate field, don't put that field in the PK and 
use a synthetic PK instead.

The distinction between natural and synthetic keys is a bit more blurry than 
most people realize, too. Most "natural" keys that I see in practice have 
started life as synthetic key on somebody else's system.
The EAN, ISSN, and ISBN, for example, come as data to most, but these are a 
world-wide unique values specifically designed to never change, so each of them 
is perfectly fine as a PK if you are dealing with data that always comes with 
such a number. Just be 150% sure that all the articles you'll ever deal with 
are guaranteed to have such a number (not true if you're doing business with 
entities that do not know or care about such numbers, but can be true in B2B 
scenarios).
The symbols of chemical elements, despite being standardized, have changed in 
the past, and it is conceivable that we may live to see one or two changes in 
the future. Ordinal numbers of elements, by definition, never change, so they 
are an acceptable PK in a table of element properties.
If you are talking to another system and that system provides you with a unique 
key for something, you can use that UK as a PK (or as part of a PK), too. Just 
talk to the guys maintaining the system and make sure that they run a strict 
never-change-a-PK policy. (This is essentially the same scenario as the one 
with EAN/ISSN/ISBN, except that the guys maintaining these PKs have already 
publicly committed to never changing these numbers.)

Reporting straight from the trenches,
Jo

Reply via email to