Re: [Tagging] football or soccer ?

Colin Smale Mon, 02 Aug 2010 01:39:14 -0700

 On 01/07/2010 15:25, Anthony wrote:

On Thu, Jul 1, 2010 at 8:08 AM, John F. Eldredge <[email protected]> wrote:
    In fact, the technique of having the user select from a list of
    words, but actually storing the value as an arbitrary ID
    (generally numeric), is the recommended technique in database
    design.  It is called "normalizing the database".
Umm...no. At least, not exactly. If a single column is independent from other columns, it is not necessary for normalization to store it as an arbitrary ID. (For example, if you have a database table containing a driver's license number, date of birth, and hair color, you generally wouldn't store the hair color as an arbitrary ID and then have a separate table to look up the hair color. It certainly isn't necessary for normalization. Assuming driver's license number is your primary key, hair color is a fact about the key, the whole key, and nothing but the key.)

Actually that would be exactly what you would do, assuming you want the list of colours to be controlled and finite. If you denormalise and put the text of the hair colour in the person table you are enabling spelling variations, translations and other kinds of "noise" which is usually what you want to prevent. A real-life example would be the colour of a car in the registration database. My car is painted (according to the manufacturer) "Noir Nacre" but I wouldn't find "Noir Nacre" in the government database. It's black, however you look at it. Unless you are French, in which case it is "noir". Etc etc.
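
A minimal sketch of the controlled-list approach, using Python's built-in sqlite3 module. The table and column names, the registrations and the two colours are all hypothetical. Only the canonical colour ID is stored with the vehicle. Values outside the list are rejected, and per-language labels such as "noir" live in a separate table instead of being typed into the vehicle record.

```python
import sqlite3

# Hypothetical schema: a controlled list of colours, per-language labels,
# and a vehicle table that stores only the colour's ID.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE colour (
    colour_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE          -- canonical (English) value
);
CREATE TABLE colour_label (
    colour_id INTEGER NOT NULL REFERENCES colour(colour_id),
    lang      TEXT NOT NULL,
    label     TEXT NOT NULL,
    PRIMARY KEY (colour_id, lang)
);
CREATE TABLE vehicle (
    registration TEXT PRIMARY KEY,
    colour_id    INTEGER NOT NULL REFERENCES colour(colour_id)
);
INSERT INTO colour VALUES (1, 'black'), (2, 'white');
INSERT INTO colour_label VALUES (1, 'fr', 'noir'), (2, 'fr', 'blanc');
""")

def register(registration, colour_name):
    """Store a vehicle if colour_name is on the controlled list; return True on success."""
    row = conn.execute(
        "SELECT colour_id FROM colour WHERE name = ?", (colour_name,)
    ).fetchone()
    if row is None:
        return False  # e.g. a marketing name such as "Noir Nacre" is rejected
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO vehicle VALUES (?, ?)", (registration, row[0]))
    return True

def colour_of(registration, lang):
    """Return the vehicle's colour label in lang, falling back to the canonical name."""
    row = conn.execute(
        """SELECT COALESCE(l.label, c.name)
           FROM vehicle v
           JOIN colour c ON c.colour_id = v.colour_id
           LEFT JOIN colour_label l ON l.colour_id = c.colour_id AND l.lang = ?
           WHERE v.registration = ?""",
        (lang, registration),
    ).fetchone()
    return row[0] if row else None
```

With this data, register("AB-123-CD", "black") succeeds and register("EF-456-GH", "Noir Nacre") is refused. Afterwards, colour_of("AB-123-CD", "fr") gives "noir", and a language with no label falls back to "black". The foreign key on vehicle.colour_id also means that an unknown ID written directly into the table is rejected by the database itself.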

It starts with a question about the data model. Do you recognise colour as having a finite set of valid values, or is it really free text? The "OSM way" is to have everything as "free text" at a technical level, and to maintain any "list of valid values" by general consensus, although even this goes against the grain for some people. As the profile of OSM improves in the market for cartographic data, it will become increasingly important to demonstrate that the data has some kind of quality control.
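
Where values are free text at a technical level and the "list of valid values" exists only by consensus, quality control has to happen outside the database, for example in a checking tool. Below is a minimal sketch of such a check. The CONSENSUS_VALUES list and the check_tags function are hypothetical and do not come from any existing OSM tool.

```python
# Hypothetical consensus list; nothing in OSM enforces this at the database level.
CONSENSUS_VALUES = {
    "sport": {"football", "tennis", "basketball"},
}

def check_tags(tags):
    """Return (key, value) pairs whose value is not on the consensus list for that key.

    Keys without a list are treated as genuinely free text and never flagged.
    """
    problems = []
    for key, value in tags.items():
        allowed = CONSENSUS_VALUES.get(key)
        if allowed is not None and value not in allowed:
            problems.append((key, value))
    return problems
```

For example, check_tags({"sport": "soccer", "name": "Elm Park"}) flags only ("sport", "soccer"). The name tag has no list, so it is left alone. Such a check reports problems without rejecting the data, which fits the consensus-based approach.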

The discussion about football vs. soccer is not one about what it IS, but what it is CALLED. British English is the base language for OSM, so the main tag value should be "sport=football". Just as German-speakers are free (encouraged?) to use "sport:de=fussball", why should it not be "sport:us=soccer" in the USA?
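
Under this proposal, a renderer or search tool would display the suffixed value for the user's locale and fall back to the main tag otherwise. Here is a minimal sketch, assuming the "key:locale" scheme suggested above. The ":us" and ":de" suffixes come from this message and are not claimed to be an established OSM convention. The display_value function is hypothetical.

```python
def display_value(tags, key, locales):
    """Return the first locale-specific value found for key, else the main value.

    locales is a list of suffixes in order of preference, e.g. ["us"] or ["de"].
    Returns None if neither a suffixed nor a main value exists.
    """
    for loc in locales:
        value = tags.get(f"{key}:{loc}")
        if value is not None:
            return value
    return tags.get(key)
```

Given pitch = {"sport": "football", "sport:de": "fussball", "sport:us": "soccer"}, a US user would see "soccer", a German user "fussball", and anyone else "football". Only "sport=football" carries the meaning, and the suffixed tags are purely presentational.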

If you're using a crappy "DBMS" you might do this anyway, not for normalization, but for performance purposes, because the DBMS is too stupid to do it automatically behind the scenes for you. If you're using a good DBMS, it won't be necessary, though.


Which DBMS do you call crappy and which do you call good?

Colin

_______________________________________________
Tagging mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/tagging
