Hi,

I'd like to share some thoughts on the "How to implement MV (multivalued keys) in OSM" question, as raised in: http://wiki.openstreetmap.org/wiki/Proposed_features/Multivalued_Keys
I'd prefer to first have explicit agreement that we actually need MV... but since the implementation discussion is already rolling, here goes. Initially, I wanted to add a section to the talk page of that wiki page, but my text now seems better suited to an email. Please feel free to integrate any part of my considerations into the wiki page.

I see two general ways to solve the problem of MV in OSM:

1) Allow multiple identical keys per object (as was the case before API 0.6, I have learned). This means the tag names of one object need *not* be unique.

   When we talk about tag names being unique, we should distinguish between being unique in the data storage and being unique at the surface (GUI). It seems... well, what does it seem? Are we more concerned with the technical storage level or with the user experience? Which of the two are we discussing?

2) Make multiple identical keys unique by some *technical* means. This appears to me to be the currently assumed way to go.

I (now) think it is important to keep the value domain free from logic and thus reserve it for literal data. This means MV must be implemented in the key domain.

Currently, we mostly discuss concrete encoding examples, and the assumption is that the user would have to deal with these suffixes. Maybe he doesn't have to. It might be possible to abstract the user's view from the internal storage; then the actual encoding becomes irrelevant from the user's perspective. Multiple identical keys could be presented to the user (even grouped), and they would be translated at the interface to the data storage layer, e.g. by appending arbitrary suffixes such as hashes of the value. (I focus on unordered MVs here.)

As a user, I'd never want to deal with this MV problem at all, which means no encoding should be required of me, neither in the value *nor* in the key domain. If there are two refs, then I'd want to tag: ref=foo + ref=bar. The internal storage should not be the user's problem.
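To make the translation idea a bit more concrete, here is a minimal Python sketch. It is not a proposal for any actual OSM API: the separator `|` is a hypothetical reserved character, the suffix is simply the first 8 hex digits of the value's SHA-1 hash, and the function names are made up for illustration. The user works with a plain list of key/value pairs in which keys may repeat; only the storage layer ever sees the suffixed keys.

```python
import hashlib

# Hypothetical reserved separator, assumed never to occur in key names or suffixes.
SEP = "|"

def value_suffix(value: str) -> str:
    """Deterministic suffix derived from the value: first 8 hex digits of its SHA-1."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()[:8]

def to_storage(user_tags: list[tuple[str, str]]) -> dict[str, str]:
    """Translate the user's view (keys may repeat) into unique storage keys."""
    stored: dict[str, str] = {}
    for key, value in user_tags:
        storage_key = key + SEP + value_suffix(value)
        # A truncated hash can collide; refuse rather than silently overwrite.
        if stored.get(storage_key, value) != value:
            raise ValueError(f"suffix collision for key {key!r}")
        # Identical key/value pairs map to the same storage key,
        # so an unordered multivalue behaves like a set.
        stored[storage_key] = value
    return stored

def to_user_view(stored: dict[str, str]) -> dict[str, list[str]]:
    """Strip the suffixes again and group the values by key name for display."""
    grouped: dict[str, list[str]] = {}
    for storage_key, value in stored.items():
        key, _, _suffix = storage_key.partition(SEP)
        grouped.setdefault(key, []).append(value)
    return grouped

# The user simply tags ref=foo + ref=bar; the suffixes never reach the GUI.
stored = to_storage([("ref", "foo"), ("ref", "bar"), ("name", "Main Street")])
print(to_user_view(stored))  # {'ref': ['foo', 'bar'], 'name': ['Main Street']}
```

In this sketch every key gets a suffix, even single-valued ones; a real scheme might only add one when a key actually repeats. Deriving the suffix from the value fits unordered MVs only, since it carries no position information.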
Of course, it's not that easy, because raw data is dealt with directly much too often. Nonetheless, we should keep in mind that separating the user's view from the data storage can resolve conflicting wishes.

Concerning the choice of how to add such a suffix, we should realize what we are trying to do here: we are violating the first normal form for relational databases by encoding two separate pieces of information in one field (the key's name and some unique suffix). We have already come to the conclusion that encoding multiple values in one field is bad in the value domain... but it is equally bad in the key domain. And it is even worse if the separator is not (technically) reserved for that specific purpose.

If we were to use the underscore (_) to separate the key's name from the unique suffix, the technical separation of name and suffix would be quite fragile, because key names already contain underscores. The split would amount to guessing, based on the assumption that the suffix is a number. Hence, if we do encode two separate values in one field, we had better try hard to make the separator reserved. This not only spares us escaping, but also allows us to search for exact key names, because the search engine can then know which part is the name to compare and which is the suffix to ignore. The underscore approach fails in this respect just as the colon approach does, since colons also already appear in key names. Of the approaches currently discussed, only the subscript (bracket) syntax satisfies this need (assuming that no key names currently contain brackets). However, its closing bracket is technically superfluous and only motivated by the idea that humans have to see these suffixes.

What we need, in my opinion, is a single character that must never be part of any key name and never be part of any suffix. Using this separator, we encode two separate pieces of information in one field (the key field)... and thus effectively have three columns in a two-column table.
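The difference between guessing a split on the underscore and splitting on a reserved separator can be shown with a small Python sketch. The separator `|`, the "trailing _<digits>" rule, and the function names are all hypothetical assumptions for illustration; `name_1` simply stands for any key whose own name happens to end in an underscore and digits.

```python
import re
from typing import Optional

# Hypothetical reserved separator, assumed never to occur in key names or suffixes.
RESERVED_SEP = "|"

def split_underscore(storage_key: str) -> tuple[str, Optional[str]]:
    """Underscore scheme: guess that a trailing _<digits> is the unique suffix."""
    match = re.fullmatch(r"(.+)_(\d+)", storage_key)
    if match:
        return match.group(1), match.group(2)
    return storage_key, None

def split_reserved(storage_key: str) -> tuple[str, Optional[str]]:
    """Reserved-separator scheme: the first RESERVED_SEP always marks the suffix."""
    name, sep, suffix = storage_key.partition(RESERVED_SEP)
    return name, (suffix if sep else None)

def find_values(tags: dict[str, str], key_name: str) -> list[str]:
    """Exact search on the key name, ignoring whatever suffix is attached."""
    return [value for stored_key, value in tags.items()
            if split_reserved(stored_key)[0] == key_name]

print(split_underscore("old_name_2"))  # ('old_name', '2')
print(split_underscore("name_1"))      # ('name', '1'): wrong if "name_1" is the key name itself
print(split_reserved("name_1|2"))      # ('name_1', '2')
print(find_values({"ref|1": "foo", "ref|2": "bar", "ref_note|1": "x"}, "ref"))  # ['foo', 'bar']
```

With the reserved separator, the split never depends on what the suffix looks like, so no escaping is needed and an exact search for `ref` does not accidentally match `ref_note`.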
At the surface (GUI), we should instead hide the technical suffix details.

meillo

P.S. Ordered MVs are not considered here. It is not clear whether we need to consider them.

_______________________________________________
Tagging mailing list
Tagging@openstreetmap.org
https://lists.openstreetmap.org/listinfo/tagging