Peter Wendorff wrote:
> I'm not talking about data duplication in the meaning of "I add my data
> twice in different ways", but about redundant (not duplicate) data in
> the meaning of "Sven added his data there not knowing that it's possible
> here too; I add the data here - and you can check if we both contributed
> data that doesn't show failures."
OK, but this all still rests on the assumption that there are in fact two independent data sources. I really don't think this is happening in real life. There are basically 3 scenarios in which the ref tags can get out of sync:

1) Someone creates a relation with ref=42 and then adds a way with ref=24. Why would he do that? IMHO there are two possibilities:
   a) A mistake during editing. If the road really does not belong there, then a QA tool analyzing roads should find it relatively easily (and such a tool would also find e.g. a building polygon added to the route relation <- THIS you can't do with a simple ref cross-check).
   b) It is correct and has some meaning that I can't think of right now. (The simple ref cross-check fails again.)

2) A relation exists with member ways that have no ref tag. This means that the route is essentially mapped and any further editor is only correcting errors that he found. Then someone comes and adds a ref tag to one of the ways. Why?
   a) He wanted to correct a wrong ref tag. Well, then I think that person would/should look for the source of that wrong value (the relation) and correct it there. I think this scenario is highly unlikely.
   b) Same as 1b). (The cross-check fails again.)

3) Both the relation and the ways are populated with ref tags, and someone who wanted to correct a wrong value (e.g. because it has changed) edited only one of them.

Could somebody provide a scenario where the data duplication and a simple way-relation cross-check of ref tags is really useful? So far, I can't see one.

> If you create a route relation and add a ref there, that's fine. It's
> correct (as long as you provide correct data of course), and it can be
> used by data consumers.
> If Emil draws his ways and adds a ref tag to it, that's fine too - it's
> correct (...) and can be used by data consumers.
> Neither you nor Emil did wrong stuff, and even if we afterwards have the
> ref on both, that's fine - as explained before.

Oh, OK...
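To make the "simple ref cross-check" concrete, here is a minimal sketch in Python. The Way and Relation classes and the ref_mismatches function are hypothetical names made up for illustration; they are not taken from any existing QA tool, and the sketch assumes refs are compared as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Way:
    # Hypothetical, simplified stand-in for an OSM way.
    id: int
    tags: dict = field(default_factory=dict)


@dataclass
class Relation:
    # Hypothetical, simplified stand-in for a route relation.
    id: int
    tags: dict
    members: list  # list of Way objects


def ref_mismatches(relation):
    """Return (way_id, way_ref) for member ways whose ref differs from the relation's ref."""
    rel_ref = relation.tags.get("ref")
    mismatches = []
    for way in relation.members:
        way_ref = way.tags.get("ref")
        # Members without a ref tag are not flagged: there is nothing to compare.
        if way_ref is not None and way_ref != rel_ref:
            mismatches.append((way.id, way_ref))
    return mismatches


route = Relation(
    1,
    {"type": "route", "ref": "42"},
    [
        Way(1, {"highway": "primary", "ref": "42"}),
        Way(2, {"highway": "primary", "ref": "24"}),  # scenario 1: flagged
        Way(3, {"building": "yes"}),                  # wrong member, not flagged
    ],
)
print(ref_mismatches(route))  # [(2, '24')]
```

In this sketch only way 2 is reported. The building polygon passes unnoticed because it carries no ref at all, and if the real-world ref changed while the relation and all ways still agreed on the old value, the check would return an empty list.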
Let me clarify my position as well: I do not propose some mass edit that would wipe out one way of tagging in favor of the other right now. But I do think that we should reach some consensus about the desired final state of things and encourage data producers/consumers to converge on it. E.g. as Volker Schmidt wrote (wrt hiking routes), it's OK to use the ref tag on ways, but it doesn't make much sense to keep it there once the relation is created and maintained.

> You (may) complain that now it's hard to "fix" a bug in it.
> Sure: if the routes ref gets changed, anyone has to fix that both in
> ways and in the relation probably; but if not, we have a contradiction
> that at least can be found in QA tools;

And this contradiction is clearly a negative side effect of data duplication, because without the duplication this bug would never occur. Please note that duplicating ref tags on relation+ways will never alert you to a ref change in the real world. So, in this use case the data duplication has only a negative effect on data quality.

Once you've found the no longer valid ref tag, in the case of duplicated data you must change the relation and all the member ways, which is an error-prone, boring task. On the other hand, if you keep the ref only on the relation, it's an easy fix.

Best regards,
Petr Morávek aka Xificurk
_______________________________________________ Tagging mailing list Tagging@openstreetmap.org http://lists.openstreetmap.org/listinfo/tagging