On 1 September 2011 15:02, Jo Størset <stor...@gmail.com> wrote:
> Great that you're looking at this. Some immediate feedback (pardon the lack
> of structure:)

Thanks for feedback ...

>
> On 1 Sep 2011, at 13:55, Bob Jolliffe wrote:
>
>> As a first step I am interested in reusing the DataValueSet stuff from
>> the model rather than the representation of metadata, which I think
>> needs to be done more completely and not until changes to the
>> dimensional model are realized in the not too distant future.
>
> The only metadata stuff I've done was basically just to serve up some basic
> html, that should not be used, and certainly not reused :)
>
>> 1. We should shift storedBy up to the dataValueSet level. I'm
>> assuming all datavalues in a datavalueset will be stored by the same
>> user. I'd put back an optional Comment attribute here as well.
>> Currently it's only useful for rolling back imports. Not the most
>> efficient way to implement it but still useful.
>
> I agree it would be nice if we could move it up. I am a bit unsure of the
> semantics of our data model and the use cases for this. If this were to be
> used to communicate between dhis instances, I guess it won't be an unthinkable
> situation that I have edited/added a value in a set that you originally
> stored, and that granularity would be lost. If that is something we should
> rethink in our data model rather than inherit to the xml structure, I don't
> know.

Me neither, so it was a bit of a tentative suggestion :-) I think the
semantics have their origin in an earlier era of standalone and
isolated dhis. My thinking would be that what is relevant is who has
stored this value in *this* database. Usernames from strange databases
wouldn't make much sense anyway. And if one wanted to audit its
absolute origin, one would have to follow the trail back to the
producer of the datavalueset - which might or might not be a dhis
instance. Of course persisting the datavalueset would be immensely
helpful for this, but as Lars has pointed out, no requirement for this
has emerged yet so we hold off on that for now.
It's not critical either way at the moment - just looks a bit untidy.
Thus far, unless I hear a compelling argument to the contrary, it seems
better to move it up. Will wait and listen.

>
>> 2. I don't think categoryOptionCombo should *necessarily* be exposed
>> to the external world. It's very much an internal arrangement of DHIS.
>> It's useful enough in cases where HISP folk are involved on both
>> producer and consumer side of the equation, but for other 3rd parties
>> in the world it is best to hide this internal arrangement. I suggest
>> that dataElement and value are *required* attributes,
>> categoryOptionCombo is optional and in addition we have an
>> <xs:anyAttribute> extension point which allows for additional
>> attributes. The implication would be that the above dataset will
>> remain valid (so existing stuff is still working),
>
> I think I agree that we need another model to better "externalize"
> dimensions. But it would become a bit more complex to implement if
> dataElement+optionCombo is not a "simple" identifier to the datavalue any
> more. It would be good to hear a little more about how you plan to implement
> it in the short run and if you think it should be combined with changes
> inside dhis.

I think that the simple {dataElement,optionCombo} tuple will remain
the internal identifier to the datavalue for the foreseeable future.
There's a lot of stuff built on top of it, it has some merits and it
can be coerced to behave reasonably well with some tightening of
constraints at the level of our java model.

>
> - Are you thinking of modelling this anyattribute extension point on the sdmx
> model in some way?

Well, similar enough I guess.

> - If there is a more explicit way to describe this in the schema than just
> anyattribute, I think it could help?

Schema languages are better at some things than others.
The problem here is that we would be required to constrain the
attributes on the basis of a dynamic list which would vary from the
concept list of one application to another. This would not be friendly
to annotating bindings for use on any system. This is also the sdmx-hd
problem. As it is, the xmlanyattribute annotation would bind to a map
like:

    @XmlAnyAttribute
    public Map<QName,Object> getAny(){
        // create the map lazily on first access
        if( any == null ){
            any = new HashMap<QName,Object>();
        }
        return any;
    }

The datavalue service can determine whether attributes are invalid or
not (in much the same way it determines whether orgunits, dataelements
really exist etc). It could do this fairly painlessly by looking at
the categorycombo of the dataelement - which I think we need to do now
anyway to determine if the optioncombo is valid.

Of course it would be fairly trivial for a running instance of dhis to
generate a *strict* schema with the anyattributes replaced by fixed
attributes, which might be of value to producers. But the internal
parser would have to be a bit agnostic.

> - And I think it would be advantageous if we could rework the internal data
> model to better fit this more general "schema" at the same time, or at least
> know a little bit more about how the internal changes would look.

Internally I want to change very little. The most fundamental change
being to implement the category-concept-categoryoption binding in the
model and put strict constraints on concept names so that they are
obliged to conform to the intersection of requirements for sql column
names and xml attribute names. Breaking mcdonalds and replacing with a
star or snowflake type schema is not really a sensible option at this
juncture.

> - We need to stay backwards compatible with existing meta models, are we sure
> that the rules for names of dimensions (Sex, Age) is compatible with xml
> attribute names?

That we must impose through inspired regex on concept names, which
should be relatively easy.
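To make the two checks above a bit more concrete, here is a rough sketch, not actual DHIS code. The class name ConceptNames, both method names and the pattern itself are assumptions for illustration only. The pattern takes a conservative intersection of unquoted sql identifiers and xml attribute names: an ASCII letter first, then ASCII letters, digits or underscores, no leading "xml" (reserved in xml), and at most 63 characters (PostgreSQL's default identifier limit; other databases differ). It does not screen out sql reserved words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import javax.xml.namespace.QName;

// Hypothetical helper for illustration; not part of the DHIS code base.
class ConceptNames {

    // Assumed rule: ASCII letter first, then ASCII letters, digits or
    // underscores, at most 63 characters, and no leading "xml" in any case.
    private static final Pattern VALID =
        Pattern.compile("(?![Xx][Mm][Ll])[A-Za-z][A-Za-z0-9_]{0,62}");

    // True if the name is usable both as an unquoted sql column name
    // and as an xml attribute name under the assumed rule above.
    static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    // Returns the local names of extension attributes that match none of
    // the concepts of the data element's category combo, in map order.
    static List<String> unknownAttributes(Map<QName, Object> any,
                                          Set<String> concepts) {
        List<String> unknown = new ArrayList<String>();
        for (QName attribute : any.keySet()) {
            if (!concepts.contains(attribute.getLocalPart())) {
                unknown.add(attribute.getLocalPart());
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        System.out.println(isValid("Sex"));       // true
        System.out.println(isValid("Age group")); // false: contains a space

        // Attributes as they would arrive through the @XmlAnyAttribute map.
        Map<QName, Object> any = new LinkedHashMap<QName, Object>();
        any.put(new QName("Sex"), "Male");
        any.put(new QName("Colour"), "Red");

        // Concepts of the (made-up) category combo for this data element.
        Set<String> concepts = new HashSet<String>(Arrays.asList("Sex", "Age"));
        System.out.println(unknownAttributes(any, concepts)); // [Colour]
    }
}
```

The datavalue service would reject a datavalue whose unknownAttributes list is non-empty, in the same way it rejects an unknown orgunit or dataelement.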
Category names can remain what they like.

> - We might need to think through how these dimensions would look in the
> metamodel xml, and how the link between this anyAttribute space and that
> model would be?

Will get to metamodel.xml next. But the link between anyattribute
space and the model is essentially a fairly trivial one through
categorycombo and conceptname.

>
> Overall I guess allowing the two identifier schemes to coexist for a while,
> seems like a good idea. Though we should probably look to get rid of
> optionComboId asap, then.

Don't know. Could well be that optionComboId has long legs. It has
its uses between dhis systems which both understand the notion.

>
>> 3. On the question of identifiers ....
>>
>> So I am going to suggest two additional attributes, probably at the
>> dataValueSets level, which indicate the id system to use. Currently I
>> can think of internal, code, uuid and map as possible candidates for
>> these attribute values. Where map would imply that ids need to be
>> resolved using an aliases table keyed by a naming context, possibly
>> using some of Lars' objectmapper or perhaps simpler. To maintain
>> compatibility with existing web service api this attribute can be
>> optional and default to uuid.
>
> Yep. I'm not sure what should be the default, though. Maybe just the internal
> id? For simple cases that looks easier than uuids (at least if we are
> thinking about the metamodel and how to communicate these id's *to* other
> systems?). Since we would maybe want to reuse this id model for the meta
> model as well, you think it would fit there?

I agree that uuid is not the most gentle default. I just suggested it
because you were already using it.

>
>> I am pretty sure I can implement the above without breaking what is
>> currently there. One possible but minor breaking change I would
>> suggest to improve parsing of very large datasets might be to
>> abbreviate some well known element names to dv, de and v for
>> compactness.
>
> I am not sure if these element names would really be that well known and
> obvious for the target people having to work with the schema.
> - Is there any alias mechanism for xml easily used with jaxb?

Not really. There is a standard called DSRL which is designed to
alias/transform element names, but it is not really applicable here.
It's not that important. I can live with long names or short.

> - Wouldn't we want explicit streaming/"batch" handling for use cases where
> sizes grew to this size, anyway?

I think for really large cases, database dumps and other tools are
maybe more appropriate anyway. Of course one problem is that you
don't know the size of the stream when you start consuming it from the
head ... I am sure some snakes have this problem :-)

Bob

>
> Overall, though, if you think abbreviated names are better, I'm all for it.
>
> Jo

_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp