Hi

2009/9/29 Abyot Gizaw <aby...@gmail.com>
> Yes your suggestion is doable and less is better .... but I think the
> requirement from the field is more complex.
>
> If, for a moment, we stop talking about datavalues and talk about
> dataelements - why are we talking about dimension combinations?
>
> Because you are assuming a dataelement to have only one dimension. Am I
> correct? If that is the case, I see a little bit of inconsistency here.
> DataElement talks about one dimension, but its corresponding value talks
> about a combination of dimensions.
>

No, you are misreading me - or I have made a mistake. DataElement can have
many dimensions. If it were just one there would just be an n-1 relation
between Dimension and DataElement. Because DataElement can have more than
one dimension, I have the DataElementDimension table in between. I actually
meant to call it DataElementDimensions but table names should generally be
singular. So the contents of this table might look like:

dimensionID, dataElementID
1, 45
1, 46
2, 45
3, 45
4, 6

So dataelement 45 would have 3 dimensions etc.

> Yes from the datavalue I can have dimensionelementcombinations, pick
> dimensionelements, regroup and put them in their corresponding dimensions --
> in the end telling me which dimension they came from. But from this point
> onwards I am no longer talking about a value of a single dataelement but a
> value for a combination of dataelements (because I have to pull different
> dataelements which can give me the identified dimensions) .... but is this
> what we want?
>
> The other point I would like to raise is - will there not be any
> limitation on the flexibility of the system when putting the restriction "A
> Dimension has many DimensionElements. But a DimensionElement is a member of
> only one Dimension"? Not only a system flexibility problem, I see a logical
> problem as well.
> Because if we think for example beyond the obvious
> SEX(male,female,unknown) - I see a strong need for letting dimensionelements
> be members of multiple dimensions: For example take the other obvious
> dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its
> dimensionelements. Maybe such scaling of the AGE dimension is appropriate
> for the Malaria case, but for the TB case people might be interested to
> break the AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are
> we going to handle cases like this? Are we going to define a number of <5yrs
> or are we going to use the same <5yr dimensionelement?
>

I think in this case we would have to define a number of "<5"
dimensionelements. I agree that the way it is now there is maximum
flexibility, but it comes at quite a cost. I haven't seen much to suggest
that this would be a real limitation. Anyway, the way it stands "<5" is just
a label without any intrinsic meaning. So we can just as easily combine it
with apples or oranges. By binding a set of dimensionelements to a dimension
we at least give them some meaning as an aggregation group.

Thanks for your input. I will look again at the first issue and see whether
I have made a mistake.

Regards
Bob

>
> Thank you
> Abyot.
>
> On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe <bobjolli...@gmail.com> wrote:
>
>> OK. Here's my first attempt to rationalize things. Please excuse the
>> attachments. I try not to send attachments to mailing lists but these are
>> at least fairly small. (And Lars I will write it up in docbook after
>> fishing for feedback).
>>
>> My primary aim has been to disturb the existing model as little as
>> possible whilst trying to simplify wherever possible.
>>
>> Attached oldmodel.png shows the participants in the existing model. As
>> you can see there are 11 tables in all. I haven't shown the relations as
>> it becomes a bit of a web.
>>
>> Also attached is a proposed amended database model which bears sufficient
>> similarity to the old that migration between the two should be feasible.
>> But it is down to 6 tables. And I have named the tables according to the
>> terms we have been discussing. Of course this is just the database model.
>> I've also put together an XML view of what some sample dataset might look
>> like. There is also a UML model required which would be richer than the
>> underlying datamodel, but one step at a time ....
>>
>> Walking through:
>>
>> 1. DataElements can have Dimensions. And different dataElements can (and
>> hopefully will) share some of the same Dimensions. So there is an m-to-n
>> relationship between the two necessitating an extra table
>> (DataElementDimensions). An example of a Dimension is SEX. Nothing new
>> here.
>>
>> 2. Dimensions have DimensionElements. So SEX for example might have
>> DimensionElements "Male", "Female", "Unknown". A big difference from the
>> old model is that there is a 1-n relationship between Dimensions and
>> DimensionElements. A Dimension has many DimensionElements. But a
>> DimensionElement is a member of only one Dimension.
>>
>> 3. DataValues represent the values at the intersection of these Dimensions.
>> Keeping with the spirit of the old model this intersection is represented by
>> a single key, DimensionElementCombination. The DimensionElementCombinations
>> would be populated when a new Dimension is added to a DataElement. Like the
>> original model there is some fragility here. Changing dimensions on
>> dataelements could create a situation where datavalues become orphaned or
>> misdirected. The API must have robust methods for defending this integrity,
>> particularly when updating the structural metadata. But this is perhaps
>> doable. Either way it's no worse than what we have.
>>
>> I haven't given a name to DimensionElementCombinations. From the examples
>> I have seen from SL this seems to be unnecessary.
>> The names I have seen being used are generally simply contrived from the
>> dimensions or (worse still) from the categoryoptions. What is important is
>> that dataelements can have sets of dimensions.
>>
>> And then much of what is different is just a renaming of the original
>> entities. From the attached XML file I think you can see some of the
>> issues faced re names and identifiers. I find myself following a sort of
>> convention of CODE, Name, Description and UUID. CODEs must be unique
>> within the scope of the database. I suppose this is close to what we
>> currently call ShortName. I would like to place constraints on CODEs in
>> terms of length and also the disallowing of spaces and other funny
>> characters. The reason being that we may well have to use these codes in
>> making up URIs. So CODEs must be unique. For the moment we could keep
>> name unique but should migrate from it. It's a matter of rewriting all our
>> comparators I guess. UUIDs I am told are unique through some sort of
>> divinity so we apparently do not need to worry about them :-)
>>
>> I've also tried to reduce the number of knees on the donkey - from 11
>> tables to 6. I believe this can be done whilst preserving the existing
>> functionality. This arrangement would make it much more sensible to produce
>> the XML I need to produce. I'm hoping that it would also be more friendly
>> to those who would be trying to pivot the data across dimensions.
>>
>> Jason do you think this works for you? I might have missed out something
>> really fundamental. Abyot, you've been through this process before - am I
>> missing something? From the DataValue you can see DimensionElements. And
>> once you know a DimensionElement you also know the Dimension to which it
>> belongs. I think that's queryable. Will have to hydrate with some data and
>> see.
>>
>> Shaking the multidimensional model up like this would obviously have
>> implications.
>> But I suspect most of it is taking stuff away rather than adding new so it
>> might just be doable. Less is more.
>>
>> Not spending time with docbook yet, till I get some feedback.
>>
>> Cheers
>> Bob
>>
>> 2009/9/29 Bob Jolliffe <bobjolli...@gmail.com>
>>
>> Hi
>>>
>>> On the back of Jason's and others' comments, I've reached the conclusion
>>> that we cannot really live with the MD model the way it is. Whereas I think
>>> it is (just about) workable there are some serious optimizations we can and
>>> should make. I am going to put my other work back a day or two and propose
>>> some changes in a branch.
>>>
>>> I think central to the inefficiency is the many-many relation between
>>> categories and categoryoptions. This strikes me as illogical as well as
>>> being cumbersome in the UI. Do we really want to be able to make categories
>>> with options like {'0<5','6-10','Male','Out of stock','35-40'}? Reducing
>>> the relation between categories and category options to 1-n cuts two tables,
>>> should make sql queries more efficient and grokkable and also matches other
>>> models such as sdmx better.
>>>
>>> The other possible inefficiency is the dimensionset. It can be useful
>>> in some contexts but I'm guessing that when querying the data (which we want
>>> to be fast) it is not relevant. A dataelement can have dimensions. The
>>> fact that some dataelements have the same combinations of dimensions is very
>>> useful to know for some purposes, but it should be possible to get from the
>>> dataelement to the dimension directly.
>>>
>>> On the other side of the road is the hierarchical dimensionality idea I
>>> see Ola and Jason have been discussing, where dimensions are composed
>>> (perhaps post-facto) of uni-dimensional dataelements rather than decomposed
>>> into pre-structured dimensional elements. I suspect that:
>>> 1. we need both; and
>>> 2. from the API, user and reporting perspective they should look the
>>> same (ie a dataelement can have dimensions - how they come about should not
>>> be a concern at the end point).
>>>
>>> I'll try out some of these ideas and point you to the branch.
>>>
>>> Regards
>>> Bob
>>>
>>> 2009/9/29 Lars Helge Øverland <larshe...@gmail.com>
>>>
>>>>
>>>>> Thanks for the explanations Jason. The multidimensional model is quite
>>>>> complicated, is poorly documented, and as you say is DHIS-centric in the
>>>>> way that it is built around the DHIS notion of a Data Element.
>>>>>
>>>> Could we assemble and put some of the text being written on the list to
>>>> docbook?
>>>>
>>>>> That said, and I think Jason already has made a strong case for this,
>>>>> also in a 100% DHIS2 scenario you will need more flexibility in defining
>>>>> dimensions to your data than what categories can provide. Being able to
>>>>> define data dimensions independent of data collection is powerful and
>>>>> should be supported in a better way than what data element groups provide
>>>>> today. Given that we already have the orgunit group set code in place I
>>>>> would assume that adding group sets to data elements could be a relatively
>>>>> straightforward thing to do (but then again, I am not the programmer...).
>>>>>
>>>>
>>>> I don't see any implications in adding this to the system, it won't
>>>> require changes to the existing model as the association goes from the
>>>> groupset to the groups. We can prioritize this for the 2.0.3 release.
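
For anyone wanting to poke at the arrangement discussed in this thread, here
is a rough sketch, not the actual DHIS schema. It assumes SQLite via Python's
standard sqlite3 module. Only the DataElementDimension table, its two columns
and its five sample rows come from the thread; the other table and column
names, the dimension codes (SEX, AGE_MALARIA, AGE_TB, STOCK) and the helper
functions are made up for illustration. It covers the m-to-n link between
DataElement and Dimension and the proposed 1-n relation from Dimension to
DimensionElement, and leaves out DataValue and DimensionElementCombination.

```python
import sqlite3

# Hypothetical schema: only DataElementDimension(dimensionID, dataElementID)
# is taken from the thread; every other name here is an assumption.
SCHEMA = """
CREATE TABLE Dimension (
    dimensionID INTEGER PRIMARY KEY,
    code        TEXT NOT NULL UNIQUE
);
CREATE TABLE DataElement (
    dataElementID INTEGER PRIMARY KEY
);
-- m-to-n link: a DataElement can have many Dimensions and vice versa.
CREATE TABLE DataElementDimension (
    dimensionID   INTEGER NOT NULL REFERENCES Dimension(dimensionID),
    dataElementID INTEGER NOT NULL REFERENCES DataElement(dataElementID),
    PRIMARY KEY (dimensionID, dataElementID)
);
-- 1-n: each DimensionElement belongs to exactly one Dimension.
CREATE TABLE DimensionElement (
    dimensionElementID INTEGER PRIMARY KEY,
    dimensionID        INTEGER NOT NULL REFERENCES Dimension(dimensionID),
    label              TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    # Dimension codes are invented; the thread only gives the IDs 1..4.
    conn.executemany(
        "INSERT INTO Dimension (dimensionID, code) VALUES (?, ?)",
        [(1, "SEX"), (2, "AGE_MALARIA"), (3, "AGE_TB"), (4, "STOCK")],
    )
    conn.executemany(
        "INSERT INTO DataElement (dataElementID) VALUES (?)",
        [(45,), (46,), (6,)],
    )
    # The five link rows quoted in the thread.
    conn.executemany(
        "INSERT INTO DataElementDimension (dimensionID, dataElementID) "
        "VALUES (?, ?)",
        [(1, 45), (1, 46), (2, 45), (3, 45), (4, 6)],
    )
    # Two separate "<5" elements, one per AGE dimension, as suggested above.
    conn.executemany(
        "INSERT INTO DimensionElement (dimensionElementID, dimensionID, label) "
        "VALUES (?, ?, ?)",
        [(1, 1, "Male"), (2, 1, "Female"), (3, 1, "Unknown"),
         (4, 2, "<5"), (5, 3, "<5")],
    )
    conn.commit()
    return conn


def dimensions_of(conn, data_element_id):
    """Return the codes of all Dimensions linked to one DataElement."""
    rows = conn.execute(
        "SELECT d.code FROM DataElementDimension AS l "
        "JOIN Dimension AS d ON d.dimensionID = l.dimensionID "
        "WHERE l.dataElementID = ? ORDER BY d.dimensionID",
        (data_element_id,),
    ).fetchall()
    return [row[0] for row in rows]


def dimension_of_element(conn, dimension_element_id):
    """Return the code of the single Dimension owning a DimensionElement."""
    row = conn.execute(
        "SELECT d.code FROM DimensionElement AS e "
        "JOIN Dimension AS d ON d.dimensionID = e.dimensionID "
        "WHERE e.dimensionElementID = ?",
        (dimension_element_id,),
    ).fetchone()
    return row[0]
```

With the sample rows, dimensions_of(build_demo(), 45) returns the three
dimensions linked to data element 45. The two "<5" elements share a label but
each resolves to its own dimension, which is the cost of the 1-n restriction
raised by Abyot and also what gives each label a meaning as an aggregation
group.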
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-devs Post to : dhis2-devs@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-devs More help : https://help.launchpad.net/ListHelp