On Wed, Sep 30, 2009 at 2:27 PM, Ola Hodne Titlestad <olati...@gmail.com> wrote:
> 2009/9/30 Bob Jolliffe <bobjolli...@gmail.com> > >> OK. I've reached the conclusion that the model can and probably should be >> simplified, but it is really far too much work for what I have time for >> now. The categoryoptioncombo is already deeply ingrained in many parts of >> the system. So don't hold your breath. >> >> I'm going back to focus on my much simpler problem of exploding >> categorycombooptions into dimensions and vice versa. >> >> For querying, I can see the API needs methods added to return datavalues >> by arbitrary collections of category rather than just fixed >> categoryoptioncombos. These only exist for the purpose of data collection. >> I suspect that this is what Ola needs to create more flexible reporttables. >> Then when configuring the reporttable you would freely select the dimensions >> you were interested in. This is of course do-able - I can see it - but my >> little brain is struggling with the complexity. >> >> Looking at a two stage process it is a matter of getting the collection of >> categorycombooptionids which intersect with the given set of categories and >> then passing that collection to the existing API method which returns >> collections of datavalues which match particular categorycombooptionids. >> >> In principle if we can expose the required methods in the API then it >> might be possible at some time in the future to revamp the underlying table >> structure without disturbing the API. >> >> Two final thoughts: >> 1. if we are bound to the model whereby categoryoptions are free standing >> entitities (ie many to many relation with categories) then, for the purpose >> of import/export we are obliged to uniquely identify these as well. So I >> will have to reluctantly also put uuids on categoryoptions. After >> discussing with Abyot last night, I can see that there is some value in >> having them the way they are, but we will have to live with the complexity. 
>> What you gain on the swings you lose on the roundabouts. >> >> > OK. I still don't get why we need this flexibility though. When using the > data values you would only query for data element + categories/dimensions > anyway right, and <5 means <5 whether it is part of AGE1, AGE2 or AGE 3. Or? > I think it is quite important to simplify things - as Bob very pointedly highlighted, there is currently no multidimensionality for indicators, which I think renders it relatively irrelevant. So I strongly support a simplification (and renaming) which will allow us to relatively quickly have good support for 95% of the cases rather than cater for very esoteric needs that may be useful to a few people. Agile principles and all that. Knut > > >> 2. Indicators are not multidimensional. Why is this? Was it a conscious >> decision resulting from earlier discussion or is it just that we haven't got >> there yet? >> > > Data analysis could benefit from having multidimensional indicators, but > then since this is strictly for output and never input I would suggest using > the post-method of assigning indicator group sets and groups (or whatever > you end up calling it in the UI). What makes indicators interesting and > complex in this context is that the numerator/denominator formulas should be > able to contain slices of the multidimensional data element, e.g. "Malaria" > + "all ages", "male", and not only the flat data element (data element + 1 > categoryoptioncombo, "Malaria"+ "<5", "male") like it is today. > > >> >> Regards >> Bob >> >> 2009/9/29 Bob Jolliffe <bobjolli...@gmail.com> >> >> 2009/9/29 Abyot Gizaw <aby...@gmail.com> >>> >>>> >>>> >>>> On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering < >>>> jason.p.picker...@gmail.com> wrote: >>>> >>>>> I think Abyot raises some good points, especially his last one about >>>>> differenences of what the age dimension really is. 
>>>>> >>>>> I think the biggest challenge is going to be how to unite the concepts >>>>> of a multidimensional data element (as it is currently implemented >>>>> with categories) and a data element that has no multidimensionality, >>>>> at least in the sense of it not being assigned any categories. >>>>> >>>> >>>> Isn't this what we have in the current system? If you are not assigning >>>> any combination of categories for a dataelement (well of course for the >>>> sake >>>> of consistency - from programming logic point of view - implicitly a >>>> default >>>> category combination with one default category having one default option is >>>> assigned - it is like putting your value at zero on the dimensions axis) >>>> then the dataelement has no dimensionality. >>>> >>> >>> I don't really like the default category idea. The way I have currently >>> proposed there is no default category. By default a dataelement has no >>> dimensions. It doesn't need a default dimension. And also by default the >>> dimensionelementcombination in datavalue is NULL. >>> >>> >>>> >>>> >>>>> >>>>> What about the following scenario. Could the cateogry/category combos >>>>> be transformed somehow into a sort of data element generator? Users >>>>> could define a dimensionality set, assign a master data element, and >>>>> DHIS would create all of the necessary data elements. So a category >>>>> combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1 >>>>> ,Under 5 and Over 5) and template data element (Clinical malaria) >>>>> would produce : >>>>> >>>>> OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria} >>>>> OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria} >>>>> OPD Over 5 Clinical Malaria ... >>>>> OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria} >>>>> ... >>>>> .. >>>>> .. >>>>> IP Clinical Malaria Total {IP, All ages, Clinical Malaria} >>>>> ... >>>>> ... >>>>> ... 
>>>>> Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria} >>>>> Clinical Malaria Total {All patient status, All ages, Clinical malaria} >>>>> >>>>> Each one of those data elements would then be assigned a set of >>>>> dimensions, and a set of dimensional elements. >>>>> The cateogries functionality would simply be an artifact to produce >>>>> multiple data elements, without having to enter them seperately, which >>>>> if I understood Ola yesterday, was one of its intended purposes. >>>>> >>>>> Now, for those of use such as myself, that do that have already create >>>>> dozens of data elements with different dimensions in their names (but >>>>> no where in a relational table) we could assign the dimensionality in >>>>> a seperate step (post-facto as Bob mentioned earlier). I might want to >>>>> assign a "uber" dimension of "Communicalble" and "Non-communicable" to >>>>> a disease type that might not have anything to do with the definition >>>>> of the data element itself, but would be simply for analysis purposes >>>>> later. Again, I may be rehashing my previous emails here, but from a >>>>> pure SQl standpoint, the approach I suggest here makes sense to me, in >>>>> terms of queries of how to pull this into a crosstab as well as how to >>>>> generate a fact table that something like an OLAP server could deal >>>>> with >>>>> >>>>> This approach might seem to resolve the issue of how to deal with >>>>> these two different beasts, but unfolding the multidimensional data >>>>> element into simpler components. Meaning that the >>>>> cateorgy/combos/options would be used as a templating mechanisms, but >>>>> that dimensionality could be assigned through a separate set of >>>>> relations. Perhaps this is what is represented in the diagram, but I >>>>> will need to study it tomorrow after some sleep. 
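Jason's "data element generator" idea can be sketched in a few lines. This is only an illustration, assuming that a category combination is a plain mapping from dimension name to its elements; the function name `generate_data_elements`, the `totals` argument and the naming pattern (elements joined with spaces, template last) are hypothetical and are not part of DHIS.

```python
from itertools import product


def generate_data_elements(template, dimensions, totals=None):
    """Expand one template data element over every combination of dimension elements.

    dimensions: dict mapping a dimension name to its list of elements.
    totals: optional dict mapping a dimension name to the label of its
            "all values" element (hypothetical convention, e.g. "All ages").
    Returns a list of (name, assignment) pairs, where assignment maps each
    dimension name to the element chosen for it.
    """
    totals = totals or {}
    names = list(dimensions)
    # Append the total label (if any) to each dimension's element list.
    choices = [
        list(dimensions[n]) + ([totals[n]] if n in totals else [])
        for n in names
    ]
    generated = []
    for combo in product(*choices):
        # Hypothetical naming pattern: elements first, template name last.
        name = " ".join(combo + (template,))
        generated.append((name, dict(zip(names, combo))))
    return generated


dimensions = {
    "Patient Status": ["OPD", "IPD", "Deaths"],
    "Age": ["Under 1", "Under 5", "Over 5"],
}
totals = {"Patient Status": "All patient status", "Age": "All ages"}

elements = generate_data_elements("Clinical Malaria", dimensions, totals)
for name, assignment in elements[:3]:
    print(name, assignment)
```

With the totals included, each dimension contributes four choices, so the template expands to 16 data elements; without them it expands to 9. Each generated element carries its own dimension assignment, which is the separate set of relations Jason describes.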
>>>>> >>>>> I do think that that dimenional elements should not be able to be >>>>> share by dimensions, and that dimensions and dimensional elements >>>>> should not be able to be deleted without lots of bells and whistles >>>>> going off once they have been assigned to data elements. >>>>> >>>> >>>> What is wrong with that as long as values are not associated with them? >>>> I think we will be falling back to the current implemention instead - like >>>> dimensional elements should not be deleted once values are assigned to >>>> their >>>> combinations. >>>> >>> >>> I agree. I think we all will agree on this much. >>> >>> >>>> >>>> >>>>> >>>>> I guess the key question is whether data elements should be able to >>>>> have multiple DimensionElementCombinations, which I think is the >>>>> current implementation. I am just not sure this will work with a >>>>> combination of DHIS2-type-multidimensional elements, and DHIS1.4-type >>>> >>>> data elements. >>>>> >>>> >>>> Can anyone explain me how the DHIS2 multidimensional dataelement concept >>>> fails to handle the DHIS 1.4 dataelements - sorry may be I missed this from >>>> your earlier discussion? I think the way I see it - if the objective is on >>>> OLAP, pivoting/querying, then what we need is not to change the model - >>>> instead to develop more APIs which can pull data along a dimension, varying >>>> degree of overlappings across dimensions - or more generally aggregation of >>>> values over a flexible set of dimensionelementcombinations ! >>>> >>> >>> Again I am with you mostly on this. In fact that has been my suggestion >>> all along - to push the functionality into the API. But having said that I >>> think the current model is too double-jointed and complex. I have seen by >>> trying to unpick the dimensions using xslt I need too many hash tables which >>> is inefficient. This no doubt would also translate into too many SQL >>> clauses. 
By trimming the requirement that dimensionelements are freely >>> assignable the model becomes a good bit simpler. Beyond that it is mostly >>> changing names. >>> >>> >>>> >>>> Using the example above - {OPD, IPD}, {Male, Female},{Under 1, 1-5, >>>> Above 5} and malaria as base dataelement >>>> >>>> What we have currently is an API to provide values for >>>> >>>> Malaria(OPD,Male,Under 1) >>>> Malaria(OPD,Male,1-5) >>>> Malaria(OPD,Male,Above 5) >>>> Malaria(OPD,Female,Under 1) >>>> Malaria(OPD,Female,1-5) >>>> Malaria(OPD,Female,Above 5) >>>> .... >>>> ... >>>> >>>> And if I understood correctly .. what is required is to have registred >>>> cases of >>>> >>>> Malaria in the OPD, >>>> Malaria in the IPD >>>> Malaria for Males >>>> Malaria for Females >>>> .... >>>> .. >>>> >>>> Malaria In the OPD but only those Female >>>> Malaria In the IPD but for male >>>> .. >>>> .. >>>> .. >>>> we can list different combinations.... >>>> >>>> or finally ask ...... for the Malaria >>>> >>>> Isn't this a simple question of Aggregation? Does the multidimensional >>>> datamodel have a limitation to handle the above requirements - or am I >>>> talking a different stuff here? >>>> >>> >>> No I believe it can probably be done - but yet it doesn't seem to have >>> been done. When I started looking at how I might do it I realized that it >>> could also be simplified. >>> >>> Regards >>> Bob >>> >>> >>>> >>>> >>>>> >>>>> Enough for today. >>>>> >>>>> Thanks for this Bob. It is a good start. Can't you make this diagram >>>>> in DocBook so I can edit it? :D >>>>> >>>>> Regards, >>>>> Jason >>>>> >>>>> >>>>> >>>>> On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw <abyo...@gmail.com> >>>>> wrote: >>>>> > Yes your suggestion is doable and less is better .... but I think the >>>>> > requirement from the field is more complex. >>>>> > >>>>> > If, for a moment, we stop talking about datavalues and talk about >>>>> > dataelements - why are we talking about dimension combinations? 
>>>>> > >>>>> > Because you are assuming a dataelement to have only one dimension. Am >>>>> I >>>>> > correct? If that is the case, I see a little bit of inconsistency >>>>> here. >>>>> > DataElement talks about one dimesion, but its corresponding value >>>>> talks >>>>> > about combination of dimensions. >>>>> > >>>>> > Yes from the datavalue I can have dimensionelementcombinations, pick >>>>> > dimensionelments regroup and put them in their corresponding >>>>> dimesions -- in >>>>> > the end telling me from which dimension they came from. But from this >>>>> point >>>>> > onwards I am no more talking about a value of a single dataelement >>>>> but a >>>>> > value for combination of dataelements (because I have to pull >>>>> different >>>>> > dataelements which can give me the identified dimensions) .... but is >>>>> this >>>>> > what we want? >>>>> > >>>>> > The other point I would like the raise is - will there not be any >>>>> limitation >>>>> > on the flexibility of the system when putting the restriction "A >>>>> Dimension >>>>> > has many DimensionElements. But a DimensionElement is a member of >>>>> only one >>>>> > Dimension" ? Not only system flexibility problem, I see a logical >>>>> problem as >>>>> > well. Because if we think for example beyond the obvious >>>>> > SEX(male,female,unknown) - I see a strong need for letting >>>>> dimensionelements >>>>> > to be member of multiple dimensions: For example take the other >>>>> obvious >>>>> > dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its >>>>> > dimesionelements. May be such scaling of the AGE dimension is >>>>> approrpiate >>>>> > for Malaria case, but for TB case people might be interested to break >>>>> the >>>>> > AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we >>>>> going to >>>>> > handle cases like this? Are we going to define a number of <5yrs or >>>>> are we >>>>> > going to use the same <5yr dimensionelement ? >>>>> > >>>>> > >>>>> > Thank you >>>>> > Abyot. 
>>>>> > >>>>> > >>>>> > >>>>> > On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe <bobjolli...@gmail.com> >>>>> wrote: >>>>> >> >>>>> >> OK. Here's my first attempt to rationalize things. Please excuse >>>>> the >>>>> >> attachments. I try not to send attachments to mailing lists but >>>>> these are >>>>> >> at least fairly small. (And Lars I will write it up in docbook >>>>> after >>>>> >> fishing for feedback). >>>>> >> >>>>> >> My primary aim has been to disturb the existing model as little as >>>>> >> possible whilst trying to simplify wherever possible. >>>>> >> >>>>> >> Attached oldmodel.png shows the participants in the existing model. >>>>> As >>>>> >> you can see there are 11 tables in all. I haven't showed the >>>>> relations as >>>>> >> it becomes a bit of a web. >>>>> >> >>>>> >> Also attached is a proposed amended database model which bears >>>>> sufficient >>>>> >> similarity to the old that migration between the two should be >>>>> feasible. >>>>> >> But it is down to 6 tables. And I have named the tables according >>>>> to the >>>>> >> terms we have been discussing. Of course this is just the database >>>>> model. >>>>> >> I've also put together an XML view of what some sample dataset might >>>>> look >>>>> >> like. There is also a UML model required which would be richer than >>>>> the >>>>> >> underlying datamodel, but one step at a time .... >>>>> >> >>>>> >> Walking through: >>>>> >> >>>>> >> 1. DataElements can have Dimensions. And different dataElements >>>>> can (and >>>>> >> hopefully will) share some of the same Dimensions. So there is a >>>>> m-to-n >>>>> >> relationship between the two necessitating an extra table >>>>> >> (DataElementDimensions). An example of a Dimension is SEX. Nothing >>>>> new >>>>> >> here. >>>>> >> >>>>> >> 2. Dimensions have DimensionElements. So SEX for example might >>>>> have >>>>> >> DimensionElements "Male", "Female", "Unknown". 
A big difference >>>>> from the >>>>> >> old model is that there is 1-n relationship between >>>>> DimensionElements and >>>>> >> Dimensions. A Dimension has many DimensionElements. But a >>>>> DimensionElement >>>>> >> is a a member of only one Dimension. >>>>> >> >>>>> >> 3. DataValues represent the values at intersection of these >>>>> Dimensions. >>>>> >> Keeping with the spirit of the old model this intersection is >>>>> represented by >>>>> >> a single key, DimensionElementCombination. The >>>>> DimensionElementCombinations >>>>> >> would be populated when a new Dimension is added to a DataElement. >>>>> Like the >>>>> >> original model there is some fragility here. Changing dimensions on >>>>> >> dataelements could create a situation where datavalues become >>>>> orphaned or >>>>> >> misdirected. The API must have robust methods for defending this >>>>> integrity >>>>> >> particulalrly when updating the structural metadata. But this is >>>>> perhaps >>>>> >> doable. Either way its not worse than we have. >>>>> >> >>>>> >> I haven't given a name to DimensionElementCombinations. >From the >>>>> examples >>>>> >> I have seen from SL this seems to be unnecessary. The names I have >>>>> seen >>>>> >> being used are generally simply contrived from the dimensions or >>>>> (worse >>>>> >> still) from the categoryoptions. What is important is that >>>>> dataelements can >>>>> >> have sets of dimensions. >>>>> >> >>>>> >> And then much of what is different is just a renaming of the >>>>> original >>>>> >> entities. From the attached XML file I think you can see some of >>>>> the >>>>> >> issues faced re names and identifiers. I find myself following a >>>>> sort of >>>>> >> convention of CODE, Name, Description and UUID. CODE's must be >>>>> unique >>>>> >> within the scope of the database. I suppose this is close to what >>>>> we >>>>> >> currently call ShortName. 
I would like to place constraints on >>>>> CODES in >>>>> >> terms of length and also the disallowing of spaces and other funny >>>>> >> characters. The reason being that we may well have to use these >>>>> codes in >>>>> >> making up uri's. So CODES must be unique. For the moment we could >>>>> keep >>>>> >> name unique but should migrate from it. Its a matter of rewriting >>>>> all our >>>>> >> comparators I guess. UUIDs I am told are unique through some sort >>>>> of >>>>> >> divinity so we apparently do not need to worry about them :-) >>>>> >> >>>>> >> I've also tried to reduce the number of knees on the donkey - from >>>>> 11 >>>>> >> tables to 6. I believe this can be done whilst preserving the >>>>> existing >>>>> >> functionality. This arangement would make it much more sensible to >>>>> produce >>>>> >> the XML I need to produce. I'm hoping that it would also be more >>>>> friendly >>>>> >> to those who would be trying to pivot the data across dimensions. >>>>> >> >>>>> >> Jason do you think this works for you? I might have missed out >>>>> something >>>>> >> really fundamental. Abyot, you've been through this process before >>>>> - am I >>>>> >> missing something? From the DataValue you can see >>>>> DimensionElements. And >>>>> >> once you know a DimensionElement you also know the Dimension to >>>>> which it >>>>> >> belongs. I think thats queryable. Will have to hydrate with some >>>>> data and >>>>> >> see. >>>>> >> >>>>> >> Shaking the multidimensional model up like this would obviously have >>>>> >> implications. But I suspect most of it is taking stuff away rather >>>>> than >>>>> >> adding new so it might just be doable. Less is more. >>>>> >> >>>>> >> Not spending time with docbook yet, till I get some feedback. 
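To make the "I think that's queryable" claim concrete, here is a minimal in-memory sketch of the proposed model. It assumes the 1-n rule (each DimensionElement belongs to exactly one Dimension) and a two-stage lookup: first find the DimensionElementCombinations that contain the selected elements, then sum the DataValues stored against them. All codes, ids, values and function names below are invented for illustration; none of this is the DHIS API or schema.

```python
from collections import defaultdict

# Each DimensionElement belongs to exactly one Dimension (the proposed 1-n relation).
ELEMENT_DIMENSION = {
    "MALE": "SEX",
    "FEMALE": "SEX",
    "UNDER1": "AGE",
    "AGE1TO5": "AGE",
    "OVER5": "AGE",
}

# DimensionElementCombination id -> the DimensionElement codes it intersects.
COMBINATIONS = {
    1: {"MALE", "UNDER1"},
    2: {"MALE", "AGE1TO5"},
    3: {"MALE", "OVER5"},
    4: {"FEMALE", "UNDER1"},
    5: {"FEMALE", "AGE1TO5"},
    6: {"FEMALE", "OVER5"},
}

# DataValue rows: (data element code, combination id, value). Made-up numbers.
DATA_VALUES = [
    ("MALARIA", 1, 3),
    ("MALARIA", 2, 5),
    ("MALARIA", 3, 7),
    ("MALARIA", 4, 2),
    ("MALARIA", 5, 4),
    ("MALARIA", 6, 6),
]


def matching_combinations(selected):
    """Stage 1: ids of the combinations that contain every selected element."""
    wanted = set(selected)
    return {cid for cid, elems in COMBINATIONS.items() if wanted <= elems}


def slice_total(data_element, selected):
    """Stage 2: sum one data element's values over the matching combinations."""
    ids = matching_combinations(selected)
    return sum(v for de, cid, v in DATA_VALUES if de == data_element and cid in ids)


def pivot(data_element, dimension):
    """Total one data element's values per element of a single dimension."""
    result = defaultdict(int)
    for de, cid, value in DATA_VALUES:
        if de != data_element:
            continue
        for elem in COMBINATIONS[cid]:
            # The 1-n rule makes this lookup unambiguous.
            if ELEMENT_DIMENSION[elem] == dimension:
                result[elem] += value
    return dict(result)


print(slice_total("MALARIA", ["MALE"]))
print(slice_total("MALARIA", ["FEMALE", "UNDER1"]))
print(pivot("MALARIA", "AGE"))
```

Because an element maps to a single dimension, `pivot` needs no extra join table to decide which dimension a value belongs to; under a many-to-many relation that lookup would have to go through the category each option was assigned from.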
>>>>> >> >>>>> >> Cheers >>>>> >> Bob >>>>> >> >>>>> >> 2009/9/29 Bob Jolliffe <bobjolli...@gmail.com> >>>>> >>> >>>>> >>> Hi >>>>> >>> >>>>> >>> On the back of Jason and others comments, I've reached the >>>>> conclusion >>>>> >>> that we cannot really live with the MD model the way it is. >>>>> Whereas I think >>>>> >>> it is (just about) workable there are some serious optimizations we >>>>> can and >>>>> >>> should do. I am going to put my other work back a day or two and >>>>> propose >>>>> >>> some changes in a branch. >>>>> >>> >>>>> >>> I think central to the inefficiency is the many-many relation >>>>> between >>>>> >>> categories and categoryoptions. This strikes me as illogical as >>>>> well as >>>>> >>> being cumbersome in the UI. Do we really want to be able to make >>>>> categories >>>>> >>> with options like {'0<5','6-10','Male','Out of stock','35-40'}. >>>>> Reducing >>>>> >>> the relation between categories and category options to 1-n cuts >>>>> two tables, >>>>> >>> should make sql queries more efficient and grokkable and also >>>>> matches other >>>>> >>> models such as sdmx better. >>>>> >>> >>>>> >>> The other possiible inefficiency is the dimensionset. It can be >>>>> useful >>>>> >>> in some contexts but I'm guessing that when querying the data >>>>> (which we want >>>>> >>> to be fast) it is not relevant. A dataelement can have >>>>> dimensions. The >>>>> >>> fact that some dataelements have the same combinations of >>>>> dimensions is very >>>>> >>> useful to know for some purposes, but it should be possible to get >>>>> from the >>>>> >>> dataelement to the dimension directly. >>>>> >>> >>>>> >>> On the other side of the road is the hierarchical dimensionality >>>>> idea I >>>>> >>> see Ola and Jason have been discussing, where dimensions are >>>>> composed >>>>> >>> (perhaps post-facto) of uni-dimensional dataelements rather than >>>>> decomposed >>>>> >>> into pre-structured dimensional elements. I suspect that: >>>>> >>> 1. 
we need both; and >>>>> >>> 2. from the API, user and reporting perspective they should look >>>>> the >>>>> >>> same (ie a dataelement can have dimensions - how they come about >>>>> should not >>>>> >>> be a concern at the end point). >>>>> >>> >>>>> >>> I'll try out some of these ideas and point you to the branch. >>>>> >>> >>>>> >>> Regards >>>>> >>> Bob >>>>> >>> >>>>> >>> 2009/9/29 Lars Helge Øverland <larshe...@gmail.com> >>>>> >>>> >>>>> >>>>> >>>>> >>>>> Thanks for the explanations Jason. The multidimensional model is >>>>> quite >>>>> >>>>> complicated, is poorly documented, and as you say is DHIS-centric >>>>> in the way >>>>> >>>>> that it is built around the DHIS notion of a Data Element. >>>>> >>>>> >>>>> >>>> >>>>> >>>> Could we assemble and put some of the text being written on the >>>>> list to >>>>> >>>> docbook? >>>>> >>>> >>>>> >>>>> >>>>> >>>>> That said, and I think Jason already has made a strong case for >>>>> this, >>>>> >>>>> also in a 100% DHIS2 scenario you will need more flexibility in >>>>> defining >>>>> >>>>> dimensions to your data than what categories can provide. Being >>>>> able to >>>>> >>>>> define data dimensions independent of data collection is powerful >>>>> and should >>>>> >>>>> be supported in a better way than what data element groups >>>>> provide today. >>>>> >>>>> Given that we already have the orgunit group set code in place I >>>>> would >>>>> >>>>> assume that adding group sets to data elements could be a >>>>> relatively >>>>> >>>>> straight forward thing to do (but then again, I am not the >>>>> programmer...). >>>>> >>>> >>>>> >>>> I don't see any implications in adding this to the system, it >>>>> won't >>>>> >>>> require changes to the existing model as the association goes from >>>>> the >>>>> >>>> groupset to the groups. We can prioritize this for the 2.0.3 >>>>> release.
>>>>> >>>> >>>>> >>>> >>>>> >>>> _______________________________________________ >>>>> >>>> Mailing list: >>>>> >>>> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> >>>>> >>>> Post to : dhis2-devs@lists.launchpad.net >>>>> >>>> Unsubscribe : >>>>> >>>> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> >>>>> >>>> More help : https://help.launchpad.net/ListHelp -- Cheers, Knut Staring