I am in favor of making the terminology adhere more to the general terms in use in the Data warehouse/OLAP/SDMX fields (though it must also be understandable to the users, of course - and documented!) Knut
On Fri, Sep 25, 2009 at 12:07 PM, Knut Staring <knu...@gmail.com> wrote: > > http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model > > > On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw <aby...@gmail.com> wrote: > >> The one-to-one relationship mentioned between dataelement and >> categorycombo is not correct ! >> >> The realtionship is one-to-many. A categorycombo can be assigned for many >> dataelements. But a dataelement can have only one categorycombo. >> >> Thank you >> Abyot. >> >> >> On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering < >> jason.p.picker...@gmail.com> wrote: >> >>> Hi there. >>> >>> My basic issue with the category/category combo is that it appears to be >>> a one-to-one relationship with data elements. If I look at the data model, >>> there is a one-to-one relationship between dataelement and categorycomboid. >>> For a given category combo, you can have multiple options. So, you can >>> establish a relationship for a given data element and a group of category >>> options. >>> >>> Let me try and describe the issue. We have a set of data elements related >>> to malaria for this example. We would like to be able to pivot the data on >>> other dimensions dimensions (Data element, age, disease, patient status). >>> Obviously there are other dimensions that are pivotable (orgunit, period, >>> dataset) >>> >>> The data elements look like this. I have put the dimensions in square >>> brackets, and the dimensional elements into curly brackets. >>> >>> [Data element, Age, Disease, Patient status] >>> Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases, >>> Deaths} >>> Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths} >>> >>> Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths} >>> Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths} >>> IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria >>> Cases, IP} >>> IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, >>> IP} >>> IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, >>> Deaths} >>> IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, >>> Deaths} >>> OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, >>> Malaria Cases, OPD} >>> OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria >>> Cases, OPD} >>> OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, >>> OPD} >>> OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria >>> Cases, OPD} >>> >>> OK, I hope this is pretty clear. Obviously, there are more data elements >>> (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I >>> have had for all diseases, or how many OPD cases I have had for each >>> disease. How can I do this with the existing data model? It is not obvious >>> to me because there is no relationship between dimensional elements >>> (categoryoptions) to each other. Category options can be related through a >>> cateogry combination, but since data elements can only be assigned a single >>> category option, the dimensionality is broken once it gets time to pull the >>> data into a pivot table. >>> >>> In the incomplete example that I gave yesterday, I established a >>> one-to-many relationship between a data element and a dimension. If I >>> understand the current data model, I would have to create a separate >>> categorycombo for each of these data elements, and assign this >>> categorycombo to the data element. Now, I might be able to unfold the >>> dimensions using the categories and categorycombos. I it is not apparent how >>> the dimensional elements correspond themselves to a particular dimension, as >>> there is no relation for this in the database as I can see it. >>> >>> As (Johan pointed out a few mails ago, if I understand him correctly) >>> is different categorycombo's can be created for individual data elements, >>> and assigned to these elements. However, this seems to be 1) incredibly >>> inefficient and 2) does not establish any relationship between dimensional >>> elements and dimensions. Perhaps it is there, and maybe it has been done in >>> SL, but the SQL is not apparent to me at all. >>> >>> It would appear to me, looking from an SQL perspective, that a >>> one-to-many relationship between a data element, a dimension (category) and >>> dimensional element (category combo) would be much more effieicnet, and >>> highly usable from an SQL perspective. As I mentioned in my mail, I am not >>> sure how easy this would be to implement in a procedural language like Java, >>> but I assume it should be possible to either do it this way, or rewrite my >>> Postgres proprietary query in standard SQL (which there are ways to do with >>> ANSI SQL). This would require modification to the data model (similar to >>> the table I provided yesterday) and modification to the UI to allow users to >>> 1) select a dimension (category) 2) Select a dimensional element for the >>> given dimension. This would populate the table with a dataelementid, a >>> dimensionid (categoryid) and a dimensional element (cateogryoptionid). >>> >>> My gut feeling this is exactly the same functionality as has currently >>> been implemented for organizational units. Users can define a hierarchy for >>> organizational units, and then assign them to >>> categories/dimension/organizational group sets, decide whether the groups >>> are compulsory and exlusive, and then assign a particular organizational >>> unit to a particular group (which is analogous to a dimensional element). >>> Organizational group sets define the dimension, and one-to-one assignment of >>> an organizational unit to a particular organizational group defines which >>> dimensional element the organizational unit is a member of. These dimensions >>> can then be used in PivotTable analyses, where the orgunitgroupsets become >>> dimensions, and orgunitgroups become dimensional elements.. >>> >>> I beleive that data elements are no different than organizational units. >>> They should be able to be grouped into some sort of hierarchy and pivoted on >>> any dimension. Data elements groups establish a one-to-many relationship >>> between data elements and a data element group, but there is no concept of >>> how data element groups relate to each other. >>> I think this is perhaps the same concept you mention, ReportSet. >>> >>> I suspect we would need to potentially rethink the entire concept of >>> multidimensionality if we really wanted to get it right. It would see to me >>> that the DHIS datamodel and associated aggregation methods have been >>> hardwired into aggregation across time (period) and geography (orgunit). >>> What we can do with PivotTables and (and OLAP) is to aggregate across any >>> possible dimension, slicing as you mention ,on any dimension . I am not sure >>> this will be so simple to implement but I think there is a way to do it, >>> without major modifications. >>> >>> I am not sure it solves the SDMX issue. There are potential issues >>> related to "ragged" dimensions and how these get handled. Some data elements >>> might have three dimensions, while others may have more. I have not thought >>> about this in detail, but know it is an issue with cross-tab queries in SQL. >>> You normally have to know how many dimensions you are working with in order >>> to perform a cross-tab, but there are dynamic solutions. Perhaps this could >>> be dealt with somehow in SDMX. >>> >>> Anyway ,I am rambling. Hope this mail helps though to push my point >>> further. Once I get the SQL from SL, I will see if perhaps it has been done >>> already, and that I am just writing long emails for nothing. :) >>> >>> Regards, >>> Jason >>> >>> >>> >>> >>> >>> On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolli...@gmail.com> >>> wrote: >>> > Hi Jason and Johan >>> > >>> > I'm really pleased to see you having this discussion as I have been >>> > grappling with a similar issue which involves unravelling categories, >>> > category options and combos into something more familiar. I have >>> reached >>> > similar conclusions regarding nomenclature: >>> > >>> > category = dimension >>> > categorycombo - I have been calling a dimension set (it bears a strong, >>> and >>> > useful, resemblance to xslt:attribute-set) >>> > category option - I like your suggestion of DimensionalElement. I am >>> going >>> > to start calling it that too. >>> > >>> > In my case I need to export (and import data) into a standard format >>> called >>> > sdmx. So whereas in the DHIS2 native DXF we export datavalues with >>> > effectively three dimensions (source, period, categorycombooption) the >>> last >>> > dimension is a sort of uber-dimension. Like a peppercorn or a cardamon >>> > seed, when you break it open it explodes its rich complexity of >>> dimensions. >>> > >>> > In sdmx we need the dimensions exploded. So data values look like: >>> > >>> > <dataset> >>> > <datavalue name="TB test given" uid="44344 ...44" gender="Male" >>> age="0-5" >>> > value="32" /> >>> > <datavalue name="TB test given" uid="44344 ...44" gender="Female" >>> > age="0-5" value="38" /> >>> > .. >>> > </dataset> >>> > >>> > My approach to unpicking the dimensions from the dxf file is to >>> transform it >>> > with an xslt transformation which is still incomplete but seems to work >>> > well. >>> > >>> > One other nomenclatures issue which has surfaced as a result is what we >>> call >>> > a "dataset". In DHIS2, if I understand correctly, a dataset >>> corresponds >>> > roughly to all the dataelements which might occur on a datacollection >>> form. >>> > If we view all dataelements as having just the three "dimensions" then >>> all >>> > is well, but if we explode the actual dimensions then we have an >>> issue. In >>> > the sdmx model a dataset consists only of dataelements with the same >>> > dimensionset. After discussing this with Ola we have reached the >>> conclusion >>> > that we need another level of grouping, primarily for the UI - eg >>> FormSet or >>> > ReportSet which allows us to group related datasets. But that is an >>> aside >>> > from what you are talking about. >>> > >>> > I know that you guys can do magic with sql, but it seems that we should >>> try >>> > to capture some of this and place it down in the datamodel API. It >>> occurs >>> > to me that for a multidimensional dataelement we might benefit from >>> some >>> > utility methods to retrieve slices and dices which might assist in >>> > constructing the pivot tables around dimensions. Does this sound like >>> the >>> > right thing to do. >>> > >>> > Regards >>> > Bob >>> > >>> > 2009/9/24 <joha...@ifi.uio.no> >>> >> >>> >> Jason, >>> >> I will leave to others to comment the code, but I have a few >>> comments... >>> >> >>> >> > I have done a bit more thinking on this, and would like to offer >>> some >>> >> > more examples up for discussion. >>> >> > >>> >> > Basically, we have a lot of data elements that are somehow related >>> to >>> >> > each other, similar to my kooky example in my original mail. I >>> assume >>> >> > this is fairly common throughout other HMIS systems. Here, malaria >>> >> > attendance is broken down into various dimensions/category by >>> patient >>> >> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 >>> and >>> >> > over 5). But say you want to be able to pivot to look at outpatient, >>> >> > inpatient and deaths totals (i.e. summed up by age). Well, you could >>> >> > create a separate data element for this, but it sure would be nice >>> to >>> >> > be able to Pivot the data somehow. >>> >> >>> >> In the Sierra Leone db, Edem and Romain set up views that pulled the >>> >> categories through into a "Category" pivot field, which you can then >>> use >>> >> to get what you want. Simply tick the categories (see below) you want >>> to >>> >> see, and group them together in excel. Maybe Edem and Romain can help >>> >> further here. >>> >> >>> >> >>> >> > Dimension ? Category >>> >> > Dimensional element ? Category option ? Category combo ( I think) >>> >> >>> >> The right symbol disappeared from my reply-mail here, but some >>> >> clarification: >>> >> >>> >> Crosstab Dimension (age AND gender) = Category combo >>> >> Dimension (age, gender) = Category >>> >> Dimensional element (inpatient, outpatient, death, under1, 1-5, and >>> over >>> >> 5) = Category option >>> >> >>> >> So by assigning a DE the category combo of "gender_age", you get 9 >>> >> dimensional elements, 3 category options (in category age) by 3 >>> category >>> >> options (in category gender) >>> >> >>> >> Johan >>> >> >>> >> >>> >> >>> >> >>> >> > Anyway, here is the helper table I created. >>> >> > >>> >> > CREATE TABLE test_dataelementcategorycombo >>> >> > ( >>> >> > test_dataelementid integer NOT NULL, >>> >> > test_dataelementcategoryid integer NOT NULL, >>> >> > test_dataelementcategorycomboid integer NOT NULL, >>> >> > CONSTRAINT pk_testdataelementcategory PRIMARY KEY >>> >> > (test_dataelementid, test_dataelementcategoryid, >>> >> > test_dataelementcategorycomboid) >>> >> > ) >>> >> > WITH (OIDS=FALSE); >>> >> > >>> >> > So this is a real simple table which references a data element, a >>> >> > data element category, and a data element combo. The reference to a >>> >> > data element category may be redundant, but anyway, lets leave it in >>> >> > for now. >>> >> > >>> >> > I populated the table with some data, which will be used to assign >>> >> > dimensions to data elements. It looks like this in my DB, which >>> looks >>> >> > like this. >>> >> > >>> >> > 309;25250;25251 >>> >> > 309;25257;25255 >>> >> > 348;25250;25252 >>> >> > 348;25257;25255 >>> >> > 455;25250;25253 >>> >> > 455;25257;25255 >>> >> > >>> >> > but of course this is meaningless to you. What do these values >>> >> > correspond >>> >> > to? >>> >> > >>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 >>> Year";"Age";"Under >>> >> > 1" >>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 >>> >> > Years";"Age";"Age 1-5" >>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 >>> Years";"Age";"Over >>> >> > 5" >>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient >>> >> > status";"OPD" >>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 >>> >> > Years";"Patient status";"OPD" >>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient >>> >> > status";"OPD" >>> >> > >>> >> > which can be produced by the following view. >>> >> > >>> >> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS >>> >> > SELECT dataelement.name, dataelementcategory.name AS dimension, >>> >> > dataelementcategoryoption.name AS dimension_element >>> >> > FROM dataelement >>> >> > JOIN test_dataelementcategorycombo ON >>> >> > test_dataelementcategorycombo.test_dataelementid = >>> >> > dataelement.dataelementid >>> >> > JOIN dataelementcategory ON dataelementcategory.categoryid = >>> >> > test_dataelementcategorycombo.test_dataelementcategoryid >>> >> > JOIN dataelementcategoryoption ON >>> >> > test_dataelementcategorycombo.test_dataelementcategorycomboid = >>> >> > dataelementcategoryoption.categoryoptionid; >>> >> > >>> >> > So, that view just provides a human readable view of those integers >>> >> > that I populated in the the test_dataelementcategorycombo table I >>> >> > created above. This table just assigns particular data elements to >>> >> > different category options (dimensional elements). >>> >> > >>> >> > OK, so far so good, but the problem now is, how to use this with the >>> >> > aggregatedatavalue table? If we try and join this table directly, we >>> >> > will have issues with duplicates in the pivot table, so we need to >>> >> > transform the data slightly. >>> >> > >>> >> > This should do the trick. >>> >> > >>> >> > SELECT * FROM crosstab >>> >> > ( >>> >> > 'SELECT name, dimension, dimension_element FROM >>> >> > vw_dataelements_dimensions ORDER BY 1,2,3', >>> >> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY >>> 1 >>> >> > ASC' >>> >> > ) >>> >> > as >>> >> > ( >>> >> > name character varying(230), >>> >> > age character varying(160), >>> >> > status character varying(160) >>> >> > ); >>> >> > >>> >> > >>> >> > which returns this record set >>> >> > >>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 >>> Years";"Age >>> >> > 1-5";"OPD" >>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over >>> >> > 5";"OPD" >>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under >>> >> > 1";"OPD" >>> >> > >>> >> > >>> >> > OK, admittedly, I cheated a bit and used the crosstab function of >>> >> > Postgresql, but I assume that this query could be rewritten with a >>> few >>> >> > more lines of code in standard SQL or some procedural language like >>> >> > Java. Now, this record set looks like something that I can almost >>> use >>> >> > with the aggregateddatavalue table simply by joining up the table on >>> >> > the appropriate dataelementid and pulling everything into a pivot >>> >> > table. I would not have any duplicated values and would have >>> columns >>> >> > like data element name, period, orgunit, age, patient status and of >>> >> > course the value of the data element. I hope that part is pretty >>> >> > clear. Just join up that table to the aggregateddata table, and you >>> >> > have pretty much what is needed to pull the data directly into a >>> >> > PivotTable for further analysis. >>> >> > >>> >> > This is not a complete example, but it is very close to what I need >>> >> > here ,and I think this type of functionality would be much more >>> useful >>> >> > than the current data element categories functionality. Basically, >>> all >>> >> > that would be required, at least initially, would be another user >>> >> > interface screen to allow the definition of which category(ies) and >>> >> > category options a data element is a member of. The rest could ,in >>> the >>> >> > first instance be executed with custom SQL (obviously, I am partial >>> to >>> >> > this language and hobbled by the fact that I do not know Java), but >>> >> > eventually this would need to be implemented somehow in Java. >>> >> > >>> >> > I am not sure if this really solves all of the issues surrounding >>> >> > multidimensional analysis of data elements, but it seems to solve >>> the >>> >> > issues that I am having by trying to assign some sort of dimensional >>> >> > hierarchy to data elements (similar to the exclusive/compulsory >>> >> > functionality of orgunits). Any thoughts on this? >>> >> > >>> >> > Best regards, >>> >> > Jason >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering >>> >> > <jason.p.picker...@gmail.com> wrote: >>> >> >> >>> >> >> >>> >> >> On Wed, Sep 16, 2009 at 10:13 PM, <joha...@ifi.uio.no> wrote: >>> >> >>> >>> >> >>> >> However, there does seem to be the ability to assign >>> dimensions, >>> >> >>> there >>> >> >>> >> does >>> >> >>> >> not seem to be the ability to assign particular elements within >>> >> >>> those >>> >> >>> >> dimensions to a particular DHIS data element. >>> >> >>> >>> >> >>> >>> >> >>> Just some more clarification here: you can make category combos >>> which >>> >> >>> you >>> >> >>> assign to data elements. However, it is not possible to assign >>> just >>> >> >>> specific parts of a category combo (only some of the category >>> options) >>> >> >>> to >>> >> >>> a data element. >>> >> >> >>> >> >> Yes, this was exactly what I wanted. Assigning different categories >>> >> >> would >>> >> >> seem to break the dimensionality. >>> >> >> >>> >> >>> >>> >> >>> Then you must make a specific category (as the only one in >>> >> >>> or part of a new category combo) with just those options. It can >>> be >>> >> >>> hell; >>> >> >>> in Tajikistan there were way over 20 categories I think, at least >>> 10 >>> >> >>> just >>> >> >>> on various age groups. >>> >> >>> >>> >> >>> Johan >>> >> >>> >>> >> >> >>> >> >> This was my fear. >>> >> >> >>> >> >> I will need to do some testing and see. I still fear it is not >>> exactly >>> >> >> the >>> >> >> intended functionality. >>> >> >> >>> >> >> Basically, I think I need something akin to the >>> exclusive/compulsory >>> >> >> groups >>> >> >> that are in place for organizational units, but instead, for >>> arbitrary >>> >> >> dimensions. I will give a try and see what happens. >>> >> >> >>> >> >> Thanks, >>> >> >> Jason >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> > >>> >> >>> >> >>> >> >>> >> _______________________________________________ >>> >> Mailing list: >>> >> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> >>> >> Post to : dhis2-devs@lists.launchpad.net >>> >> Unsubscribe : >>> >> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> >>> >> More help : https://help.launchpad.net/ListHelp >>> > >>> > >>> >>> >>> _______________________________________________ >>> Mailing list: >>> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> >>> Post to : dhis2-devs@lists.launchpad.net >>> Unsubscribe : >>> https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs> >>> More help : https://help.launchpad.net/ListHelp >>> >>> >> >> _______________________________________________ >> Mailing list: https://launchpad.net/~dhis2-devs >> Post to : dhis2-devs@lists.launchpad.net >> Unsubscribe : https://launchpad.net/~dhis2-devs >> More help : https://help.launchpad.net/ListHelp >> >> > > > -- > Cheers, > Knut Staring > -- Cheers, Knut Staring
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-devs Post to : dhis2-devs@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-devs More help : https://help.launchpad.net/ListHelp