Hi Olav, I came across a similar case,but even worse with many more disaggregates and and data elements . Short story is that's it was simply not feasible with the aggregation query builder,but rather trivial with a script to pull required data out of the database,aggregate it and then reimport it through the API. Not surprisingly, I chose to use R,but am sure it could be done with other languages. A map between the event data elements and aggregate data elements was used to transform and aggregate from one domain to the other. Robin brought up a similar request a while ago about this ,so looks like something we should really have in the core as opposed to relying on scripts.
Regards, Jason On Jan 29, 2015 1:36 PM, "Olav Poppe" <olav.po...@me.com> wrote: > Hi devs, > I’m having some problems with making reports/dashboards/output based on > tracker data. > > We have made a «Cause of death» program, quite similar to the inpatient > program on the demo instance and what is currently being used several > places. Each case will have as a minimum age, sex and a cause of death > (based on an ICD-10 based short list of about 120 diagnosis). > > Based on this data, we want to make reports that include as a minimum: > - number of cases by gender => can be done in ER/EV > - number of cases by diagnosis => can be done in ER/EV > - number of cases by age *group* => can NOT be done in ER/EV (unless > I’ve missed something) > - number of cases by diagnosis *group* => can NOT be done in ER/EV > (unless I’ve missed something) > - combinations of the above, e.g. case by diagnosis and age group/sex => > can NOT be done in ER/EV (unless I’ve missed something) > > Since this output can quite easily be created based on aggregate data (it > is similar to the OPD datasets), I decided to set up aggregation queries to > make the reports we need. I therefore made a categorycombo with gender and > age group, and one data element for each diagnosis. I then grouped the > diagnosis into data element groups to be able make reports based on disease > group. I tested this with a couple of queries, and it worked fine. > > The problem with this approach is that 118 diagnosis * 2 sexes * 22 age > group = 4 956 aggregation queries, and this causes the aggregation to take > a very long time. When I try to run manual aggregation for just one week > with 4-500 cases, it takes about 10 minutes (4/8 core CPU, SSD, 3/6 GB > memory for tomcat/postgres). Running the aggregation for the last 6 months > would take a several hours. This *might* work for a cause of death > program where the number of cases is relatively low, but to do similar > analysis on an admission program (where I know some countries have 19000+ > cases some weeks) is just not possible. > > So I guess my question is whether there are any ideas on how this can be > made to work. One obvious thing I see is that the aggregation only uses one > CPU (as opposed to the analytics), could that be made more efficient? Or > what would be even better was if ER/EV could support this type of analysis > - the general requirement would be some sort of grouping functionality? > That is something I think would be useful in many/most programs (e.g. > grouping mothers by age group in ANC program) > > Olav > > > > _______________________________________________ > Mailing list: https://launchpad.net/~dhis2-devs > Post to : dhis2-devs@lists.launchpad.net > Unsubscribe : https://launchpad.net/~dhis2-devs > More help : https://help.launchpad.net/ListHelp > >
_______________________________________________ Mailing list: https://launchpad.net/~dhis2-devs Post to : dhis2-devs@lists.launchpad.net Unsubscribe : https://launchpad.net/~dhis2-devs More help : https://help.launchpad.net/ListHelp