Hi devs, 
I’m having some problems with making reports/dashboards/output based on tracker 
data.

We have made a «Cause of death» program, quite similar to the inpatient program 
on the demo instance and what is currently being used several places. Each case 
will have as a minimum age, sex and a cause of death (based on an ICD-10 based 
short list of about 120 diagnosis). 

Based on this data, we want to make reports that include as a minimum:
- number of cases by gender                     => can be done in ER/EV
- number of cases by diagnosis          => can be done in ER/EV
- number of cases by age group          => can NOT be done in ER/EV (unless 
I’ve missed something)
- number of cases by diagnosis group    => can NOT be done in ER/EV (unless 
I’ve missed something)
- combinations of the above, e.g. case by diagnosis and age group/sex => can 
NOT be done in ER/EV (unless I’ve missed something)

Since this output can quite easily be created based on aggregate data (it is 
similar to the OPD datasets), I decided to set up aggregation queries to make 
the reports we need. I therefore made a categorycombo with gender and age 
group, and one data element for each diagnosis. I then grouped the diagnosis 
into data element groups to be able make reports based on disease group. I 
tested this with a couple of queries, and it worked fine.

The problem with this approach is that 118 diagnosis * 2 sexes * 22 age group = 
4 956 aggregation queries, and this causes the aggregation to take a very long 
time. When I try to run manual aggregation for just one week with 4-500 cases, 
it takes about 10 minutes (4/8 core CPU, SSD, 3/6 GB memory for 
tomcat/postgres). Running the aggregation for the last 6 months would take a 
several hours. This might work for a cause of death program where the number of 
cases is relatively low, but to do similar analysis on an admission program 
(where I know some countries have 19000+ cases some weeks) is just not possible.

So I guess my question is whether there are any ideas on how this can be made 
to work. One obvious thing I see is that the aggregation only uses one CPU (as 
opposed to the analytics), could that be made more efficient? Or what would be 
even better was if ER/EV could support this type of analysis - the general 
requirement would be some sort of grouping functionality? That is something I 
think would be useful in many/most programs (e.g. grouping mothers by age group 
in ANC program)

Olav


_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : dhis2-devs@lists.launchpad.net
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp

Reply via email to