Dear devs,

I am experiencing problems when handling category combinations. Our prototype with 5 dimensions went through the process of generating categoryOptionCombinations (~20,000 records) quite well. 7 dimensions (~400,000 records) worked as well, although it took a very long time.
Now we have defined the next data model with 10 dimensions (expecting ~5 million categoryOptionCombinations), and the process dies without further notice. Last words in catalina.out:

* INFO 2016-06-07 13:29:33,783 Building object-bridge maps (preheatCache: true, 3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
* INFO 2016-06-07 13:29:36,779 Building object-bridge maps took 2.99 seconds. (DefaultObjectBridge.java [http-bio-8180-exec-15])
* INFO 2016-06-07 13:29:36,896 'admin' update org.hisp.dhis.dataelement.DataElementCategoryCombo, name: Membership, uid: SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])

Ten dimensions with not extraordinarily big option sets is actually not unusual, and rather slim, for multi-dimensional data models in data warehouses, so I'd expect DHIS2 to be able to handle this easily. It could of course be a memory problem (I tried up to 14g for Tomcat on a 4-core Ubuntu 14.04 server, DHIS 2.23).

Before I start experimenting with other parameters, I am hoping to get some hints on known limitations or workarounds from you (not allowed: reducing the number of options or categories, SQL hacks :-) ). Is there any info on whether optimizations of this process are being planned in the kernel?

Some observations on the process:

* during generation (either when saving the categoryCombination or in the data maintenance menu):
  - long names - cOCs are generated with names that become extremely long, as they are mere concatenations of the involved categoryOptions. Could there be an option to just use the codes as a basis, or to leave out the names completely? This could be one reason for the memory problem and the performance issues.
  - long log entries - every single entry is logged in catalina.out with several lines of text, causing catalina.out to become extremely big.
  - during execution a lot of Java memory is being used and no DB memory, which looks to me as if all the logic is happening in the Java machine. It might be more useful to move more logic into SQL in the DB (e.g. use DB cross joins for combining options), as the DB will be more efficient.
  - because of the log entries I assume that every single combination is being persisted into the DB with a single SQL statement, causing millions of single SQL requests. Prefer batch SQL instead of single-record processing.
* during import/export of categoryOptionCombinations:
  - prefer batch SQL instead of single-record processing
  - huge log entries in catalina.out due to several lines of text per combination

I'd be very happy about comments.

Thanks in advance,
Uwe
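
P.S. To make the two suggestions above (cross joins in the DB, batch SQL instead of single-record inserts) a bit more concrete, here is a minimal sketch. It is only an illustration under loud assumptions: it uses Python with an in-memory SQLite database rather than DHIS2's Java code and PostgreSQL, and the categories, table names (opt, coc_sql, coc_batch) and function names are all hypothetical and have nothing to do with the real DHIS2 schema.

```python
import sqlite3
from itertools import islice, product

# Hypothetical categories and option codes; NOT the real DHIS2 data model.
CATEGORIES = {
    "sex": ["F", "M"],
    "age": ["0-4", "5-14", "15+"],
    "status": ["new", "old"],
}


def combos_via_cross_join(conn):
    """Let the database build all combinations with one INSERT ... SELECT."""
    conn.execute("CREATE TABLE opt (category TEXT, code TEXT)")
    conn.executemany(
        "INSERT INTO opt VALUES (?, ?)",
        [(cat, code) for cat, codes in CATEGORIES.items() for code in codes],
    )
    conn.execute("CREATE TABLE coc_sql (sex TEXT, age TEXT, status TEXT)")
    # One statement; the cross product never passes through the application.
    conn.execute(
        """
        INSERT INTO coc_sql
        SELECT s.code, a.code, t.code
        FROM opt AS s CROSS JOIN opt AS a CROSS JOIN opt AS t
        WHERE s.category = 'sex' AND a.category = 'age' AND t.category = 'status'
        """
    )
    return conn.execute("SELECT COUNT(*) FROM coc_sql").fetchone()[0]


def combos_via_batches(conn, batch_size=4):
    """Generate combinations lazily and insert them one batch at a time."""
    conn.execute("CREATE TABLE coc_batch (sex TEXT, age TEXT, status TEXT)")
    rows = product(*CATEGORIES.values())  # yields combinations one by one
    batches = 0
    while True:
        batch = list(islice(rows, batch_size))  # at most batch_size rows in memory
        if not batch:
            break
        conn.executemany("INSERT INTO coc_batch VALUES (?, ?, ?)", batch)
        batches += 1
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM coc_batch").fetchone()[0]
    return count, batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(combos_via_cross_join(conn))
    print(combos_via_batches(conn, batch_size=4))
```

The first function keeps the whole cross product inside the database; the second still generates combinations in the application, but holds only one batch in memory at a time and sends rows in groups instead of one request per combination. Whether either approach fits the DHIS2 kernel is of course for you to judge.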

