Am Freitag, 19. Februar 2010 schrieb Jeff Kletsky: > I found the "optimizing reports" thread from October 2007, but not a lot > since then. Has there been any significant understanding gained since > then about what is so slow on what should be a relatively > straightforward task? Was it ever narrowed down to bad algorithms in the > reporting, or something else?
I've created some of the oldest reports in there. I have never done real profiling of the time spent in the report generation - but what I can recall is this: All the "number collecting calculations" in all of the reports are horribly inefficient, or more academic: coded in the wrong level of abstraction. That is, the calculation of the balance difference between start and end of a reporting period is usually done by obtaining the list of splits (transactions) in question from within the scheme code, then summing over all of those splits. Also, the intermediate results are not cached at all, even during one report generation run, so they are calculated over and over again when the report is generated. This turns your O(n) problem into an O(n^2) problem. As I said, the problem is the level of abstaction: Most of the report code uses too low-level accessor functions into the txn data, like "querying for a list of splits", then "accumulating the sum over these splits". It would go a long way to choose more higher-level query functions here, like "querying for a balance difference for splits obeying criterion xy". The "engine" and/or the "libqof" part of gnucash in theory should provide sufficient such higher-level functions (implemented in C); it's unclear if they actually do. However, the problem doesn't stop here: Even if those few higher-level query functions were being used from the schemed side of it, it would turn out their implementation is probably highly inefficient as well, like O(n^2) instead of something less than that. But as long as nobody really used these function for time-critical applications (or cared about the time), nobody had any incentive to work on the optimization here and its functional correctness was enough as a goal. The optimization of the report generation obviously needs work on several levels. I would recommend the following: - Use profiling to identify the 10% part of the report generation that consumes most of the time. Using the time-of-day is probably good enough to identify this section. I would guess the problem is in the calculation of each single amount that is displayed in the report, and most probably the amount calculation is programmed in scheme for the most part and using the "engine" query functions only on a very low level. - Identify some higher-level query function that can fulfil the amount generation task, but refactors its calculation away from the reporting code. This higher-level query function can first be a scheme function. Once the functional correctness is somewhat verified, you can try to port this to C and/or to replace this by use of the libqof query object... Good luck! Regards, Christian _______________________________________________ gnucash-devel mailing list gnucash-devel@gnucash.org https://lists.gnucash.org/mailman/listinfo/gnucash-devel