On Wed, Feb 24, 2010 at 10:32 AM, Phil Longstaff <plongst...@rogers.com> wrote:
> On Wed, 2010-02-24 at 09:59 -0500, Derek Atkins wrote:
>> Donald Allen <donaldcal...@gmail.com> writes:
>>
>> >> I think true measurements will be the only way to find out what
>> >> causes delays where.
>> >
>> > Of course. I spent a big chunk of my career doing performance
>> > analysis on various bits of complicated software and learned very
>> > young (the hard way) that if you think you know how your software
>> > behaves and where the time is going, you are probably wrong.
>> > Measurement, done correctly, is the only way to get to the truth
>> > reliably. I sometimes had to insist on measurement by people who
>> > worked for me who were as cocky (and wrong) as I was when I was
>> > young :-)
>> >
>> > But until the measurements are done, there's no harm in doing some
>> > educated guessing, so long as the guessing doesn't replace the
>> > measuring. If you are frequently right, it can help you set your
>> > measurement priorities. If you are frequently wrong, it reminds you
>> > that you aren't too good at modeling the behavior of software in
>> > your head.
>>
>> For what it's worth, the old Postgres backend was dog slow too.
>>
>> I certainly encourage you to perform profiling to determine where our
>> bottlenecks are.
>
> Another thing that I haven't done too much of is trying to add extra
> indexes or optimize queries. All SQL statements are logged to
> gnucash.trace. Feel free to add indexes and/or change queries to
> improve performance.
>
> In general, one major problem is that certain areas of the code just
> assume that the data is loaded. Until we remove those assumptions or
> provide alternatives, it seemed the safer route to just load all data
> at start time.
>
> Phil
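On the indexing point: before guessing at which indexes to add, it might be
worth tallying what actually shows up in gnucash.trace. Below is a rough,
untested sketch that counts logged statements by verb and table; it assumes
the raw SQL text appears on the logged lines, so the regexes will need to be
adjusted to whatever the real trace format turns out to be.

#!/usr/bin/env python3
# Rough sketch: tally SQL statements found in gnucash.trace by verb and
# first table name, to spot tables that might benefit from an index or a
# reworked query. The trace format assumed here (raw SQL on each logged
# line) is a guess; adapt the regexes as needed.
import re
import sys
from collections import Counter

VERB_RE = re.compile(r'\b(SELECT|INSERT|UPDATE|DELETE)\b', re.IGNORECASE)
TABLE_RE = re.compile(r'\b(?:FROM|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_]*)',
                      re.IGNORECASE)

def main(path):
    counts = Counter()
    with open(path) as f:
        for line in f:
            verb = VERB_RE.search(line)
            if not verb:
                continue  # not a logged SQL statement
            table = TABLE_RE.search(line)
            key = (verb.group(1).upper(),
                   table.group(1).lower() if table else '?')
            counts[key] += 1
    for (verb, table), n in counts.most_common(20):
        print(f'{n:8d}  {verb:6s} {table}')

if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else 'gnucash.trace')

Sorting the output by count should point at the tables (and statement shapes)
worth looking at first.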
I have one quick data point for you: I ran 'top' a few times while loading my
data from PostgreSQL. 'top' is not exactly a surgical measurement tool, but it
can get you started in the right direction by telling you what the bottleneck
resource is (I/O-limited, CPU-limited, etc.).

What I'm seeing is that for the vast majority of the load time, gnucash-bin is
using 100% of a processor (on a 2-core system). A postgres server process
shows up a distant second occasionally, and there's a brief burst of CPU
activity by the postgres server at the end of the load, but most of the time
everything else is waiting while the gnucash-bin process computes like crazy.
That computation is 99% user-mode time.

Now the trick is to get more specific about where the time is going. I will
offer one of my usual guesses: I don't *think* that missing indices (and the
resulting full-table scans) would produce behavior like this, because I
believe the query processing is done on the server side, so I'm postulating
that in that situation you would see high CPU utilization by the server,
which is not the case. If I'm right, this might be good news: the bulk of
the time would be going to actual GnuCash code (which can be improved once
you understand the problem), as opposed, say, to libpq code.

Anyway, as we discussed earlier, my guessing is not a substitute for actual
measurement.

/Don
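P.S. If anyone wants to put numbers on what 'top' shows rather than eyeballing
it, here is a rough, Linux-only sketch that samples user vs. system CPU time
for a set of processes from /proc/<pid>/stat. The PIDs (gnucash-bin and the
postgres backend, found with ps or pgrep) have to be supplied on the command
line, and the 5-second sampling interval is an arbitrary choice of mine.

#!/usr/bin/env python3
# Rough sketch: sample user/system CPU time for the given PIDs from
# /proc/<pid>/stat (Linux only) over a fixed interval, to quantify what
# 'top' suggests (e.g. gnucash-bin burning user-mode CPU while the
# postgres backend mostly waits).
import os
import sys
import time

HZ = os.sysconf('SC_CLK_TCK')   # clock ticks per second

def cpu_times(pid):
    """Return (name, utime_sec, stime_sec) for one process."""
    with open(f'/proc/{pid}/stat') as f:
        data = f.read()
    # comm is in parentheses and may contain spaces; split after the last ')'.
    name = data[data.index('(') + 1:data.rindex(')')]
    fields = data[data.rindex(')') + 2:].split()
    return name, int(fields[11]) / HZ, int(fields[12]) / HZ

def main(pids, interval=5.0):
    before = {pid: cpu_times(pid) for pid in pids}
    time.sleep(interval)
    for pid in pids:
        name, u0, s0 = before[pid]
        _, u1, s1 = cpu_times(pid)
        print(f'{name:15s} pid {pid}: user {u1 - u0:6.2f}s  '
              f'sys {s1 - s0:6.2f}s over {interval:.0f}s')

if __name__ == '__main__':
    main([int(a) for a in sys.argv[1:]])

If the gnucash-bin user-time delta tracks the wall-clock interval while the
postgres numbers stay near zero, that matches what I'm seeing with top, and
the next step would be a real profiler (gprof, oprofile, or similar) on the
client.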