On Tue, Feb 23, 2010 at 9:15 AM, Geert Janssens <janssens-ge...@telenet.be> wrote:
> On Tuesday 23 February 2010, Donald Allen wrote:
>> As I've mentioned in other posts, I have a pretty large gnucash
>> datafile -- more than 20 MB uncompressed. I've been testing the SQL
>> backend and I'm concerned about the performance, particularly startup
>> performance.
>>
>> I've been doing this testing on an inexpensive little HP desktop
>> machine: dual-core 2.8 GHz AMD Athlon, 2 GB memory, 300 GB 7200 rpm
>> SATA drive, Arch Linux, 2.6.32 kernel. It may have been cheap, but
>> it's damned fast. (I've been in the computer business from 1964 until
>> I retired last Oct., and the cost-performance of today's hardware just
>> boggles my mind, especially when I think about what we put up with 20
>> or 30 years ago; you haven't lived until you've tried getting your
>> work done *and* staying sane while sitting in front of a VT100 talking
>> to an overloaded Vax 780 running 4.1BSD; motto: "It was hard to build,
>> it ought to be hard to use".) From my gnucash xml file, I've created
>> sqlite3 and postgresql databases containing the same data.
>>
>> Here are the average data-load timings, in seconds, from just a couple
>> of tests per storage method (measured from the point gnucash says
>> "reading file" until it's up):
>>
>> xml         9.10
>> sqlite3    45.41
>> postgresql 45.46
>>
>> My mental model, which may be wrong, of what the SQL backend is doing
>> is that it reads the entire database into memory at startup, as
>> opposed to retrieving the data as needed (which is the way I'd guess
>> gnucash+database would be architected if it were being written from
>> scratch; I'll guess further that, given the file-based history of
>> gnucash, its existing architecture precluded the incremental approach
>> when the database backend was added). I'm guessing this because of the
>> amount of time it takes to start up when pointed at either the
>> postgresql or sqlite3 version of the same data.
>> I am further assuming that, with the SQL backend, as the user adds
>> new things (accounts, transactions, etc.), new objects get inserted
>> into the database and modified objects get updated, on the spot. I'm
>> guessing this because, as I make changes, the 'save' button remains
>> grayed out. So the primary advantage of the database backend is that
>> file saves are not necessary. But, at least in my case, I am paying
>> about a 36-second price at startup to avoid the file saves. File
>> saves on my machine take about 2 seconds, much faster than reading
>> the file, probably because the data is being written to the buffer
>> cache. So I'd need to do 18 file saves during a gnucash session to be
>> worse off (kept waiting) with the xml file than with the data in
>> either flavor of database. And that assumes I am always waiting for
>> file saves to complete, which is not always the case (I frequently do
>> them after completing a chunk of work, e.g., entering an investment
>> transaction from a statement, and the file save happens while I turn
>> my attention to the paperwork to figure out what I need to do next).
>>
>> While I didn't do a lot of timed tests and I've used the word "guess"
>> an awful lot above (corrections to what I've said above from someone
>> who actually knows what (s)he is talking about are perfectly
>> welcome), the numbers confirm what I already knew from many more
>> untimed tests: at least in my case, startup takes an unacceptably
>> long time. If this is the way the released version performs and my
>> back-of-the-envelope analysis of the tradeoff is correct, I don't
>> think I'd use the database backend; I'd just continue with the xml
>> file. But if you think I've missed something here, please feel free
>> to point it out.
>
> Your assumptions on how things work are correct.
>
> And I noticed this performance decrease as well.
>
> There is one difference between the xml and the sql backends that may
> influence this (at least in part): the sql backend writes lots of
> debug information to gnucash.trace at present. I don't know how much
> impact this has -- I haven't tested without debug information -- but
> if we disable the debug information before the 2.4 release, it will
> surely narrow the gap.
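For what it's worth, the break-even arithmetic from my earlier message can be written out in a few lines of Python. The figures are just the averaged timings quoted above, so treat the result as rough:

```python
# Break-even estimate: how many XML saves must a session contain
# before the one-time SQL startup penalty costs more than saving?
# Timings (seconds) are the averages reported in this thread.
xml_startup = 9.10
sql_startup = 45.41    # sqlite3; postgresql was a near-identical 45.46
xml_save = 2.0         # one compressed XML file save

startup_penalty = sql_startup - xml_startup      # extra wait at startup
break_even_saves = startup_penalty / xml_save    # saves per session

print(f"startup penalty:  {startup_penalty:.1f} s")
print(f"break-even point: {break_even_saves:.1f} saves per session")
# roughly 18 saves, matching the estimate above
```

And as noted, even this overstates the cost of the xml backend, since saves often overlap with other work while the startup wait never does.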
I'm seeing trace files on the order of 0.5 MB. As I mentioned earlier,
saving my xml file takes about 2 seconds. The file is about 2.5 MB
compressed (over 20 MB uncompressed), and the 2 seconds includes the
time to compress it. Writing the trace file is not nearly as hard a
job, and the periodic writes should go to the buffer cache on any
reasonable machine. So I'll guess (again) that the gap-narrowing won't
amount to much. I hope I'm wrong :-)

> In the future this may still be improved such that data is only
> queried for when needed, but this requires a lot of internal
> modifications. This would not have been possible for the next stable
> release.

I understand why this was done as it was; I said so in my previous
email. If I had been running this project, I'm quite sure I'd have done
the same thing, so I'm not at all critical. I also note that I'm a bit
of an extreme case. I've been tracking my finances with a computer for
a long time (I started using Quicken in 1994, when I bought my first PC
as soon as Windows NT 3.5 became available; I was not about to use
Windows 3.1/MSDOS!), and some of that ancient history is in my gnucash
file, so I'm pretty sure I'm on the right tail of the file-size
distribution. People with more reasonably sized xml files will not
notice the performance hit as much, and the database backend frees them
from having to worry about file saves and replaying log files in the
event of a crash.

/Don

> Geert

_______________________________________________
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
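The load-everything-at-startup versus query-on-demand trade-off discussed in this thread can be sketched with a toy sqlite3 example. The schema and names here are invented for illustration only; GnuCash's actual tables differ:

```python
import sqlite3

# Toy model of the two loading strategies. Hypothetical schema,
# not GnuCash's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, descr TEXT)")
conn.executemany(
    "INSERT INTO transactions (id, descr) VALUES (?, ?)",
    [(i, f"txn {i}") for i in range(100_000)],
)

# Eager: what the current backend effectively does -- pull every row
# into memory at startup, so startup cost grows with the whole history.
all_rows = conn.execute("SELECT id, descr FROM transactions").fetchall()

# Lazy: what an incremental backend could do -- fetch a row only when
# a view actually needs it, at per-lookup cost via the primary key.
one_row = conn.execute(
    "SELECT id, descr FROM transactions WHERE id = ?", (42,)
).fetchone()

print(len(all_rows), one_row)
```

The lazy path is why a query-as-needed backend would make startup time independent of file size, at the cost of the "lot of internal modifications" Geert mentions.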