Phil Longstaff wrote:
> I just saw an e-mail on the libgda mailing list which said that
> GdaQuery might be deprecated in the 4.0 series.

I just saw that too. It is disappointing. You have put in a lot of
work using GdaQuery.
I had compiled the V4 branch with the idea of testing it with
gnucash-gda. Not necessarily a good idea, as it would be testing both
gda and gnucash-gda simultaneously. Thus, so far, I have not installed
it. As you will see below, I am having enough trouble with the current
(3.1.2) PostgreSQL provider.

>>> It looks as though the postgresql backend has a bug. Using the
>>> object model, I create an XML-based representation of the table. I
>>> create the 'slot_id' column with type integer and add the 'AUTOINC'
>>> flag to it. For mysql and sqlite, this is an INTEGER AUTO_INCREMENT
>>> column, while for postgresql, it should be a SERIAL column.

>> Agreed. I have tracked this far enough to file a reasonable bug
>> report against the PostgreSQL backend. I just ran out of time to do
>> it. I'll get to it in the next couple of days.

> Unfortunately, any fix for this will take time to propagate to the
> distributions and make it out to people who want to use the GDA
> backend with postgresql. We might need a workaround.

Yes, I think an "ALTER TABLE ..." might work as a workaround (a sketch
follows below). It would not be a good thing to require the user to do
it, however. Then one gets into code like "for this version of this
provider, do it this way instead".

Another possibility would be to do a "SELECT MAX(slot_id) FROM slots;"
followed by an INSERT with a specified slot_id (also sketched below).
Two problems. I don't like imposing this overhead on non-PostgreSQL
users. And the PostgreSQL provider has especially poor performance
with its GdaQuery, so imposing an extra round trip on it carries a
higher penalty than it should. That's why they're called "workarounds"
- they aren't good enough to be called "solutions".

On the other hand, there is still the time required to finish this
gnucash backend, during which a proper fix could propagate. Though I
doubt that would be enough time.

I have now filed a bug report about this SERIAL problem:
http://bugzilla.gnome.org/show_bug.cgi?id=515528

>>> I don't really want to special-case the code for various backends
>>> since libgda is supposed to remove that need. There are other
>>> cases where it might be required.

>> Agreed. Not having special cases for different backends is a major
>> reason to use libgda. And it has enough backends that supporting
>> special cases would be an undesirable amount of work.

> Well, we could limit support to sqlite, mysql and postgresql. They
> are the 3 backends for which the provider status is "fully
> functional".

I hadn't noticed that. I had planned to limit my testing to those
three backends anyway; it would simply be too much work for me to test
a dozen libgda backends. I wonder what not being "fully functional"
implies.

For example, when you execute a GdaQuery, libgda checks whether the
provider object implements a function for this purpose. If so, it
calls that function on the provider. If not, libgda falls back to code
that builds a GdaCommand object and executes that. PostgreSQL's
provider implements the function (poorly, from a performance point of
view), and MySql's does not. Both work.

The reason I say PostgreSQL's provider does this poorly from a
performance point of view is that it converts all such queries into
prepared statements. This entails three round trips to the db - one to
prepare the statement, one to execute it, and another to deallocate
it! This is done for every query. There is a "FIXME" in the code here,
but it looks as though fixing it will only help when the client uses
GdaQuery with parameters, which gnucash-gda does not.
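Going back to the first workaround above: here is a minimal sketch of
what the ALTER TABLE fix might look like on PostgreSQL. The sequence
name is my invention, but this is essentially the arrangement SERIAL
would have set up for us:

    CREATE SEQUENCE slots_slot_id_seq;
    ALTER TABLE slots
      ALTER COLUMN slot_id SET DEFAULT nextval('slots_slot_id_seq');
    -- start the sequence past any existing rows
    SELECT setval('slots_slot_id_seq',
                  (SELECT COALESCE(MAX(slot_id), 0) + 1 FROM slots),
                  false);

Of course, that is PostgreSQL-specific SQL, which is exactly the kind
of special-casing we would like to avoid.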
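And a sketch of the second workaround. The 'name' column and the
values are hypothetical; only slot_id matters here:

    -- round trip 1: find the next free id
    SELECT COALESCE(MAX(slot_id), 0) + 1 FROM slots;
    -- suppose that returns 124; the application then issues:
    INSERT INTO slots (slot_id, name) VALUES (124, 'date-posted');

Note that a third problem lurks here: unless the two statements run
inside a transaction that locks the table, two concurrent clients
could compute the same slot_id.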
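To make that overhead concrete, here is the SQL-level equivalent of
what the provider does for every query. The provider presumably works
at the protocol level rather than sending literal PREPARE statements,
and the statement name is made up, but the three round trips are the
same:

    PREPARE gda_stmt AS SELECT * FROM slots WHERE slot_id = $1;  -- trip 1
    EXECUTE gda_stmt(123);                                       -- trip 2
    DEALLOCATE gda_stmt;                                         -- trip 3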
I foresee a risk that the fix to this FIXME may cause an extremely
large buildup of cached prepared statements, which, in turn, could
cause its own performance degradation.

Additionally, for every query, some other part of the PostgreSQL
provider executes two queries per table column - checking for unique
and primary keys. I am told this comes from
gda_data_model_describe_column(), but I haven't studied the code to
see why that is called for each gnucash-gda query. (E.g., would this
happen for a scalar query like "SELECT MAX(slot_id) ..."?) I have
filed a performance bug on this one:
http://bugzilla.gnome.org/show_bug.cgi?id=513149

Investigating why none of these queries ever returned a single row
led me to:
http://bugzilla.gnome.org/show_bug.cgi?id=513543

Added to the SERIAL problem above, I seem to have found just as many
problems with the PostgreSQL provider as with the software I am
nominally testing - namely gnucash-gda.

>>> SQLite, for example, does not allow more than one row to be added
>>> in an INSERT statement, whereas mysql and postgresql do. I could
>>> speed up the initial save to a blank db by writing all accounts at
>>> once, for example, but sqlite couldn't handle it and would need
>>> them to be written one at a time. Note that the libgda object
>>> model does not allow more than 1 row to be written either, so that
>>> would need to be done with SQL directly.

>> I wasn't aware of this limitation of SQLite. I have not had
>> occasion to use it. I've recently looked a little at the manual -
>> enough to see that it supports sub-queries and only does joins
>> through nested loops. That may cause performance issues with my
>> suggested queries, especially the last one.

> Would the performance be any worse than "SELECT DISTINCT tx_guid
> FROM splits ..." followed by "SELECT * FROM splits WHERE tx_guid IN
> (...)", which is a nested loop broken up into 2 pieces?

These are queries 1 and 2 from the register code. In my other posting
("GDA register open queries"), I propose merging them into one. I do
believe that would improve performance. (Testing of said belief is
required.) I can't do any real benchmarking right now - my test
system is building the latest gnucash-gda. However, roughly, query A
appears to be faster than query 1 followed by query 2. I also believe
that query A will be easier for you to maintain - e.g., no code to
split up long query strings.

Later, I'll do this more rigorously, without the unpredictable load
of building gnucash-gda concurrently and without the .db file on an
NFS mount. And also for MySql and PostgreSQL.

Here is how I've benchmarked it:

    $ time sqlite3 mydata.db <query1.sql >/dev/null

The first time I ran the above, I sent its output to a file and
edited the file to produce query2.sql.

Additionally, combining the two queries is one IPC round trip less
for MySQL (three less for the PostgreSQL provider, plus two for each
column in each of the two tables).

Mark
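P.S. For concreteness, here is roughly the shape of the merged query.
This is my sketch, not the exact query A from the other posting, and
the account_guid condition stands in for whatever WHERE clause query 1
actually uses - treat those details as assumptions:

    SELECT * FROM splits
     WHERE tx_guid IN (SELECT DISTINCT tx_guid
                         FROM splits
                        WHERE account_guid = '<some guid>');

One round trip, and the db gets to plan the nested loop itself instead
of having it split across two client-side queries.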
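The timing comparison then follows the same pattern as the command
above (queryA.sql is my name for the merged query saved to a file, and
query2.sql is produced by hand-editing the output of query 1, as
described):

    $ time sqlite3 mydata.db <query1.sql >q1.out
    $ time sqlite3 mydata.db <query2.sql >/dev/null
    $ time sqlite3 mydata.db <queryA.sql >/dev/null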