TL;DR: Hi! I'm a programmer! The attached one-line patch cuts XML load CPU time by about 50%! Skeptical? I was.
*Introduction* (of me)

My name is Chris Carson. I wrote code for a living from 1976-1986, and then entered management but continued dabbling in code as a hobbyist. C, C++, Unix, Linux, blah blah.

*A Story*

I have financial data in Quicken dating back to 1991. When Intuit sold off Quicken I decided I needed a path to another financial management package. I wrote a processor to deal with the QIF file duplicate transfer problem (that's another story) and imported my data into Gnucash. The resulting XML save file is 55.8 MB (uncompressed. Yes, I store it compressed.)

The XML file takes ~38 seconds of user CPU time to load on my build of the Gnucash 3.3 maint stream. (For reference, starting Gnucash with an empty simple account file takes ~4.5 seconds of user CPU time on my machine.)

I did a relatively tedious run of callgrind, which showed that about half of that time was being consumed by dom_chars_handler(...) and checked_char_cast(...), both in the libgnucash/backend/xml directory.

Turns out dom_chars_handler(...) is called with an enormous multi-line string. It copies the whole thing and validates it, nibbles off a few bytes, and returns, only to be called again with the remainder of the enormous multi-line string to copy, validate, and nibble again.

*The Patch*

I tried a couple of different fixes for this. The patch below copies and validates only the bytes being consumed. It brings the user CPU time to start up and load my XML file from ~38 seconds down to ~20.5 seconds, and given that ~4.5 seconds of that is startup, the load itself drops from ~33.5 seconds to ~16 seconds. I make that about a 50% improvement in load speed.

I tried a more aggressive fix for funsies and it wasn't much better.

I have tested this *ONLY* on the load of my largeish XML file. But the patched code reads well.

What would you guys advise as next steps?

Patch included below signature, and separately as a file.

Regards,
Chris Carson

=====================

From b4e1911f774bfc292e97cffd2492a0257d0aee3c Mon Sep 17 00:00:00 2001
From: "Christopher D. Carson" <chriscarson60...@gmail.com>
Date: Sun, 23 Dec 2018 20:48:02 -0600
Subject: [PATCH] Performance fix in dom_chars_handler: use g_strndup instead
 of g_strdup

Because the origin string can be extraordinarily long, you get more benefit
from this than you would imagine
---
 libgnucash/backend/xml/sixtp-to-dom-parser.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgnucash/backend/xml/sixtp-to-dom-parser.cpp b/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
index e6ba43039..9aba0801a 100644
--- a/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
+++ b/libgnucash/backend/xml/sixtp-to-dom-parser.cpp
@@ -95,7 +95,7 @@ static gboolean dom_chars_handler (
 {
     if (length > 0)
     {
-        gchar* newtext = g_strdup (text);
+        gchar* newtext = g_strndup (text,length);
         xmlNodeAddContentLen ((xmlNodePtr)parent_data,
                               checked_char_cast (newtext), length);
         g_free (newtext);
-- 
2.19.2
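To illustrate why a one-line change can matter this much, here is a standalone sketch in plain C. It is not GnuCash code: the names consume_copy_all, consume_copy_needed and total_bytes_copied are hypothetical, and it assumes (as the description above suggests) that the handler receives a pointer into one long NUL-terminated buffer together with the number of bytes to consume. It only counts how many bytes each strategy copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the pre-patch behaviour: duplicate everything
   up to the terminating NUL, even though only `length` bytes are used.
   Returns the number of bytes copied (0 on allocation failure). */
static size_t consume_copy_all(const char *text, size_t length)
{
    assert(text != NULL);
    (void)length;                      /* ignored, which is the problem */
    size_t total = strlen(text);
    char *copy = malloc(total + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, total + 1);
    free(copy);
    return total;
}

/* Hypothetical stand-in for the patched behaviour: duplicate only the
   `length` bytes that are actually consumed. */
static size_t consume_copy_needed(const char *text, size_t length)
{
    assert(text != NULL);
    char *copy = malloc(length + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, length);
    copy[length] = '\0';
    free(copy);
    return length;
}

typedef size_t (*consume_fn)(const char *text, size_t length);

/* Walk the buffer in pieces of at most `chunk` bytes, the way a parser
   nibbles at it, and add up the bytes the chosen strategy copies. */
static size_t total_bytes_copied(const char *buffer, size_t chunk,
                                 consume_fn consume)
{
    size_t remaining = strlen(buffer);
    size_t copied = 0;
    while (remaining > 0)
    {
        size_t n = remaining < chunk ? remaining : chunk;
        copied += consume(buffer, n);
        buffer += n;
        remaining -= n;
    }
    return copied;
}
```

For a 1000-byte buffer consumed 10 bytes at a time, the copy-everything strategy copies 1000 + 990 + ... + 10 = 50,500 bytes, while the copy-only-what-is-needed strategy copies 1,000. The first grows with the square of the buffer size, and any validation pass over the copy would scale the same way.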
_______________________________________________ gnucash-devel mailing list gnucash-devel@gnucash.org https://lists.gnucash.org/mailman/listinfo/gnucash-devel