Re: [GNC-dev] Convert Imap to Flat

2018-11-07 Thread Geert Janssens
On Wednesday 7 November 2018 01:10:03 CET, John Ralls wrote:
> > On Nov 7, 2018, at 5:52 AM, Geert Janssens wrote:
> > On Monday 5 November 2018 23:27:31 CET, John Ralls wrote:
> >> That’s not to say that we shouldn’t also do everything we can to speed up
> >> the code--it’s really slow--and certainly suspending UI events while
> >> computing the import matches is a good idea.
> >> 
> >> There’s something else going wrong, though: convert_imap_bayes_to_flat
> >> calls xaccAccountBeginEdit at the start and xaccAccountCommitEdit at the
> >> end. That should raise the edit level and prevent any interior calls to
> >> xaccAccountCommitEdit from doing anything.
> > 
> > This doesn't look wrong per se to me.
> > convert_imap_account_bayes_to_flat is run once for each account. So there
> > will still be the same number of gui refreshes as there are accounts. I
> > don't think we can reduce this further on the account begin/commit level.
> > A complete gui refresh suspend may do better.
> 
> Each account has its own import map and any particular import affects only
> one account’s map, so there should be exactly one effective, i.e. with
> edit_level == 1, commit and exactly one UI refresh.

The new CSV importer can handle imports into multiple accounts (on both 
sides), but that's tangential to this issue.

The conversion as you describe it would be a lazy conversion: only convert 
maps on an as-needed basis. That's an interesting idea which hadn't occurred 
to me, and it's also different from how it's currently implemented.
Right now the conversion of *all* import maps of all accounts is initiated the 
first time any import map (bayes, of course) is needed. So even if the import 
requires only one account's maps, all maps will be converted in one go. Hence 
the iteration over all accounts.

I don't know how 2.6.21 would handle partly converted bayes import maps. I do 
know that, as it stands, the conversion is designed to run as long as the 
feature flag is not set. That would also have to change if we wanted to go for 
the lazy conversion.

The advantage of such a lazy conversion is that the time required to convert 
is spread over several import runs, so each conversion is likely to take less 
time. The disadvantage is that we risk ending up with books that carry 
hierarchical bayes data for an eternity, and hence gnucash has to keep code 
around to handle it. In a convert-everything-in-one-go scenario we can, at 
some point two major releases from now, declare the old format unsupported, as 
we only guarantee backwards compatibility for one major release series.

What *is* a problem with the one-shot convert-all scenario is that it takes 
noticeable time (in some cases even horribly long) and we don't inform the 
user of what's happening. The conversion should really have been initiated 
from the gui, with a progress bar showing that something is going on and 
indicating how far the process has run.

> If there’s more than
> one of either inside a call to convert_imap_account_bayes() then
> something’s broken at the QofInstance level. If we’re calling
> convert_imap_account_bayes() on a particular account more than once in a
> session then there’s something wrong with the decision logic that calls it.
> Bob’s printf-profile suggests at least the latter and your
> "imap_convert_bayes_to_flat's sub functions will call xaccAccountBeginEdit
> and xaccAccountCommitEdit at some point” suggests the former.
> 
> I suppose Aaron thought that running it on an empty or non-existent map
> would take negligible time; if that’s not the case then we can simply check
> for that and quit, but it should be checked in the profiler before we add
> any code.

Agreed

Geert




Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread Craig Arno
On 11/6/2018 3:54 PM, John Ralls wrote:
> Read about SQLite3 locking: https://www.sqlite.org/lockingv3.html.
> They’re locking virtual memory pages, totally independent of table or
> record structure. In practice what that means is that at the
> application level only using the SQL Transaction API makes sense; the
> application doesn’t have enough visibility of the internals to be able
> to implement finer-grained controls.

Then record locking may not be the best approach to solving the
multi-source SQLite database issue.  For asynchronous concurrency caused
by peer or multi-threaded database request sources, an Event Queue or
Message Queue design pattern could be built as a serializing front-end
for database transactions.

Using a queue has other benefits if you find yourself in a situation
where there are heavy database updates.  For instance, "write updates"
could be given priority over "read requests", so read requests can be
assured of returning the latest and most relevant results.  For SQLite
this might matter because the user put their database on a slow device,
like a USB stick, or, as you suggested earlier, Dropbox.

Interesting read on SQLite record locking.  I had no idea.  Thanks!

Craig


Re: [GNC-dev] Convert Imap to Flat

2018-11-07 Thread John Ralls


> On Nov 7, 2018, at 5:14 PM, Geert Janssens wrote:
> 
> On Wednesday 7 November 2018 01:10:03 CET, John Ralls wrote:
>>> On Nov 7, 2018, at 5:52 AM, Geert Janssens wrote:
>>> On Monday 5 November 2018 23:27:31 CET, John Ralls wrote:
>>>> That’s not to say that we shouldn’t also do everything we can to speed up
>>>> the code--it’s really slow--and certainly suspending UI events while
>>>> computing the import matches is a good idea.
>>>> 
>>>> There’s something else going wrong, though: convert_imap_bayes_to_flat
>>>> calls xaccAccountBeginEdit at the start and xaccAccountCommitEdit at the
>>>> end. That should raise the edit level and prevent any interior calls to
>>>> xaccAccountCommitEdit from doing anything.
>>> 
>>> This doesn't look wrong per se to me.
>>> convert_imap_account_bayes_to_flat is run once for each account. So there
>>> will still be the same number of gui refreshes as there are accounts. I
>>> don't think we can reduce this further on the account begin/commit level.
>>> A complete gui refresh suspend may do better.
>> 
>> Each account has its own import map and any particular import affects only
>> one account’s map, so there should be exactly one effective, i.e. with
>> edit_level == 1, commit and exactly one UI refresh.
> 
> The new CSV importer can handle imports into multiple accounts (on both 
> sides), but that's tangential to this issue.
> 
> The conversion as you describe it would be a lazy conversion: only convert 
> maps on an as-needed basis. That's an interesting idea which hadn't occurred 
> to me, and it's also different from how it's currently implemented.
> Right now the conversion of *all* import maps of all accounts is initiated 
> the first time any import map (bayes, of course) is needed. So even if the 
> import requires only one account's maps, all maps will be converted in one 
> go. Hence the iteration over all accounts.
> 
> I don't know how 2.6.21 would handle partly converted bayes import maps. I 
> do know that, as it stands, the conversion is designed to run as long as the 
> feature flag is not set. That would also have to change if we wanted to go 
> for the lazy conversion.
> 
> The advantage of such a lazy conversion is that the time required to convert 
> is spread over several import runs, so each conversion is likely to take 
> less time. The disadvantage is that we risk ending up with books that carry 
> hierarchical bayes data for an eternity, and hence gnucash has to keep code 
> around to handle it. In a convert-everything-in-one-go scenario we can, at 
> some point two major releases from now, declare the old format unsupported, 
> as we only guarantee backwards compatibility for one major release series.
> 
> What *is* a problem with the one-shot convert-all scenario is that it takes 
> noticeable time (in some cases even horribly long) and we don't inform the 
> user of what's happening. The conversion should really have been initiated 
> from the gui, with a progress bar showing that something is going on and 
> indicating how far the process has run.

Sorry, I’d forgotten that it’s an all-at-once conversion. Given that it’s 
controlled by a feature flag, that’s a reasonable design decision. It points 
to another possible slow-down: even if convert_imap_account_bayes() takes 
negligible time on an empty map, walking a large account tree looking for maps 
won’t. We can fix that and speed up the eventual conversion a bit by 
constructing a list or vector of accounts while loading the book. The SQL 
backends can use a single query, something like

    SELECT DISTINCT a.guid FROM accounts a, slots s
     WHERE a.guid = s.obj_guid AND s.name = 'import-map-bayes';

[not tested, might need a tweak or two]. The XML backend would just add the 
account guid to the list. 

While working with imaps we should see if we can combine the flattening and 
the transfer account name->guid conversion, so that only one pass through the 
map is needed in cases where both conversions apply.
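
A combined pass might look roughly like the sketch below; 
for_each_bayes_token and set_flat_bayes_slot are hypothetical stand-ins for 
the real KVP traversal and writing code, while gnc_book_get_root_account, 
gnc_account_lookup_by_full_name and xaccAccountGetGUID are existing engine 
calls:

    /* Sketch only: flatten one account's bayes map and rewrite transfer
     * account names to GUIDs in a single traversal. */
    static void
    flatten_and_guid_convert (Account *acc, QofBook *book)
    {
        Account *root = gnc_book_get_root_account (book);
        /* for_each_bayes_token is a hypothetical walk of the hierarchical
         * frame yielding (token, transfer account name, count) triples. */
        for_each_bayes_token (acc,
            [&](const char *token, const char *acct_name, gint64 count)
            {
                Account *xfer =
                    gnc_account_lookup_by_full_name (root, acct_name);
                if (!xfer)
                    return; /* leave unmatched entries for manual repair */
                /* Hypothetical writer: one flat slot keyed on token + GUID. */
                set_flat_bayes_slot (acc, token,
                                     xaccAccountGetGUID (xfer), count);
            });
        /* ...then delete the old hierarchical frame inside the same
         * begin/commit pair. */
    }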

I agree that a dialog box informing the user of what’s going on would be good. 
Given the performance problems with progress bars on Windows with HiDPI 
displays I’m not so sure about that part, especially since generating a useful 
progress measure is a problem for most of our progress bars.

Regards,
John Ralls


Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread John Ralls


> On Nov 8, 2018, at 6:42 AM, Craig Arno wrote:
> 
> On 11/6/2018 3:54 PM, John Ralls wrote:
>> Read about SQLite3 locking: https://www.sqlite.org/lockingv3.html. 
>> They’re locking virtual memory pages, totally independent of table or 
>> record structure. In practice what that means is that at the application 
>> level only using the SQL Transaction API makes sense; the application 
>> doesn’t have enough visibility of the internals to be able to implement 
>> finer-grained controls.
> 
> Then record locking may not be the best approach to solving the 
> multi-source SQLite database issue.  For asynchronous concurrency caused by 
> peer or multi-threaded database request sources, an Event Queue or Message 
> Queue design pattern could be built as a serializing front-end for database 
> transactions.
> 
> Using a queue has other benefits if you find yourself in a situation where 
> there are heavy database updates.  For instance, "write updates" could be 
> given priority over "read requests", so read requests can be assured of 
> returning the latest and most relevant results.  For SQLite this might 
> matter because the user put their database on a slow device, like a USB 
> stick, or, as you suggested earlier, Dropbox.
> 
> Interesting read on SQLite record locking.  I had no idea.  Thanks!

Always keep GnuCash’s target audience (Personal/Small Business) in mind: we’re 
definitely not designing for “heavy database updates”. Any business that needs 
an auditor won’t be allowed to use GnuCash because of the complete absence of 
internal controls. In order to use a queue serialization model we’d have to 
disable update queries and allow only inserts for the offline session. 
Remember, there is no conflict resolution: an update query just overwrites the 
record; there’s no check to see whether the old value is what was expected. In 
GnuCash terms that means that when you click on an existing transaction in the 
register or open an Edit Foo dialog box, the session needs to acquire an 
exclusive lock on the record *in the database* and hold it until you exit the 
transaction or close the dialog box.
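
For reference, the usual mitigation is an optimistic version check on every 
update; here's a sketch, where the "version" column is hypothetical and 
GnuCash's schema has nothing like it:

    /* SQL carried in C strings; "version" is a hypothetical column. */
    static const char *blind_update =   /* what happens today: last writer
                                         * silently wins */
        "UPDATE transactions SET description = ? WHERE guid = ?;";
    static const char *guarded_update = /* affects 0 rows on a conflict */
        "UPDATE transactions SET description = ?, version = version + 1 "
        "WHERE guid = ? AND version = ?;";

If the guarded form affects zero rows, another writer got there first and the 
application has to re-read and resolve the conflict, and that resolution step 
is exactly the machinery GnuCash doesn't have.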

Regards,
John Ralls



Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread Craig Arno
On 11/7/2018 1:42 PM, Craig Arno wrote:
> Using a queue has other benefits if you find yourself in a situation...

For the "offline" update usecase where the primary database is "server"
based, a local SQLite database could be used for local offline work
(travel receipt entry), which the database Event Write Queue could be
tee'd to a file for later "online" update replay.

When "online" status to the remote server based database is restored,
the "remote write queue" could be played to update the remote/server
database.  This would have the net effect of "mirroring" the remote
database transactions locally for one user's offline work.
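
A rough sketch of the tee-to-file half (C++; the one-statement-per-line
journal format is just an assumption for brevity):

    #include <fstream>
    #include <string>

    /* Append each statement applied to the local SQLite copy to a
     * journal... */
    void log_for_replay (const std::string& sql, const std::string& journal)
    {
        std::ofstream out (journal, std::ios::app); /* append-only journal */
        out << sql << '\n';
    }

    /* ...and on reconnect, stream the journal to the server in order. */
    template <typename Exec>
    void replay_journal (const std::string& journal, Exec&& run_on_server)
    {
        std::ifstream in (journal);
        std::string stmt;
        while (std::getline (in, stmt))
            run_on_server (stmt);
    }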

In the read direction, a query for all database table entries changed
between the last successful update/connect and now will have to be
performed, and the local database updated with the new changes from
User/Bookkeeper/Accountant activity.  If there are a "lot" of changes,
online time is short or connection speed slow, and the potential for
interruption high, it might be best to queue up the read changes on
disk until the database read "diff" is complete for all tables.  Then
the "read" half of the sync can be completed from the local, complete
diff data on disk.  And give the user a little arrow-chasing-arrow
button to do a "refresh" compare between local and remote databases, if
sync is ever in doubt.  The "arrow chasing arrow" button is consistent
with how remote calendar/contact-based office information systems
operate today.  I do this with local-machine Thunderbird Calendar and
my OwnCloud server-based system.


Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread Craig Arno
On 11/7/2018 3:43 PM, John Ralls wrote:
> Always keep GnuCash’s target audience (Personal/Small Business) in
> mind: We’re definitely not designing for “heavy database updates”.

"heavy database updates" can be caused in a Personal/Small Business
application when two asynchronous events arrive at the same time from a
"second user".  This can also happen if an Offline->Online transaction
replay/sync occurs, especially if a "second user" is working/involved
during replay.


Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread John Ralls


> On Nov 8, 2018, at 9:00 AM, Craig Arno wrote:
> 
> On 11/7/2018 3:43 PM, John Ralls wrote:
>> Always keep GnuCash’s target audience (Personal/Small Business) in mind: 
>> We’re definitely not designing for “heavy database updates”. 
> 
> "heavy database updates" can be caused in a Personal/Small Business 
> application when two asynchronous events arrive at the same time from a 
> "second user".  This can also happen if an Offline->Online transaction 
> replay/sync occurs, especially if a "second user" is working/involved during 
> replay.

Not my understanding of “heavy database updates” (which would be something 
like >100K TPS), but OK. 

Regards,
John Ralls



Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread John Ralls


> On Nov 8, 2018, at 8:50 AM, Craig Arno wrote:
> 
> On 11/7/2018 1:42 PM, Craig Arno wrote:
>> Using a queue has other benefits if you find yourself in a situation...
> 
> For the "offline" update usecase where the primary database is "server" 
> based, a local SQLite database could be used for local offline work (travel 
> receipt entry), which the database Event Write Queue could be tee'd to a file 
> for later "online" update replay.
> 
> When "online" status to the remote server based database is restored, the 
> "remote write queue" could be played to update the remote/server database.  
> This would have the net effect of "mirroring" the remote database 
> transactions locally for one user's offline work.
> 
> In the read direction, a query for all database table entries between the 
> last successful update/connect and now will have to be performed and the 
> local database updated with new changes from User/Bookkeeper/Accountant 
> activity.  If there are a "lot" of changes and online time is short or 
> connection speed slow, and potential for interruption high, it might be best 
> to queue up read database changes to disk until the database read "diff" is 
> complete for all tables.  Then the "read" half of the Sync can then be 
> completed with local disk based "complete" diff data.  And give the user a 
> little arrow chasing arrow button to do a "refresh" compare between local and 
> remote databases, if sync is ever in doubt.  The "arrow chasing arrow button" 
> is consistent with how remote calendar/contact based office information 
> systems operate today.  I do this with local machine Thunderbird Calendar and 
> my OwnCloud server based system.

That’s a lot more complex than any backend I’d want to implement, but 
fortunately GnuCash’s backends are plugins so you’re welcome to write a 
separate one.

Regards,
John Ralls



Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread Craig Arno
On 11/7/2018 4:14 PM, John Ralls wrote:
> Not my understanding of “heavy database updates” (which would be
> something like >100K TPS), but OK.
Yeah, I may not be using the right terminology, but take my suggestions
as generally correct even if the glossary is off.  In the database arena
I'm more of a semi-sophisticated user than a domain expert.  I assume
you are a domain expert, at least relative to my database experience, so
this is an opportunity for me to learn.  What may be confusing is that I
do have a lot of engineering experience.

In this case I was considering the processing of two asynchronous events
(from a second+ user) arriving at the same time, not raw processing
throughput, a different kind of performance.  Even the "sync data"
proposal shouldn't stress today's multi-core, gigabyte-memory commodity
computers for throughput in a SOHO environment.  I'm thinking more of
race conditions caused by async (two+ user) events.  Still thinking
SOHO.

On 11/7/2018 4:16 PM, John Ralls wrote:
> That’s a lot more complex than any backend I’d want to implement, but
> fortunately GnuCash’s backends are plugins so you’re welcome to write
> a separate one.
Fair enough.  Hopefully the architecture framework design decisions can
support this sort of future "plugin" expansion.  Guess I'd better look
for "plugin support documentation" and see what I can figure out.


Re: [GNC-dev] [GNC] mysql backend, second user (lock, for example)

2018-11-07 Thread John Ralls


> On Nov 8, 2018, at 9:57 AM, Craig Arno wrote:
> 
> On 11/7/2018 4:14 PM, John Ralls wrote:
>> Not my understanding of “heavy database updates” (which would be something 
>> like >100K TPS), but OK.
> Yeah, I may not be using the right terminology, but take my suggestions as 
> generally correct even if the glossary is off.  In the database arena I'm 
> more of a semi-sophisticated user than a domain expert.  I assume you are a 
> domain expert, at least relative to my database experience, so this is an 
> opportunity for me to learn.  What may be confusing is that I do have a lot 
> of engineering experience.

I’m nowhere near an expert on multiuser database work, but I have whacked at it 
a bit over the years.

> 
> In this case I was considering the processing of two asynchronous events 
> (from a second+ user) arriving at the same time, not raw processing 
> throughput, a different kind of performance.  Even the "sync data" proposal 
> shouldn't stress today's multi-core, gigabyte-memory commodity computers 
> for throughput in a SOHO environment.  I'm thinking more of race conditions 
> caused by async (two+ user) events.  Still thinking SOHO.

That’s just simple concurrency. The part that may be outside of your 
experience is that it’s concurrency not just between different processes but 
between different computers. And in your offline use case it’s perhaps hard to 
recognize that “concurrent” doesn’t necessarily mean “at the same time”; it 
just means that there are potentially multiple updates of the same record on 
the two disconnected instances, and those have to be resolved somehow.

> 
> On 11/7/2018 4:16 PM, John Ralls wrote:
>> That’s a lot more complex than any backend I’d want to implement, but 
>> fortunately GnuCash’s backends are plugins so you’re welcome to write a 
>> separate one.
> Fair enough.  Hopefully the architecture framework design decisions can 
> support this sort of future "plugin" expansion.  Guess I'd better look for 
> "plugin support documentation" and see what I can figure out.

Unfortunately there isn’t any good documentation on how to write a plugin. 
There’s some API documentation at 
https://code.gnucash.org/docs/MAINT/group__Backend.html and 
https://code.gnucash.org/docs/MAINT/group__Object__Private.html, but there’s 
not much detail and there’s no tutorial. You’ll need to study the code in 
libgnucash/engine/qof-backend.cpp and libgnucash/backend/dbi to see how to 
register your new backend.

Regards,
John Ralls
