On 24.05.20 at 01:52, David Cousens wrote:
Christian,

I guess it depends on whether there is a performance advantage in using the
previously stored data for the transfer account associations over
constructing the frequency table on the fly. The search for matching
transactions only takes place within a narrow time window around the date of
import, so it is unlikely to canvass enough transactions to be able to
construct a valid frequency table from tokenized data within that window.
The stored frequency table would generally contain data from a much wider
range of transactions and would take much longer to construct on the fly
each time it was needed.
I'm only thinking about account matching (Bayesian matching), not transaction matching. For this, of course, it would be necessary to work with all historical data, not only with a few transactions within a narrow time window. Can you tell whether it would be a considerable performance load to construct the frequency table on the fly from all historical transactions related to a transfer account?
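A minimal sketch of what constructing the table on the fly would involve, assuming a hypothetical in-memory `Txn` record and a naive whitespace tokenizer (neither is GnuCash's actual data model or tokenizer). It is one pass over the history of a single import account, so the cost grows roughly linearly with the number of transactions times the number of tokens per description.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Txn:
    # Hypothetical minimal transaction record: description text, the account
    # the data was imported into, and the chosen transfer ("other") account.
    description: str
    import_account: str
    transfer_account: str


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption, not GnuCash's).
    return text.lower().split()


def build_frequency_table(txns, import_account):
    """Count how often each token co-occurs with each transfer account,
    using only the history of one import account."""
    table = defaultdict(Counter)
    for t in txns:
        if t.import_account != import_account:
            continue
        for tok in set(tokenize(t.description)):
            table[tok][t.transfer_account] += 1
    return table


history = [
    Txn("ALDI Store 123", "Checking", "Expenses:Groceries"),
    Txn("Shell fuel", "Checking", "Expenses:Auto"),
    Txn("ALDI Store 456", "Checking", "Expenses:Groceries"),
]
table = build_frequency_table(history, "Checking")
print(table["aldi"])  # Counter({'Expenses:Groceries': 2})
```

Restricting the pass to one import account keeps the table limited to the accounts actually involved, which is the same scoping assumed for the stored import match map.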
I have also pondered whether the stored associations could usefully be
augmented with data from transactions that were entered manually rather than
imported. That could be of value where you have a good set of historical
records, and it would only need a one-off run through the existing
transactions to gather the data. Unless you confined it to a specific set of
accounts into which you import data, though, it might bloat the data file
with unnecessary and unused information.

A possible advantage of constructing the frequency table on the fly would be that it is always up to date. For instance, if the user selects the "wrong" other account during import and corrects it afterwards, the import match map currently still contains the wrong matching information, and it is not corrected later either.

Manually entered transactions would also be considered, right?

A one-off manual run through all transactions to update the import match map could be a good alternative to constructing it on the fly. Sounds good.

Why do you think a run through all transactions "might cause bloat of the data file"? The current import match map also contains all data, possibly unused or unnecessary, from all matched accounts. I still assume in this case that the import match map relates to one transfer account only, which already limits the set of accounts from which the import match map is constructed.

I have examined the stored data in my data file with the import map editor
and found that a lot of it contributes little to the matching for the
transfer account (dates, connectors such as "a", "and", "the", and possibly
transaction amounts), since these often have a fairly uniform frequency
across all accounts used as transfer accounts. After some pruning of the
stored data, my matching reliability seemed to improve a bit.
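Why such tokens add little can be seen in a simplified naive Bayes scoring. This is an assumption for illustration, not GnuCash's exact combination formula: each known token adds the log of its smoothed share per account, so a token spread evenly over all accounts shifts every score by the same amount.

```python
import math
from collections import Counter


def rank_accounts(tokens, table, accounts, alpha=1.0):
    # Simplified scoring (assumption, not GnuCash's formula): for each known
    # token, add the log of the Laplace-smoothed share of that token's
    # occurrences that went to each account.
    scores = {a: 0.0 for a in accounts}
    for tok in tokens:
        counts = table.get(tok)
        if not counts:
            continue  # unseen token: contributes nothing
        total = sum(counts.values()) + alpha * len(accounts)
        for a in accounts:
            scores[a] += math.log((counts[a] + alpha) / total)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


accounts = ["Expenses:Groceries", "Expenses:Auto"]
table = {
    "aldi": Counter({"Expenses:Groceries": 8}),
    # Connector with a uniform frequency across both accounts.
    "the": Counter({"Expenses:Groceries": 5, "Expenses:Auto": 5}),
}
with_the = rank_accounts(["aldi", "the"], table, accounts)
without_the = rank_accounts(["aldi"], table, accounts)
print(with_the[0][0])  # Expenses:Groceries
```

Here "the" adds log(0.5) to both accounts, so the ranking and the score gap are identical with or without it. In this model the token only costs storage and lookup time.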
OK, I see. If the import match map has to be pruned to get reliable results from the Bayesian matching algorithm, then a frequency table that is constructed on the fly or rebuilt in a one-off run is a big disadvantage: if it is constructed on the fly, nothing can be pruned, and if it is rebuilt, all pruned data will come back after the run.
I don't know at the moment if the tokens stored for transfer account
matching are a subset of the tokens used for transaction matching (haven't
checked) but restricting the set of tokens used may possibly improve
performance and reduce the amount of data stored, if all tokens associated
with a transaction are currently being stored in the frequency table, which
is what I suspect from examining my import map data.
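Restricting the token set before anything is stored could look like the following sketch. The stop-word list and the date/amount patterns are hypothetical and illustrative, not what GnuCash uses; a real list would have to be configurable and language-dependent.

```python
import re

# Hypothetical stop-word list of connectors.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for"}

# Tokens that look like dates, e.g. 2020-05-24, 24.05.20, 05/24/2020.
DATE_RE = re.compile(r"^\d{1,4}([./-])\d{1,2}\1\d{2,4}$")
# Tokens that look like amounts, e.g. 12, 12.50, -1,234.00.
AMOUNT_RE = re.compile(r"^[-+]?\d[\d,]*([.,]\d+)?$")


def filtered_tokens(text):
    """Lowercase whitespace tokens minus connectors, dates and amounts."""
    keep = []
    for tok in text.lower().split():
        if tok in STOP_WORDS or DATE_RE.match(tok) or AMOUNT_RE.match(tok):
            continue
        keep.append(tok)
    return keep


print(filtered_tokens("Payment to the ALDI store 24.05.20 12.50"))
# ['payment', 'aldi', 'store']
```

The trade-off is that purely numeric tokens such as store numbers are dropped as well, even where they might have been informative.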
Yes, this is the current situation: every token is stored. Do you have suggestions for how tokens could be pruned automatically in a meaningful way?
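One possible heuristic, sketched under assumptions: the entropy-based "evenness" measure, the thresholds and the table layout are illustrative only, not anything GnuCash implements. It drops tokens seen too rarely, and tokens whose occurrences are spread almost evenly over all transfer accounts, which is the pattern described above for dates, connectors and amounts.

```python
import math
from collections import Counter


def prune_table(table, max_evenness=0.9, min_count=2):
    """Return the tokens of `table` worth keeping.

    `table` maps token -> Counter(transfer account -> count). Both
    thresholds are illustrative assumptions, not tuned values.
    """
    all_accounts = {a for counts in table.values() for a in counts}
    n_accounts = len(all_accounts)
    kept = {}
    for tok, counts in table.items():
        total = sum(counts.values())
        if total < min_count:
            # Seen too rarely to be trusted, e.g. one-off reference numbers.
            continue
        if n_accounts > 1:
            # Normalized Shannon entropy of the token's account distribution:
            # 0 = always the same account, 1 = spread evenly over all accounts.
            entropy = -sum((c / total) * math.log(c / total)
                           for c in counts.values() if c > 0)
            evenness = entropy / math.log(n_accounts)
            if evenness > max_evenness:
                # Says almost nothing about which account to pick.
                continue
        kept[tok] = counts
    return kept


table = {
    "aldi": Counter({"Groceries": 8}),
    "the": Counter({"Groceries": 5, "Auto": 5, "Rent": 5}),
    "ref4711": Counter({"Rent": 1}),
    "shell": Counter({"Auto": 6, "Groceries": 1}),
}
kept = prune_table(table)
print(sorted(kept))  # ['aldi', 'shell']
```

Because the criterion is computed from the counts alone, it could in principle be re-applied automatically after an on-the-fly construction or a one-off rebuild, instead of relying on manual pruning that a rebuild would undo. Whether such thresholds hold up on real data would need testing.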

David Cousens



-----
David Cousens
--
Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html
_______________________________________________
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel