Christian,
I haven't experimented to know whether constructing the frequency table
on the fly creates a performance bottleneck or not, but I am guessing
the original developer thought it might. It would require a detailed
look at the code involved, but my suspicion is that the performance
penalty is likely to be significant.
My comment about bloat was that, at present, matching data is only
maintained for accounts you specifically import data into, and only if
that data is stored. If it isn't stored, bloat obviously doesn't apply.
Any sort of generalized procedure could allow selection of the accounts
for which Bayesian matching is required, i.e. those for which importing
is used to input data. My initial thought was that you would run it for
all accounts, but it is really only necessary for the specific subset of
accounts into which you import data. It would then require the ability
to run the procedure on an account if it occurred in the import data but
didn't have existing account matching data. If it runs on the fly, there
is no problem: it can run whenever a new account being imported into
appears in the imported data. The most common use case is probably
importing data to one specific account, but GnuCash also allows the
account being imported into to be specified in the import data itself.
I haven't looked at how the frequency table is currently stored in
memory, but I am guessing it is constructed in memory when the data file
is read in.
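A minimal sketch of the on-the-fly option, under loudly hypothetical assumptions: transactions are modelled as (import account, transfer account, description) records, a token table maps each description word to counts per transfer account, and a simple vote count stands in for GnuCash's actual Bayesian probability combination. The only point it shows is that a table is built for an account the first time that account appears in imported data, rather than for every account in the file. None of the names below are GnuCash API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    # Hypothetical model, not GnuCash's data structures.
    account: str      # account the data is imported into
    transfer: str     # account on the other side of the transaction
    description: str


def tokenize(text):
    # Assumption: tokens are whitespace-separated, case-folded words.
    return set(text.lower().split())


def build_table(txns, account):
    """Token -> Counter(transfer account -> count) for one import account."""
    table = defaultdict(Counter)
    for t in txns:
        if t.account == account:
            for tok in tokenize(t.description):
                table[tok][t.transfer] += 1
    return table


class LazyMatcher:
    """Builds an account's table only when that account is first imported into."""

    def __init__(self, txns):
        self.txns = txns
        self.tables = {}  # account -> token table, filled on demand

    def table_for(self, account):
        if account not in self.tables:
            self.tables[account] = build_table(self.txns, account)
        return self.tables[account]

    def guess(self, account, description):
        # Simple vote count standing in for a real Bayesian combination.
        scores = Counter()
        table = self.table_for(account)
        for tok in tokenize(description):
            scores.update(table.get(tok, Counter()))
        return scores.most_common(1)[0][0] if scores else None


history = [
    Txn("Bank", "Expenses:Groceries", "SUPERMART 123"),
    Txn("Bank", "Expenses:Fuel", "FUELCO 77"),
    Txn("Visa", "Expenses:Dining", "CAFE"),
]
matcher = LazyMatcher(history)
print(matcher.guess("Bank", "SUPERMART 999"))  # Expenses:Groceries
print(sorted(matcher.tables))                  # ['Bank']: Visa not built yet
```

The cost of `build_table` is paid once per account, at the moment the account first shows up in an import, which is where any performance penalty of the on-the-fly approach would appear.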
Keeping the data up to date is one advantage, and if the current
procedure is changed to improve performance, that is not hampered by the
presence of historical data, which would be updated automatically when
the procedure is run. If the table is stored as it is at present and a
procedure were available to trawl the current transactions for an
account, then it could be kept up to date by running that procedure
periodically. Data from manually entered transactions would then be
incorporated, whether the procedure runs on the fly or just as required.
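A minimal sketch of the periodic refresh, assuming a hypothetical stored table that records how far it has been brought up to date. Each run folds in only the transactions posted since the previous run, and it makes no distinction between imported and hand-entered transactions, which is how manually entered data would reach the table. The `Txn` and `StoredTable` types and the `refresh` function are illustrative, not GnuCash code.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Txn:
    # Hypothetical model: posted date plus the fields the matcher tokenizes.
    account: str
    transfer: str
    description: str
    posted: date


@dataclass
class StoredTable:
    """Hypothetical persisted token table plus how far it has been refreshed."""
    counts: defaultdict = field(default_factory=lambda: defaultdict(Counter))
    refreshed_to: date = date.min


def refresh(stored, txns, account, today):
    """Fold in transactions posted since the last refresh.

    Imported and manually entered transactions are treated alike, so
    hand-entered data reaches the table the next time this runs.
    """
    for t in txns:
        if t.account == account and stored.refreshed_to < t.posted <= today:
            for tok in set(t.description.lower().split()):
                stored.counts[tok][t.transfer] += 1
    stored.refreshed_to = today
    return stored


txns = [
    Txn("Bank", "Income:Salary", "ACME PAY", date(2019, 1, 15)),
    Txn("Bank", "Expenses:Rent", "LANDLORD", date(2019, 2, 1)),  # entered by hand
]
table = refresh(StoredTable(), txns, "Bank", date(2019, 1, 31))
table = refresh(table, txns, "Bank", date(2019, 2, 28))
print(table.counts["acme"]["Income:Salary"])     # 1, not double-counted
print(table.counts["landlord"]["Expenses:Rent"])  # 1
```

Keying on the posted date would miss a back-dated transaction entered after the last refresh; keying on the date a transaction was entered would avoid that.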
Having a standalone procedure to trawl an existing file and update the
stored data for an account would allow us to explore whether running it
on the fly is likely to be a significant performance hit, so that could
perhaps be a first step. The core code to store the data must already
exist in the matcher, so it would be a case of wrapping it in a loop
through the existing transactions in an account and setting up a GUI to
select the accounts to run it on.
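The "wrap the existing store step in a loop" idea can be sketched as follows. This assumes a hypothetical `store_tokens` function standing in for whatever the matcher already calls to remember a match, and a plain dictionary standing in for the data file; the list of selected accounts is what the proposed GUI would supply. The elapsed time is returned because the purpose of this first step is to measure whether running it on the fly would be too slow.

```python
import time
from collections import Counter, defaultdict


def store_tokens(table, description, transfer):
    """Hypothetical stand-in for the matcher's existing 'remember this match' step."""
    for tok in set(description.lower().split()):
        table[tok][transfer] += 1


def trawl(book, selected_accounts):
    """Rebuild token tables for the accounts chosen by the user.

    book maps account name -> list of (description, transfer account) pairs,
    a hypothetical simplification of the data file. The elapsed time is
    returned to help judge whether running this on the fly would be too slow.
    """
    start = time.perf_counter()
    tables = {}
    for account in selected_accounts:  # the proposed GUI would supply this list
        table = defaultdict(Counter)
        for description, transfer in book.get(account, []):
            store_tokens(table, description, transfer)
        tables[account] = table
    return tables, time.perf_counter() - start


book = {
    "Assets:Bank": [("ACME PAY", "Income:Salary"), ("POWERCO BILL", "Expenses:Power")],
    "Assets:Cash": [("CAFE", "Expenses:Dining")],
}
tables, seconds = trawl(book, ["Assets:Bank"])
print(dict(tables["Assets:Bank"]["acme"]))  # {'Income:Salary': 1}
print(f"trawl took {seconds:.6f} s")
```

Only the selected accounts are processed, which matches the point that matching data is needed just for the accounts that are imported into.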
The problem with pruning the data is that GnuCash has no way of knowing
a priori which tokens are most relevant. I would think that date
information is not really relevant, and amount/value information does
little in most cases to identify a transfer account.
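One heuristic form of pruning, sketched under the assumption that date and amount tokens can be recognised by their shape: drop tokens that look like dates or decimal amounts and keep everything else. The regular expressions are hypothetical guesses at common formats, which is exactly the "no a priori knowledge" problem; plain integers such as reference codes are deliberately kept, since a repeated number (an account or member number, say) could carry signal.

```python
import re

# Hypothetical patterns for tokens carrying date or amount information.
DATE_RE = re.compile(r"^\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}$")  # 2019-03-14, 14/03/2019
AMOUNT_RE = re.compile(r"^[-+]?\$?\d[\d,]*\.\d{2}$")         # 12.50, -1,234.00


def prune(tokens):
    """Drop date-like and amount-like tokens, keep everything else."""
    return [t for t in tokens if not (DATE_RE.match(t) or AMOUNT_RE.match(t))]


tokens = ["acme", "pay", "2019-03-14", "1,234.00", "839201"]
print(prune(tokens))  # ['acme', 'pay', '839201']
```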
The main difficulty I have with transfer account assignment is that some
regular transactions use a unique code in the description each time they
occur, with no separate unique identifier of the transaction source. My
wife and I have separate gym membership subscriptions, and the
transaction descriptions identify neither the gym nor which of us the
transaction applies to. The options are to persuade the source to
include specific data or to use a single account to record both, but I
like to track both our individual and joint expenses.
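To see why token matching cannot separate the two memberships, consider a small sketch in which a simple vote share stands in (hypothetically) for the matcher's Bayesian probabilities. The only tokens a new description shares with history are the ones common to both accounts, and the unique code never repeats, so the evidence is split evenly. The descriptions and account names are invented for illustration.

```python
from collections import Counter


def transfer_shares(history, description):
    """Share of token votes each transfer account gets for a new description.

    A simple vote share, used here as a hypothetical stand-in for the
    matcher's Bayesian probabilities.
    """
    tokens = set(description.lower().split())
    votes = Counter()
    for past_description, transfer in history:
        votes[transfer] += len(tokens & set(past_description.lower().split()))
    total = sum(votes.values())
    return {acct: n / total for acct, n in votes.items() if n} if total else {}


# Hypothetical history: same wording, a fresh reference code each month.
history = [
    ("GYM DD 83920", "Expenses:Gym:David"),
    ("GYM DD 11873", "Expenses:Gym:Spouse"),
    ("GYM DD 40551", "Expenses:Gym:David"),
    ("GYM DD 72210", "Expenses:Gym:Spouse"),
]
print(transfer_shares(history, "GYM DD 55120"))
# {'Expenses:Gym:David': 0.5, 'Expenses:Gym:Spouse': 0.5}
```

No amount of extra history changes this: every new month adds equal weight to both accounts and one more code that will never be seen again.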
Some regular transactions also get matched to previous payments by the
transaction matching within the date range window, where the amounts and
descriptions are usually identical. The current 42-day window captures
both fortnightly and monthly regular income transactions, for example.
This only affects a few transactions each month, and I don't have huge
numbers of transactions to process now that I have retired, but that may
not be the case for other users. Making the date range window adjustable
rather than fixed might cure this. Setting it to less than 14 days would
cure the problems I have, for example, but again that would not work for
everybody.
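A minimal sketch of an adjustable window, assuming (as an illustration) that candidates are existing transactions within `window_days` either side of the imported date that have the same amount. `match_candidates` and its `window_days` parameter are hypothetical, not the matcher's real interface, and only the amount is compared here for brevity. With the default of 42 days, the previous occurrence of a fortnightly payment (14 days back) and of a monthly one (28 days back) both fall inside the window; at 13 days neither does.

```python
from datetime import date, timedelta


def match_candidates(existing, imported_date, imported_amount, window_days=42):
    """Existing (date, amount) entries with the same amount within the window.

    window_days is a hypothetical parameter standing in for the fixed 42-day
    window; only the amount is compared here, for brevity.
    """
    window = timedelta(days=window_days)
    return [(d, a) for d, a in existing
            if a == imported_amount and abs(d - imported_date) <= window]


# Previous occurrences of a regular payment: 14 days back and 28 days back.
existing = [(date(2019, 3, 1), 1000), (date(2019, 2, 15), 1000)]
imported = date(2019, 3, 15)
print(len(match_candidates(existing, imported, 1000)))                  # 2
print(len(match_candidates(existing, imported, 1000, window_days=13)))  # 0
```

Because the comparison is inclusive, a window of exactly 14 days would still catch the fortnightly payment, which is why the threshold has to be strictly below 14 days in this case.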
I am currently committed to some work on the documentation front, so I
am unlikely to consider this in the near future other than in general
terms, but someone else may be willing to take it up.
David
-----
David Cousens
_______________________________________________
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel