This is my understanding after looking at some of the matcher code recently. It may not be exactly correct: I wasn't looking specifically at the Bayes algorithm, although I did skim through the code.
The matcher tokenizes (parses) the import data for key information. It seems to maintain a table of the tokens present in a transaction when you manually assign that transaction to a particular transfer account or accept an automatic assignment to that account. If a transfer account is specified in the import data, it uses that. Otherwise, when you import a transaction, the code appears to examine the tokens in the incoming transaction, compare them with the table it has accumulated, and calculate a probability that the transaction should be assigned to a given account. It then searches existing transactions and calculates a probability that the incoming one is an exact or close match for one of them.

Based on these probabilities it marks the transaction in the matcher window to be Added, Reconciled or Updated, sets the appropriate checkbox, and displays the relevant probabilities in the colour bars. If the probability of a match to a transfer account is considered high enough, it actually assigns that transfer account to the transaction. If the matcher is completely successful in assigning a transfer account and there is no match to an existing transaction, the transaction is marked in light green. The transaction row is marked in red/pink if the matcher decides it should not be imported, e.g. because it matches an existing transaction sufficiently closely.

Any of these automatic choices can be overridden by the user by unchecking the checked box, checking another, or clicking on the assigned transfer column to start the account selection dialog, as you do for an unassigned transaction (where the row in the transfer account column is marked yellow and assigned to the Imbalance account). I think the change David Carlson referred to was to limit the date range over which an exact match is searched for, but I am not sure of the details.
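To make the account-guessing step described above more concrete, here is a minimal Python sketch of a token table with a naive-Bayes-style score. It is not the GnuCash code, which I have only skimmed. The class name, the tokenizer, the add-one smoothing and the 0.7 threshold are all assumptions chosen purely for illustration.

```python
import re
from collections import defaultdict


class TokenAccountTable:
    """Hypothetical token -> account count table (illustration only)."""

    def __init__(self):
        # counts[token][account] = how often this token was seen in a
        # transaction that ended up assigned to this account.
        self.counts = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def tokenize(description):
        # Assumed tokenizer: lower-case runs of letters and digits.
        return re.findall(r"[a-z0-9]+", description.lower())

    def train(self, description, account):
        """Record the tokens of an imported transaction against its account."""
        for token in self.tokenize(description):
            self.counts[token][account] += 1

    def probabilities(self, description):
        """Return a normalised score per known account for this description."""
        accounts = {a for per_token in self.counts.values() for a in per_token}
        if not accounts:
            return {}
        scores = {a: 1.0 for a in accounts}
        for token in self.tokenize(description):
            per_token = self.counts.get(token)
            if not per_token:
                continue  # a token never seen before carries no evidence
            total = sum(per_token.values())
            for a in accounts:
                # Add-one smoothing so a single token cannot zero out an account.
                scores[a] *= (per_token.get(a, 0) + 1) / (total + len(accounts))
        norm = sum(scores.values())
        return {a: s / norm for a, s in scores.items()}

    def best_account(self, description, threshold=0.7):
        """Return the most likely account, or None if nothing is likely enough."""
        probs = self.probabilities(description)
        if not probs:
            return None
        account, p = max(probs.items(), key=lambda kv: kv[1])
        return account if p >= threshold else None
```

In this sketch, a table trained a couple of times on descriptions containing "pharmacy" under a medical expense account will pick that account for a new description that only shares the word "pharmacy", while a description made entirely of unseen tokens returns None and would be left for manual assignment.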
The matcher regularly rejects some regular transactions of mine that are direct deposits to a payee, because my bank includes a unique request number from the payee in each transaction description. If a number or word in the description is indicative of a payee that always goes to a given account, the matcher will generally get it. It also often identifies my fortnightly pension payments as updates to existing records, because all the tokens match, the amounts are normally exactly the same, and the dates fall within the date window designed to catch near misses for updating. When my pension amount occasionally changes, the importer recognises it as a new transaction to be imported rather than rejecting it as a match to an existing transaction.

The matching tables are updated when you actually import the transactions into GnuCash, not when you assign a transfer account, so both automatic matches you override and successful automatic matches you accept update the tables. After about 3 months of retraining following the update to v3, most of my bank transactions for regular payments are picked up by the matcher, mainly on the payee name if it is included in the description. New payments from a new payee aren't going to be matched, although when I use a different pharmacy the word "pharmacy" in the description is sufficient to get a correct match to my medical expense account.

I recently updated the OFX import instructions in the documentation to cover the new matcher feature that allows selecting multiple transactions and assigning a transfer account to the whole selection, which is planned for V4. I have also started on the CSV import documentation, but I have to explore the recent changes there more fully before completing an update. I had also planned to document the matcher process above more fully after exploring the code a bit more carefully.
These changes are also conflated with David T's reorganization of the Guide and Help manual layout, so will not appear in the current V3 documentation.

David Cousens
