Hi,

On Thu, July 2, 2020 3:10 pm, Christian Gruber wrote:
> Hi,
>
> while further studying the bayesian import matching algorithm I'm now at
> the point, where I wanted to understand, how the bayes formula is
> applied to the problem of matching transactions to accounts using
> tokens. But I need further information, since it doesn't come clear to
> me what is really calculated there.
>
> The implementation can be found in the following functions in Account.cpp:
>
> * get_first_pass_probabilities()
> * build_probabilities()
> * highest_probability()
>
> Actually, the latter could be omitted as it only selects the account
> with the highest matching probability.
>
> Studying the code and the rare comments on the implementation it seems
> to be a variant of the naive bayes classifier
> <https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Probabilistic_model>
> with the tokens used as (independent) "features" and the accounts used
> as "classes". But comparing this algorithm to the code leaves several
> questions open.
>
> Does anybody know a more precise algorithm description, on which the
> implementation in GnuCash is based on?

I'm not sure how much detail you need right now; I helped with some of
the initial implementations but I'm sure it's all been rewritten by now.

The idea is that the description/memo strings are tokenized, and the
tokens are used as inputs for computing the probability that the
transaction belongs in each target account. If the probability is high
enough, the importer auto-selects that account for the transaction.
When you assign an account (during import), it adds those tokens to the
account's list of tokens for future guessing.

Did you have a specific question about the process? For the complete
algorithm you can look at the code. It's really not all that
complicated (or at least it wasn't when first implemented).

> Regards,
> Christian

-derek

-- 
       Derek Atkins                 617-623-3745
       de...@ihtfp.com             www.ihtfp.com
       Computer and Internet Security Consultant

_______________________________________________
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
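
To make the tokenize / score / learn cycle described in the reply a bit
more concrete, here is a rough Python sketch of a textbook naive Bayes
matcher with tokens as features and accounts as classes. It is NOT the
code in Account.cpp and may differ from it in the prior, the smoothing,
and the threshold. The class name TokenMatcher, its methods, the 0.7
threshold, and the account names are all made up for illustration.

```python
import math
from collections import defaultdict


class TokenMatcher:
    """Hypothetical textbook naive Bayes matcher; not GnuCash's code."""

    def __init__(self):
        # counts[account][token] = how often the token was seen with the account
        self.counts = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def tokenize(text):
        # Assumption: lower-cased whitespace splitting is good enough here.
        return text.lower().split()

    def learn(self, description, account):
        """Record the tokens of a transaction the user assigned to an account."""
        for token in self.tokenize(description):
            self.counts[account][token] += 1

    def probabilities(self, description):
        """Return {account: P(account | tokens)} under the naive Bayes model."""
        if not self.counts:
            return {}
        tokens = self.tokenize(description)
        vocabulary = {t for per_acct in self.counts.values() for t in per_acct}
        grand_total = sum(sum(c.values()) for c in self.counts.values())

        log_scores = {}
        for account, per_acct in self.counts.items():
            acct_total = sum(per_acct.values())
            # Prior (an assumption): the account's share of all learned tokens.
            score = math.log(acct_total / grand_total)
            for token in tokens:
                # Add-one (Laplace) smoothing so unseen tokens do not zero out.
                seen = per_acct.get(token, 0)
                score += math.log((seen + 1) / (acct_total + len(vocabulary)))
            log_scores[account] = score

        # Normalise in log space so the probabilities sum to 1.
        peak = max(log_scores.values())
        weights = {a: math.exp(s - peak) for a, s in log_scores.items()}
        norm = sum(weights.values())
        return {a: w / norm for a, w in weights.items()}

    def guess(self, description, threshold=0.7):
        """Return the best account, or None if it is not confident enough."""
        probs = self.probabilities(description)
        if not probs:
            return None
        best = max(probs, key=probs.get)
        return best if probs[best] >= threshold else None


matcher = TokenMatcher()
matcher.learn("SHELL gas station", "Expenses:Auto:Fuel")
matcher.learn("SHELL fuel", "Expenses:Auto:Fuel")
matcher.learn("WHOLE FOODS market", "Expenses:Groceries")
print(matcher.guess("shell station"))
print(matcher.guess("unknown vendor"))
```

With this toy data the first description clears the threshold and the
second does not, so the second would be left for the user to assign by
hand; learn() is then what feeds that assignment back in for future
guessing.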