Re: the class SimilarityComparator in similarity.py. The final check is:

    # Here, we have found at least one common account with a close
    # amount. Now, we require that the set of accounts are equal or that
    # one be a subset of the other.
    return accounts1.issubset(accounts2) or accounts2.issubset(accounts1)

I've instead been using a slightly modified version that only requires the two account sets to intersect:

    # Accept any pair of entries that share at least one account.
    return bool(accounts1.intersection(accounts2))

For my use cases, this has worked better in every case. The common case is a credit card transaction that gets modified after import (e.g. its expense posting is recategorized). On a subsequent import with an overlapping date range, the original heuristic fails to dedupe it, because neither account set is a subset of the other anymore; an intersection check still matches on the shared credit card account.
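To make that concrete, here's a small standalone sketch of the two checks. The account names are hypothetical, and the close-amount test that SimilarityComparator also performs is left out:

    # Hypothetical example: account sets of a credit card transaction as
    # imported, and of the same transaction after its expense posting was
    # recategorized in the ledger.
    imported = {"Liabilities:CreditCard", "Expenses:Uncategorized"}
    edited = {"Liabilities:CreditCard", "Expenses:Groceries"}

    # Original heuristic: one account set must contain the other.
    print(imported.issubset(edited) or edited.issubset(imported))  # False -> not deduped

    # Intersection heuristic: a single shared account is enough.
    print(bool(imported.intersection(edited)))                     # True -> deduped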
I can't help but wonder whether this would be universally better for everyone. Thoughts? If not, perhaps an option could help users fine-tune the behavior for their use cases. Suggestions:

    --aggressive_match
    --heuristic=match_on_one_common_posting   (--heuristic would take a list)

Making dedupe detection better further cuts down ingest effort (see the five-minute ledger update article):
https://reds-rants.netlify.app/personal-finance/the-five-minute-ledger-update/

Martin, would you be opposed to one of the approaches above?

Thanks,
-red