It does a pretty good job for something you just tried :). Agreed, custom per-importer dedup solves all the problems here. Thanks!
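
For anyone reading this thread later, here is a rough sketch of the two ideas discussed. It compares the subset check quoted below with the intersection variant on plain Python sets, and shows one possible shape for ID-based per-importer dedup. It does not use the Beancount or beangulp APIs. The account names, the `txn_id` key, and the function names are all made up for illustration, and the amount comparison that precedes this check in similarity.py is left out.

```python
def subset_match(accounts1, accounts2):
    """Check quoted below: one account set must contain the other."""
    return accounts1.issubset(accounts2) or accounts2.issubset(accounts1)


def intersection_match(accounts1, accounts2):
    """Looser variant quoted below: any shared account counts as a match.

    bool() is added here only so the result prints as True/False; the
    variant in the thread returns the (truthy or empty) set itself.
    """
    return bool(accounts1.intersection(accounts2))


def dedup_by_id(new_entries, existing_entries, key="txn_id"):
    """Hypothetical per-importer dedup: drop new entries whose unique ID
    already appears in the ledger. Entries are plain dicts here, and
    'txn_id' is an assumed field name, not a Beancount convention."""
    seen = {e[key] for e in existing_entries if key in e}
    return [e for e in new_entries if e.get(key) not in seen]


# A card transaction as first imported, and the same one after the
# expense account was edited by hand post-import (hypothetical accounts).
imported = {"Liabilities:CreditCard", "Expenses:Uncategorized"}
edited = {"Liabilities:CreditCard", "Expenses:Groceries"}

print(subset_match(imported, edited))        # False: neither set contains the other
print(intersection_match(imported, edited))  # True: the card account is shared

ledger = [{"txn_id": "a1", "payee": "Grocer"}]
incoming = [{"txn_id": "a1", "payee": "Grocer"}, {"txn_id": "b2", "payee": "Cafe"}]
print(dedup_by_id(incoming, ledger))         # only the "b2" entry remains
```

The intersection check is more permissive, so it may also flag unrelated transactions that merely share an account and a close amount. That trade-off is one reason an ID-based check inside the importer looks attractive where the institution provides a unique transaction ID.
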

On Tuesday, March 30, 2021 at 5:01:22 AM UTC-7 bl...@furius.ca wrote:

> Dedup detection is definitely far from perfect and was just something I
> tried at the time.
>
> In the new version - beangulp, which Daniele is driving - dedup can be
> done by importer. I think that per-importer custom dedup is best. For
> example, any importer that has a unique ID per transaction should leverage
> this.
>
> On Tue, Mar 30, 2021, 06:57 redst...@gmail.com <redst...@gmail.com> wrote:
>
>> Reg. class SimilarityComparator in similarity.py:
>>
>> The final check is:
>>     # Here, we have found at least one common account with a close
>>     # amount. Now, we require that the set of accounts are equal or that
>>     # one be a subset of the other.
>>     return accounts1.issubset(accounts2) or accounts2.issubset(accounts1)
>>
>> I've been instead using a slightly modified version, where I just check
>> for intersection:
>>     return accounts1.intersection(accounts2)
>>
>> For my use cases, this has worked better in every case. The common case
>> is an import of a credit card transaction that is modified post-import. On
>> a subsequent import (with an overlapping date range), dedupe does not work
>> with the original heuristic.
>>
>> I can't help but wonder if this would be universally better for everyone.
>> Thoughts?
>>
>> If not, perhaps an option might help users fine tune for their use cases?
>> Suggestions:
>>     --aggressive_match
>>     --heuristic=match_on_one_common_posting (--heuristic would take in a
>>     list)
>>
>> Making dedupe detection better further cuts down ingest effort
>> <https://reds-rants.netlify.app/personal-finance/the-five-minute-ledger-update/>
>> (links to 5min ledger update article).
>>
>> Martin, would you be opposed to one of the approaches above?
>>
>> Thanks,
>> -red
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Beancount" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to beancount+...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/beancount/ee41980d-dcea-4e82-879d-9bd41b9d7363n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Beancount" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beancount/474fdd2b-bad4-40cf-8f7a-9081888e5c3fn%40googlegroups.com.