Dedup detection is definitely far from perfect and was just something I
tried at the time.
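
To make the failure case from the quoted message below concrete, here is a
minimal sketch of the two final checks on plain Python sets. The function
names and account names are made up for illustration; only the two return
expressions come from the thread, and the rest of SimilarityComparator
(date and amount matching) is left out.

```python
def subset_check(accounts1, accounts2):
    """Final check quoted in the thread: one account set must contain the other."""
    return accounts1.issubset(accounts2) or accounts2.issubset(accounts1)


def intersection_check(accounts1, accounts2):
    """Relaxed variant proposed in the thread: any shared account is enough."""
    return bool(accounts1.intersection(accounts2))


# Hypothetical example: a credit card transaction as the importer emits it...
imported = {"Liabilities:CreditCard", "Expenses:Uncategorized"}
# ...and the same transaction in the ledger after being recategorized by hand.
edited = {"Liabilities:CreditCard", "Expenses:Groceries"}

subset_result = subset_check(imported, edited)
intersection_result = intersection_check(imported, edited)
print(subset_result, intersection_result)
```

Neither set contains the other once the expense account has been changed, so
the subset check rejects the pair while the intersection check accepts it.
Since the quoted comment says a common account has already been found at that
point, the intersection variant presumably always passes there, which amounts
to dropping the final condition.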

In the new version - beangulp, which Daniele is driving - dedup can be done
per importer. I think that per-importer custom dedup is best. For example,
any importer that has a unique ID per transaction should use that ID to
detect duplicates.
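
As a rough sketch of what ID-based dedup could look like, assuming each
transaction is a plain dict carrying its bank-issued ID under a hypothetical
"txn_id" key (this is not the beangulp API, and both the function name and the
key are made up for illustration):

```python
from typing import Dict, Iterable, List


def filter_new(imported: Iterable[Dict], existing: Iterable[Dict],
               key: str = "txn_id") -> List[Dict]:
    """Return the imported transactions whose ID is not already known.

    Transactions without an ID are kept, since nothing can be said about them.
    """
    # IDs already present in the ledger.
    seen = {txn[key] for txn in existing if key in txn}
    fresh = []
    for txn in imported:
        uid = txn.get(key)
        if uid is not None:
            if uid in seen:
                continue  # Already in the ledger or earlier in this import.
            seen.add(uid)
        fresh.append(txn)
    return fresh


# Hypothetical data: A1 is already in the ledger, A2 appears twice in the file.
ledger = [{"txn_id": "A1", "amount": "-12.50"}]
download = [
    {"txn_id": "A1", "amount": "-12.50"},
    {"txn_id": "A2", "amount": "-40.00"},
    {"txn_id": "A2", "amount": "-40.00"},
    {"amount": "-3.00"},
]
new_txns = filter_new(download, ledger)
print([txn.get("txn_id") for txn in new_txns])
```

Matching on the ID does not depend on accounts or amounts, so a transaction
that was edited after import is still recognized on a later, overlapping
import.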





On Tue, Mar 30, 2021, 06:57 redst...@gmail.com <redstre...@gmail.com> wrote:

> Reg. class SimilarityComparator in similarity.py:
>
> The final check is:
>         # Here, we have found at least one common account with a close
>         # amount. Now, we require that the set of accounts are equal or
>         # that one be a subset of the other.
>         return accounts1.issubset(accounts2) or accounts2.issubset(accounts1)
>
> I've been instead using a slightly modified version, where I just check
> for intersection:
>         return accounts1.intersection(accounts2)
>
> For my use cases, this has worked better in every case. The common case is
> an import of a credit card transaction that is modified post-import. On a
> subsequent import (with an overlapping date range), dedupe does not work
> with the original heuristic.
>
> I can't help but wonder if this would be universally better for everyone.
> Thoughts?
>
> If not, perhaps an option might help users fine tune for their use cases?
> Suggestions:
> --aggressive_match
> --heuristic=match_on_one_common_posting  (--heuristic would take in a list)
>
> Making dedupe detection better further cuts down ingest effort
> <https://reds-rants.netlify.app/personal-finance/the-five-minute-ledger-update/>
> (links to 5min ledger update article).
>
> Martin, would you be opposed to one of the approaches above?
>
> Thanks,
> -red
>
> --
> You received this message because you are subscribed to the Google Groups
> "Beancount" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to beancount+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/beancount/ee41980d-dcea-4e82-879d-9bd41b9d7363n%40googlegroups.com
> <https://groups.google.com/d/msgid/beancount/ee41980d-dcea-4e82-879d-9bd41b9d7363n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

