It does a pretty good job for something you just tried :). Agreed, custom
per-importer dedup solves all the problems here. Thanks!
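
As a rough illustration of the per-importer approach, here is a minimal Python sketch of ID-based dedup. It assumes the importer stores the institution's transaction ID in entry metadata under a hypothetical `txn_id` key. The `Txn` class and `dedup_by_id` function are made up for illustration; they are not beangulp's API.

```python
import datetime
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Txn:
    """Hypothetical stand-in for an imported transaction."""
    date: datetime.date
    narration: str
    amount: Decimal
    meta: dict = field(default_factory=dict)


ID_KEY = "txn_id"  # hypothetical metadata key holding the bank's transaction ID


def dedup_by_id(new_entries, existing_entries, key=ID_KEY):
    """Split new_entries into (fresh, duplicates) using a unique ID.

    Entries without an ID are always treated as fresh; an importer could
    fall back to a fuzzy comparison for those.
    """
    seen = {e.meta[key] for e in existing_entries if key in e.meta}
    fresh, duplicates = [], []
    for entry in new_entries:
        txn_id = entry.meta.get(key)
        if txn_id is not None and txn_id in seen:
            duplicates.append(entry)
        else:
            fresh.append(entry)
            if txn_id is not None:
                seen.add(txn_id)  # also catches repeats within one import file
    return fresh, duplicates


# The ledger copy was edited after import (narration cleaned up),
# but it still carries the original ID.
existing = [Txn(datetime.date(2021, 3, 1), "Coffee", Decimal("-4.50"), {"txn_id": "A1"})]
incoming = [
    Txn(datetime.date(2021, 3, 1), "COFFEE SHOP #12", Decimal("-4.50"), {"txn_id": "A1"}),
    Txn(datetime.date(2021, 3, 2), "GROCERY MART", Decimal("-32.10"), {"txn_id": "A2"}),
]
fresh, dups = dedup_by_id(incoming, existing)
print([t.meta["txn_id"] for t in fresh])  # ['A2']
print([t.meta["txn_id"] for t in dups])   # ['A1']
```

Because the match is on the ID alone, the first incoming entry is flagged as a duplicate even though its narration differs from the edited ledger copy, which is exactly the post-import-edit case raised in the quoted thread.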

On Tuesday, March 30, 2021 at 5:01:22 AM UTC-7 bl...@furius.ca wrote:

> Dedup detection is definitely far from perfect and was just something I 
> tried at the time.
>
> In the new version (beangulp, which Daniele is driving), dedup can be
> done per importer. I think per-importer custom dedup is best. For
> example, any importer that has a unique ID per transaction should leverage
> this.
>
> On Tue, Mar 30, 2021, 06:57 redst...@gmail.com <redst...@gmail.com> wrote:
>
>> Regarding class SimilarityComparator in similarity.py:
>>
>> The final check is:
>>         # Here, we have found at least one common account with a close
>>         # amount. Now, we require that the set of accounts are equal or that
>>         # one be a subset of the other.
>>         return accounts1.issubset(accounts2) or accounts2.issubset(accounts1)
>>
>> Instead, I've been using a slightly modified version that just checks
>> for an intersection:
>>         return accounts1.intersection(accounts2)
>>
>> For my use cases, this has worked better every time. The common case is
>> importing a credit card transaction that I then modify after import. On
>> a subsequent import with an overlapping date range, the original
>> heuristic fails to detect it as a duplicate.
>>
>> I can't help but wonder whether this would be better for everyone.
>> Thoughts?
>>
>> If not, perhaps an option could help users fine-tune it for their use
>> cases. Suggestions:
>> --aggressive_match
>> --heuristic=match_on_one_common_posting (--heuristic would take a list)
>>
>> Better dedup detection further cuts down ingest effort (see the
>> five-minute ledger update article:
>> <https://reds-rants.netlify.app/personal-finance/the-five-minute-ledger-update/>).
>>
>> Martin, would you be opposed to one of the approaches above?
>>
>> Thanks,
>> -red
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Beancount" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to beancount+...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/beancount/ee41980d-dcea-4e82-879d-9bd41b9d7363n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/beancount/ee41980d-dcea-4e82-879d-9bd41b9d7363n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
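
The subset and intersection checks discussed in the quoted thread can be compared on a small example. The Python sketch below assumes a freshly imported credit card entry whose placeholder expense account was later recategorized by hand (the account names are made up). The `HEURISTICS` registry and `accounts_match` helper are hypothetical; they only illustrate how the suggested `--heuristic` list option might select checks and are not part of beancount.

```python
def subset_match(accounts1: set, accounts2: set) -> bool:
    # Original heuristic as quoted: one account set must contain the other.
    return accounts1.issubset(accounts2) or accounts2.issubset(accounts1)


def intersection_match(accounts1: set, accounts2: set) -> bool:
    # Proposed relaxation: any shared account is enough.
    return not accounts1.isdisjoint(accounts2)


# Hypothetical registry backing a --heuristic option that takes a list.
HEURISTICS = {
    "subset": subset_match,
    "match_on_one_common_posting": intersection_match,
}


def accounts_match(accounts1, accounts2, heuristics=("subset",)):
    """True if any of the selected heuristics accepts the pair."""
    return any(HEURISTICS[name](accounts1, accounts2) for name in heuristics)


# Freshly imported entry vs. the same entry after manual recategorization.
imported = {"Liabilities:CreditCard", "Expenses:Uncategorized"}
edited = {"Liabilities:CreditCard", "Expenses:Food:Groceries"}

print(subset_match(imported, edited))        # False: neither set contains the other
print(intersection_match(imported, edited))  # True: both share the card account
```

Since the quoted code comment says at least one common account with a close amount has already been found when this check runs, the intersection test may effectively always pass at that point, leaving the earlier checks to decide. That makes matching more forgiving of post-import edits, at the cost of possibly pairing two different transactions on the same account with similar amounts.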
