I handle this problem by storing both the date and the exact description
text as metadata on each posting that is imported from, or matched against,
an entry in the CSV file.  This makes it possible to reliably determine which
CSV entries have yet to be matched/imported, and to detect problems when a
given (date, description_text) pair occurs more times in the Beancount
journal than in the CSV file.  Because occurrences are counted, this also
handles the case of two CSV entries with the same date and the same exact
description text that correspond to two different transactions.  It also
makes it possible to train a classifier that automatically labels CSV entries
with the appropriate "other" account, since the association with the original
CSV description text is preserved.
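
A minimal Python sketch of the counting check described above, assuming each
CSV row and each journal posting has already been reduced to a
(date, description_text) key; pulling those keys out of Beancount posting
metadata is left out, and `reconcile` is a hypothetical helper name, not part
of Beancount.  `collections.Counter` treats the keys as a multiset, so two
same-day entries with identical text are counted rather than collapsed.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

# A (date, description_text) pair, as stored in posting metadata.
# Extracting these pairs from the CSV and from the ledger is assumed
# to happen elsewhere; this sketch only compares them.
Key = Tuple[date, str]


def reconcile(csv_keys: List[Key], journal_keys: List[Key]) -> Tuple[Counter, Counter]:
    """Compare how often each (date, description_text) pair occurs.

    Returns (unmatched, overmatched):
      unmatched   -- pairs seen more often in the CSV than in the journal,
                     i.e. CSV entries still to be imported or matched;
      overmatched -- pairs seen more often in the journal than in the CSV,
                     which indicates a problem such as a duplicate import.
    """
    csv_counts = Counter(csv_keys)
    journal_counts = Counter(journal_keys)
    # Counter subtraction drops zero and negative counts, so each result
    # holds only the surplus on one side.
    unmatched = csv_counts - journal_counts
    overmatched = journal_counts - csv_counts
    return unmatched, overmatched


if __name__ == "__main__":
    csv_rows = [
        (date(2016, 8, 1), "COFFEE SHOP"),
        (date(2016, 8, 1), "COFFEE SHOP"),  # same date and text, separate transaction
        (date(2016, 8, 3), "PAYROLL"),
    ]
    journal = [(date(2016, 8, 1), "COFFEE SHOP")]  # only one of them imported so far
    unmatched, overmatched = reconcile(csv_rows, journal)
    print(sorted(unmatched.elements()))  # CSV entries still to import
    print(dict(overmatched))             # empty: nothing over-imported
```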

On Sun, Aug 7, 2016 at 12:49 PM, Simon Michael <[email protected]> wrote:

> On 8/7/16 10:17 AM, Erik Hetzner wrote:
>
>> That does seem a lot simpler than the method ledger-autosync uses. On the
>> other hand it ties the system to ensuring that file ordering does not change.
>>
>
> Yes, it's for that case. For more randomly changing data, checksums would
> be more important.
>
>> Would it work with multiple input sources?
>>
>
> It should, each input file/source has its own position marker.
>
>> Another tricky issue might be overlap. A nice feature of ledger-autosync is
>> that it can handle overlapping imports, e.g. importing from a file where
>> some portion of the transactions are new and some portion are old (known).
>>
>
> That's the point of this scheme - to reliably import just the new CSV
> records from an intermittently updated bank download.
>
> I like the idea of the position marker (contents of FILE.lastimport) being
> flexible, so the deduplication strategy can be changed without changing the
> UI.
>
> Writing extra files is a bit of clutter, but hledger users are used to
> seeing FILE.csv.rules, and I think it's better to keep them visible (not
> dot files) to remind that they exist, can be edited/deleted to adjust the
> last import position, and might contain sensitive data.
>
>> PS: I’ve also updated the ledger-autosync README to include some basic
>> information about tracking 401k and investment accounts. ledger-autosync
>> now includes code to properly (mostly) import these types of transactions.
>>
>
> Nice, docs++.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Beancount" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/beancount/no83bo%24s77%241%40blaine.gmane.org.
>
> For more options, visit https://groups.google.com/d/optout.
>
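
A rough Python sketch of one possible position-marker format, for comparison:
here the marker file holds the full text of the last CSV record imported, and
a new download is accepted as long as it still contains that record.  The
file name `bank.csv.lastimport`, the single header line, and the helper names
`rows_after_marker` and `import_new` are assumptions for illustration, not
hledger's actual implementation.  It relies on the bank keeping records in a
stable order, which is the limitation Erik points out, and identical rows
(like the duplicate same-day entries mentioned at the top of this message)
can make the marker ambiguous.

```python
from pathlib import Path
from typing import List, Optional


def rows_after_marker(records: List[str], marker: Optional[str]) -> List[str]:
    """Return the records that come after the last-imported record."""
    if marker is None:
        # No marker yet: everything is new.
        return list(records)
    # Search from the end so the most recent occurrence of the marker wins.
    for i in range(len(records) - 1, -1, -1):
        if records[i] == marker:
            return records[i + 1:]
    # The marker is not in this download, so it does not overlap the previous
    # import and records may be missing; let the user decide, e.g. by editing
    # or deleting the marker file.
    raise ValueError("last imported record not found; edit or delete the marker file")


def import_new(csv_path: Path) -> List[str]:
    """Return new CSV records and advance the (hypothetical) marker file."""
    # Assumed naming: bank.csv -> bank.csv.lastimport
    marker_path = csv_path.with_name(csv_path.name + ".lastimport")
    lines = [line for line in csv_path.read_text().splitlines() if line.strip()]
    records = lines[1:]  # assumes exactly one header line
    marker = marker_path.read_text().rstrip("\n") if marker_path.exists() else None
    new = rows_after_marker(records, marker)
    if records:
        marker_path.write_text(records[-1] + "\n")
    return new
```

Because a new download only has to contain the marker record somewhere,
overlapping downloads work; a download that no longer contains it raises an
error instead of silently skipping a gap.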

