Re: Newbie Setting Up CSV Import / Ingest

Martin Blais Sun, 16 Sep 2018 06:39:30 -0700

There are also some examples in the source code, here:
https://bitbucket.org/blais/beancount/src/default/examples/ingest/office/





On Sun, Sep 16, 2018 at 12:15 AM <[email protected]> wrote:

> Hey,
>
> I'm in a very similar boat. Were you able to post your importer files
> publicly? I think seeing the conversation of you working through this,
> along with your finished files, would make them a lot easier to
> understand than the current examples I've seen.
>
> Cheers,
>
>
> On Friday, 20 July 2018 02:22:48 UTC+10, [email protected] wrote:
>>
>> I figured it out. The dumb_categorizer calls .lower() on the narration,
>> and I was passing it a search term with a capital letter in it, so it
>> never matched. Now I'm off to the races. :)
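
The case mismatch described here can be avoided by normalizing both the narration and the search term before comparing. A minimal sketch, assuming a hypothetical RULES table and guess_account helper (neither is part of beancount, and the account names are illustrative only):

```python
# Hypothetical rule table: search term -> account. Not part of beancount.
RULES = {
    "Nutella": "Expenses:Food",
    "SOME BAR & GRILL": "Expenses:Restaurants",
}


def guess_account(narration, rules=RULES):
    """Return the account of the first rule whose term occurs in the
    narration, ignoring case on both sides; None if nothing matches."""
    text = narration.casefold()
    for term, account in rules.items():
        if term.casefold() in text:
            return account
    return None
```

Because both sides are folded, it no longer matters whether a rule is written as "Nutella", "nutella" or "NUTELLA".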
>>
>> I might publish my working setup once I get it all cleaned
>> up, as yet another example for others to follow.
>>
>> TRS-80
>>
>> --
>> Securely sent with Tutanota. Claim your encrypted mailbox today!
>> https://tutanota.com
>>
>> 19. Jul 2018 10:44 by [email protected]:
>>
>> OK, I am successfully calling dumb_categorizer from the CSV Importer by
>> defining it at the beginning of the .config file and then passing
>> categorizer=dumb_categorizer to the CSV Importer. I know this because I
>> replaced its body with a simple print("something") and got a bunch of
>> "something" on stdout. So the categorizer is getting called; it is just
>> either not matching or not attaching the other leg... ?
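
One way to tell "not matching" apart from "not attaching the other leg" is to call the categorizer directly on a hand-built transaction, outside bean-extract. A minimal sketch, assuming simplified stand-ins for beancount's Amount, Posting and Transaction types (only the fields used here are modelled; the check helper and the USD currency are illustrative assumptions):

```python
from collections import namedtuple
from decimal import Decimal


# Simplified stand-ins for beancount's data types (assumption: the real
# types carry more fields; only those used below are modelled).
class Amount(namedtuple("Amount", "number currency")):
    def __neg__(self):
        return Amount(-self.number, self.currency)


Posting = namedtuple("Posting", "account units")
Transaction = namedtuple("Transaction", "narration postings")


def dumb_categorizer(txn):
    """Shortened variant of the categorizer quoted in this thread."""
    if not txn.postings:
        return txn
    posting1 = txn.postings[0]
    if "nutella" in txn.narration.lower():
        account = "Expenses:Food"
    else:
        return txn
    # Add the balancing leg with the opposite sign.
    txn.postings.append(posting1._replace(account=account, units=-posting1.units))
    return txn


def check(narration):
    """Hypothetical helper: run the categorizer and list resulting accounts."""
    first_leg = Posting("Assets:Suncoast:Checking-G", Amount(Decimal("-59.83"), "USD"))
    txn = Transaction(narration, [first_leg])
    return [posting.account for posting in dumb_categorizer(txn).postings]
```

If check() returns a single account for a narration that should have matched, the matching test is at fault rather than the posting logic.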
>>
>> Any help would be greatly appreciated.
>>
>> TRS-80
>>
>> 19. Jul 2018 08:52 by [email protected]:
>>
>> I suppose I should have included a link to the CSV importer source:
>> https://bitbucket.org/blais/beancount/src/80d30d6896cf5fdcff8c1156cab77107ee8e0f96/beancount/ingest/importers/csv.py?at=default&fileviewer=file-view-default
>>
>> Down toward the bottom (line 283) is where the categorizer gets called.
>>
>> Last night at my local LUG, I volunteered to do a talk next month on
>> plain text accounting, and got the green light. So it would be nice to get
>> this working by then. :)
>>
>> TRS-80
>>
>> 19. Jul 2018 08:32 by [email protected]:
>>
>> It is still unclear to me where to put this categorizer code. I have
>> tried putting it here, there, and everywhere. I am using the provided
>> generic CSV importer, which calls it, but I cannot figure out where to put
>> it or how to instantiate it or whatever it is you need to do in Python.
>>
>> Since I don't really know Python, I am happy to pay someone a few bucks to
>> help me get this working.
>>
>> (from
>> https://bitbucket.org/blais/beancount/pull-requests/24/improve-ingestimporterscsv/diff
>> ):
>>
>> def dumb_categorizer(txn):
>>     # At this time the txn has only one posting
>>     try:
>>         posting1 = txn.postings[0]
>>     except IndexError:
>>         return txn
>>
>>     # Guess the account(s) of the other posting(s)
>>     if 'nutella' in txn.narration.lower():
>>         account = 'Expenses:Food'
>>     else:
>>         return txn
>>
>>     # Make the other posting(s)
>>     posting2 = posting1._replace(
>>         account=account,
>>         units=-posting1.units
>>     )
>>
>>     # Insert / Append the posting into the transaction
>>     if posting1.units < posting2.units:
>>         txn.postings.append(posting2)
>>     else:
>>         txn.postings.insert(0, posting2)
>>
>>     return txn
>>
>>
>>
>>
>> 25. Jun 2018 16:33 by [email protected]:
>>
>> OK, stayed up late last night and actually got all my character stripping
>> accomplished in Python within the provided tools. Yay me (first Python code
>> I ever wrote)! :)
>>
>> OK, so the basic CSV importers are working. Now I am trying to figure out
>> where to put the categorizer code I found here:
>> https://bitbucket.org/blais/beancount/pull-requests/24/improve-ingestimporterscsv/diff
>>
>> I have been trying here and there without success as of yet. Any
>> hints/pointers would be greatly appreciated!
>>
>> TRS-80
>>
>> 24. Jun 2018 15:21 by [email protected]:
>>
>> On Sun, Jun 24, 2018 at 11:58 AM <[email protected]> wrote:
>>
>>> [...]But by all means, please correct me if I am wrong, or have missed
>>> something.
>>>
>>> So now that I have attained some success, and see the light at the end
>>> of the tunnel, it looks like I will have to do ~ the following:
>>> 1. Manually download CSV file from bank.
>>>
>> Yes
>>
>>
>>> 2. Do some pre-processing, either manually or with macros in Emacs, or
>>> (more likely) programmatically, using scripts and sed, etc. to remove parens
>>> and $s.
>>>
>> You can write code in your importer to do that.
>>
>>
>>> 3. Run the actual bean-import.
>>>
>> You mean bean-extract.
>>
>>> 4. Run some post processing (I would like to change date: metadata name to
>>> transaction_date: because I think it's more descriptive).
>>>
>> Do that in your importer code as well.
>>
>>
>>> 5. And then finally hand copy these transactions into my main .beancount
>>> file, double checking and tweaking (aka "clearing") them in the process,
>>> categorizing remaining ones into Expense accounts and perhaps updating my
>>> scripts in the process.
>>>
>> Yes.
>>
>>> I suppose 2, 4, and 5 could be done all in Emacs, but I'll just have to
>>> figure out some workflow now that works for me.
>>>
>> Yes.
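
The pre-processing in step 2 (removing parens and $s) can be done in Python instead of sed. A minimal sketch, assuming a hypothetical parse_amount helper and assuming that parentheses mark a negative amount, as in the sample row ($59.83); how the result is wired into an importer is not shown in this thread:

```python
from decimal import Decimal


def parse_amount(text):
    """Convert strings such as '$229.15' or '($59.83)' to Decimal.

    Empty cells (for example an unused Deposit column) give None.
    Assumption: parentheses denote a negative amount.
    """
    text = text.strip()
    if not text:
        return None
    negative = text.startswith("(") and text.endswith(")")
    digits = text.strip("()").replace("$", "").replace(",", "")
    value = Decimal(digits)
    return -value if negative else value
```

Decimal is used rather than float so that cent values are kept exactly.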
>>
>>
>>>
>>> Also not mentioned is somehow programmatically inserting the other leg of
>>> the transaction (which Expense account). I agree with Martin's basic
>>> philosophy on this, and still plan on manually reviewing everything,
>>> however I am already seeing that the bulk of transactions are the same
>>> places in my case and could easily be categorized with some simple matching
>>> (either in a post matching script or within bean-extract using
>>> categorizer). I need to look into this more, and also experiment or read up
>>> on how the de-duplication works, as I think it's probably related.
>>>
>>
>> You can write some function for your importer to do that with your
>> particular rules if it saves you time.
>>
>>
>>> Anyway, I will continue to report on what I find as I go along, and even
>>> though I'm not getting any replies
>>>
>> Short emails with direct questions -> more replies more quickly
>>
>>
>>
>>> hopefully this will either encourage others to try and set this up or
>>> perhaps help other noobs who come along later looking for more in depth
>>> info (or perhaps stumble across similar error messages searching the
>>> internet) and it eventually helps someone.
>>>
>>> Helpful tips, encouraging words, or even just letting me know if anyone
>>> is actually reading my idiotic ramblings are always welcomed. :D
>>>
>>
>> Sounds like you're making great progress!
>> Unfortunately, automating the importing still requires writing Python code,
>> and I see no way around that. I wish it were easier.
>>
>>
>>
>>>
>>> TRS-80
>>>
>>> 22. Jun 2018 19:21 by [email protected]:
>>>
>>> Yeah I was completely on the wrong track before (I think). But I am on
>>> the right one now (I think)?
>>>
>>> So what I have done is just copy the csv.py file and save it as
>>> __init__.py in my importers/suncoast_g directory. Then I put the following
>>> into ledger.config:
>>> https://paste.pound-python.org/show/popHoa0wvVE2OiPCqIAL
>>>
>>> But now when doing bean-extract I get "ValueError: CSV config without
>>> header has non-index fields: {'[DATE]': 'Posted Date', '[TXN_DATE]':
>>> 'Transaction Date', '[NARRATION1]': 'Description', '[CREDIT]': 'Deposit',
>>> '[DEBIT]': 'Withdrawal', '[BALANCE]': 'Balance'}"
>>>
>>> Yes, my CSV files have headers. I have been searching the internet for that
>>> error, but I am still scratching my head. I also tried changing '[DATE]' to
>>> 'DATE' etc., but that didn't seem to make a difference either.
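
The wording of the error ("config without header has non-index fields") suggests the importer concluded that the file has no header row and therefore expects integer column positions instead of column names. One possible workaround, not confirmed anywhere in this thread, is to give positions. A minimal sketch, assuming a hypothetical header_to_indexes helper and using plain strings as keys in place of the csv.Col members so that it runs without beancount installed:

```python
import csv
import io


def header_to_indexes(header_line, name_config):
    """Translate {field: column name} into {field: column index},
    using the CSV header line to look up each column's position."""
    header = next(csv.reader(io.StringIO(header_line)))
    positions = {name.strip(): index for index, name in enumerate(header)}
    return {field: positions[name] for field, name in name_config.items()}


HEADER = "Posted Date,Transaction Date,Description,Deposit,Withdrawal,Balance"
NAMES = {
    "DATE": "Posted Date",
    "TXN_DATE": "Transaction Date",
    "NARRATION": "Description",
    "AMOUNT_CREDIT": "Deposit",
    "AMOUNT_DEBIT": "Withdrawal",
    "BALANCE": "Balance",
}
INDEXES = header_to_indexes(HEADER, NAMES)
```

The resulting positions (0 for the posted date through 5 for the balance) are what an index-based config would contain.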
>>>
>>> Of course, I could be completely off track (this is my fourth different
>>> approach). I have been flailing around at this all day and a good part of
>>> yesterday too, from early in the morning until late at night. At this point I
>>> would be willing to send someone a few dollars to help me get this set up.
>>> I am sure I could get other accounts working and maintain it once I can
>>> just get the first one working.
>>>
>>> When I first saw my credit union's CSV file I thought "this should be
>>> easy" because it's very straightforward. I don't need all this complicated
>>> parsing like I have seen in some of the other Importers I have been
>>> studying. Just a straight CSV import. Or so I thought... :/
>>>
>>> Anyway, any help at all would be greatly appreciated at this point. Any
>>> clue might help!
>>>
>>> TRS-80
>>>
>>> 22. Jun 2018 14:19 by [email protected]:
>>>
>>> OK I sought and received some help in @python. I think I am on a much
>>> better track now. I don't know where I got my original __init__.py from,
>>> some similar thread here I think.
>>>
>>> But now I have downloaded from source the utrade one from:
>>> https://bitbucket.org/blais/beancount/src/65212d1176bb427a7883d2593edbd0e0545a145a/examples/ingest/office/importers/utrade/__init__.py?at=default&fileviewer=file-view-default
>>> and am modifying that to my needs. I now see that I missed a whole bunch of
>>> the methods listed in "Writing an Importer" section of "Importing External
>>> Data" Docs. It will take me a while to work through it but I will post
>>> something back later, including results. I just didn't want anyone to spend
>>> time posting a long reply in the meantime.
>>>
>>> Fun fun! :)
>>>
>>> TRS-80
>>>
>>>
>>> 22. Jun 2018 12:08 by [email protected]:
>>>
>>> OK, so this is quite challenging for someone who doesn't really know
>>> Python. However I think it's a good exercise not only for myself but also
>>> to help other newbies who would like to try and get this awesome feature
>>> working.
>>>
>>> I have read everything I can in source and mailing list about CSV Import
>>> / Ingest and I've made some progress, but now I'm stuck.
>>>
>>> Apologies in advance for the ugly formatting. Google Groups apparently does
>>> not support inline text formatting, and I am communicating with the group
>>> via email.
>>>
>>> I've tried to (mostly) follow the naming conventions in the examples but
>>> it seems they have changed over time. Anyway, file structure looks like so:
>>> ~/fin
>>>     |---documents
>>>     |---Downloads
>>>     |---importers
>>>     |    |---suncoast_g
>>>     |         |---__init__.py   (this file shared below)
>>>     |    |---__init__.py        (this file is empty)
>>>     |---ledger.beancount
>>>     |---ledger.config         (I have seen this also referenced as .import in docs)
>>>
>>> Here is my ledger.config file:
>>> --------------------(begin ledger.config file)--------------------
>>> #!/usr/bin/env python3
>>> """Example import configuration."""
>>>
>>> # Insert our custom importers path here.
>>> # (In practice you might just change your PYTHONPATH environment.)
>>> import sys
>>> from os import path
>>> sys.path.insert(0, path.join(path.dirname(__file__)))
>>>
>>> from importers import suncoast_g
>>> #from importers import acme_pdf
>>>
>>> from beancount.ingest import extract
>>> #from beancount.ingest.importers import ofx
>>>
>>>
>>> # Setting this variable provides a list of importer instances.
>>> #
>>> # Removed the following from below to replace with my own, saved for
>>> # reference
>>> #
>>> #    utrade.Importer("USD",
>>> #                    "Assets:US:UTrade",
>>> #                    "Assets:US:UTrade:Cash",
>>> #                    "Income:US:UTrade:{}:Dividend",
>>> #                    "Income:US:UTrade:{}:Gains",
>>> #                    "Expenses:Financial:Fees",
>>> #                    "Assets:US:BofA:Checking"),
>>> #
>>> #    ofx.Importer("379700001111222",
>>> #                 "Liabilities:US:CreditCard",
>>> #                 "bofa"),
>>> #
>>> #    acme_pdf.Importer("Assets:US:AcmeBank"),
>>> #
>>> CONFIG = [
>>>     suncoast_g.Importer("Assets:Suncoast:Checking-G"),
>>> ]
>>>
>>>
>>> # Override the header on extracted text (if desired).
>>> extract.HEADER = ';; -*- mode: org; mode: beancount; coding: utf-8; -*-\n'
>>> --------------------(end ledger.config file)--------------------
>>>
>>> OK now the __init__.py that is in suncoast_g contains following:
>>> --------------------(begin __init__.py file)--------------------
>>> #!/usr/bin/env python3
>>>
>>> #
>>> # Configuration file for extracting Suncoast-G data
>>> #
>>>
>>> from beancount.ingest import regression
>>> from beancount.ingest.importers import csv
>>>
>>> from beancount.plugins import auto_accounts
>>>
>>>
>>> class Importer(csv.Importer):
>>>
>>>     config = {csv.Col.DATE: 'Posted Date',
>>>               csv.Col.TXN_DATE: 'Transaction Date',
>>>               csv.Col.NARRATION: 'Description',
>>>               csv.Col.AMOUNT_CREDIT: 'Deposit',
>>>               csv.Col.AMOUNT_DEBIT: 'Withdrawal',
>>>               csv.Col.BALANCE: 'Balance'}
>>>
>>>     def __init__(self, account):
>>>         csv.Importer.__init__(
>>>             self, self.config,
>>>             account, 'Currency',
>>>             ('Posted Date,Transaction Date,Description,'
>>>              'Deposit,Withdrawal,Balance'),
>>>             1)
>>>
>>>     def get_description(self, row):
>>>         payee, narration = super().get_description(row)
>>>         narration = '{} ({})'.format(narration, row.category)
>>>         return payee, narration
>>> --------------------(end __init__.py file)--------------------
>>>
>>> I have just copied this stuff and tried to figure it out. I'm sure I've
>>> got something wrong in here but I don't really know what I'm doing. FYI
>>> here is what the data looks like which is in G.csv in Downloads:
>>>
>>> Posted Date,Transaction Date,Description,Deposit,Withdrawal,Balance
>>> 6/4/2018,6/4/2018,Withdrawal Debit Card SOME BAR & GRILL CITY ST Card XXXX,,($59.83),$229.15
>>>
>>> OK I think that's all the relevant info. So now when I do:
>>>
>>> ~/fin$ bean-identify ledger.config Downloads
>>>
>>> I get:
>>>
>>> **** /home/myname/fin/Downloads/A Sunnet History 6186156 23032018_21062018.csv
>>> **** /home/myname/fin/Downloads/G.csv
>>>
>>> Which I think means it is identifying those 2 files (the only ones in
>>> there) as CSV, correct? I will point out that G.csv is an Asset account and
>>> is my first target here. The other one is a Liability account (credit card)
>>> and therefore has different fields (only one amount, and no balance). But I
>>> figure once I get this one working, that other one (and subsequent others)
>>> should be pretty easy.
>>>
>>> OK so now when I do:
>>>
>>> ~/fin$ bean-extract ledger.config Downloads
>>>
>>> I get:
>>>
>>> **** /home/myname/fin/Downloads/A Sunnet History 6186156 23032018_21062018.csv
>>>
>>> **** /home/myname/fin/Downloads/G.csv
>>>
>>> ERROR:root:Importer importers.suncoast_g.Importer: "Assets:Suncoast:Checking-G".extract() raised an unexpected error: CSV config without header has non-index fields: {<Col.DATE: '[DATE]'>: 'Posted Date', <Col.TXN_DATE: '[TXN_DATE]'>: 'Transaction Date', <Col.NARRATION: '[NARRATION1]'>: 'Description', <Col.AMOUNT_CREDIT: '[CREDIT]'>: 'Deposit', <Col.AMOUNT_DEBIT: '[DEBIT]'>: 'Withdrawal', <Col.BALANCE: '[BALANCE]'>: 'Balance'}
>>>
>>> ERROR:root:Traceback: Traceback (most recent call last):
>>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/extract.py", line 187, in extract
>>>     allow_none_for_tags_and_links=allow_none_for_tags_and_links)
>>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/extract.py", line 69, in extract_from_file
>>>     new_entries = importer.extract(file, **kwargs)
>>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/importers/csv.py", line 189, in extract
>>>     iconfig, has_header = normalize_config(self.config, file.head())
>>>   File "/usr/local/lib/python3.6/dist-packages/beancount/ingest/importers/csv.py", line 340, in normalize_config
>>>     "{}".format(config))
>>> ValueError: CSV config without header has non-index fields: {<Col.DATE: '[DATE]'>: 'Posted Date', <Col.TXN_DATE: '[TXN_DATE]'>: 'Transaction Date', <Col.NARRATION: '[NARRATION1]'>: 'Description', <Col.AMOUNT_CREDIT: '[CREDIT]'>: 'Deposit', <Col.AMOUNT_DEBIT: '[DEBIT]'>: 'Withdrawal', <Col.BALANCE: '[BALANCE]'>: 'Balance'}
>>>
>>> ;; -*- mode: org; mode: beancount; coding: utf-8; -*-
>>>
>>> And this is where I'm currently stuck. I feel like it's something dumb,
>>> something not pointing at something else correctly, but I don't know enough
>>> Python (yet) to figure it out myself. Any help would be greatly
>>> appreciated. :)
>>>
>>> TRS-80
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Beancount" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/beancount/LFcF9ZJ--3-0%40tutanota.com
>>> <https://groups.google.com/d/msgid/beancount/LFcF9ZJ--3-0%40tutanota.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>

