OK, so this got a little long. Go grab your favorite cold, tasty adult
beverage and get comfortable. I did at least put in some headings to
break up the wall of text. :D

I am on my second or third go-round with Beancount, over a period of
some years. I have had varying levels of success, but I personally felt
most successful when I automated as much as possible.

This post is about how I arrived at that conclusion, and some things I
learned along the way. I hope it ends up being useful to others who may
have similar ideas but have perhaps not put all the pieces together
yet. Or who maybe need some encouragement, or...?

* Know Thyself

I guess I felt the need to make this post because Martin himself,
throughout the docs, seems to favor a more "manual" way of doing things
(as a way to keep more "in touch" with his numbers, if I am reading him
correctly).

But perhaps I read too much into that. I can only say that I generally
do try to follow the recommendations made in the docs by the founder
and those more involved in a particular project (especially when I am
just starting out). I figure there must be some reason for them, even
if I do not yet understand what the reason(s) might be... So maybe this
is just me finally gaining enough experience to know what is what and,
perhaps more importantly, "knowing myself" enough to recognize what
works for me personally.

If you prefer the more "manual" approach (or any other approach, for
that matter), I encourage you to "do what works for you." Thankfully we
have such flexible tools available to us...

* Automate as much as Possible

For me, what seems to work is "as much automation as possible." I still
end up doing some stuff manually, of course, but if I can get that down
to 10% or 5% (or whatever), the way I see it I have eliminated 90-95%
of the work (and drudgery) involved. I mean, this is what computers are
best at, isn't it?

* Moving from CSV to OFX import

Along those lines, I recently moved from CSV import to OFX. It's still
early, but I am well on my way to nearly /completely/ automating my
download, import, and categorization.

Before, with CSV, I had to log on to my bank and click through stuff,
save the file (and then remember my file naming scheme), etc. Sometimes
that just became too much friction, and sooner or later I would start
falling behind from the simple drudgery of it all.

Further complicating the issue, my bank only keeps "transactions"
around for 90 days, so if I got busy or fell behind, I would be back to
/manually/ entering any "missed" transactions (yeah, right!).
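
To put a number on that 90-day window, here is a minimal Python sketch
of a check that could run before each import. It is only an
illustration: the function name days_until_gap and the example dates
are made up, and the 90-day figure is the one from my bank, so adjust
it for yours.

```python
from datetime import date, timedelta

# Retention window taken from the post; an assumption for any other bank.
RETENTION_DAYS = 90


def days_until_gap(last_imported, today, retention_days=RETENTION_DAYS):
    """Return how many days remain before a gap opens in the ledger.

    last_imported is the date of the newest transaction already imported.
    A negative result means some transactions are already out of reach.
    """
    oldest_available = today - timedelta(days=retention_days)
    return (last_imported - oldest_available).days


if __name__ == "__main__":
    remaining = days_until_gap(date(2020, 1, 1), date(2020, 3, 1))
    print("days left before transactions are lost:", remaining)
```

A result that is getting close to zero is the cue to run the download
now rather than later.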

Enter OFX (via ofxclient), which solves these problems by being a
completely scriptable (and thus automatable) tool.

There were a couple of bugbears with ofxclient[0], however. Its author
is not really actively maintaining it. After fixing a couple of missing
apostrophes, though, I /finally/ got it to work. I guess my Python must
be getting a little better, because 1-2 years ago I had already failed
once or twice at this exact same task. :)

So, on to the next hurdle...

* No (built-in) OFX "categorizer"

It was a little disappointing to learn that there is no callable
"categorizer" available in the OFX importer example the way there is in
the CSV importer example.

Then I found a recent post titled "Categorizing transactions
automatically on import", which solved that particular part of the
problem. I left a more fleshed-out example as a reply to that thread
for anyone who is interested (search the mailing list for that title or
"OFX categorizer" etc.).
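
For readers who have not seen the idea before, the core of a
categorizer can be sketched in plain Python as below. This is not the
example from that thread and it does not use Beancount's own data
types; the rule patterns, account names, and the categorize function
are all hypothetical. It only shows the "first matching rule picks the
account" logic that an importer would then use to add the second
posting.

```python
import re

# Hypothetical rules: (pattern matched against payee/narration, account).
# The first match wins, so more specific patterns should come first.
RULES = [
    (re.compile(r"grocer|supermarket", re.IGNORECASE), "Expenses:Food:Groceries"),
    (re.compile(r"electric|water|gas co", re.IGNORECASE), "Expenses:Utilities"),
    (re.compile(r"payroll", re.IGNORECASE), "Income:Salary"),
]

# Fallback for anything no rule matches, to be reviewed by hand.
DEFAULT_ACCOUNT = "Expenses:Uncategorized"


def categorize(description, rules=RULES, default=DEFAULT_ACCOUNT):
    """Return the account of the first rule whose pattern matches."""
    for pattern, account in rules:
        if pattern.search(description):
            return account
    return default


if __name__ == "__main__":
    for text in ["CORNER GROCER #12", "CITY ELECTRIC", "ATM WITHDRAWAL"]:
        print(text, "->", categorize(text))
```

Keeping the rules in a plain list means the remaining manual work is
mostly adding one line whenever something lands in the fallback
account.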

* Next steps (Selenium WebDriver)

At this point I am satisfied enough with my progress (and have learned
enough) that I felt it would be worth sharing with others. But already
I am looking forward to the next steps. And I am getting excited about
Beancount again. :)

Over the last few days I have been reading up on the Selenium WebDriver
docs. I had heard about Selenium before, of course, but what motivated
me to really give it a try now was an article I recently came across
over at plaintextaccounting.org[1] by Lee Yingtong Li titled "Using
selenium to scrape/import bank transactions for ledger-cli."[2] This is
quite a recent article (2020-04-29), as you can see from the link.

He is using it to get his "transactions", but that is not what I plan
on using it for (I have OFX for that). For me, the only remaining piece
of the puzzle left to automate is...

* Automatically downloading PDF statements

Like my "transactions", downloading these PDF "statements" was an
exercise in drudgery, for all the reasons already mentioned above
(clicking through the bank website, remembering the file naming
convention, etc.).
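
The "remembering the file naming convention" part is easy to hand over
to a script regardless of how the PDF gets downloaded. The sketch below
assumes a date-prefixed layout with one directory per account level;
that layout, the "documents" root, the label, and the statement_path
function are illustrative assumptions, not necessarily the scheme
anyone here actually uses.

```python
from datetime import date
from pathlib import PurePosixPath


def statement_path(root, account, statement_date, label="statement"):
    """Build <root>/<Account>/<Sub>/YYYY-MM-DD.<label>.pdf.

    account is a colon-separated account name such as
    "Assets:Bank:Checking"; each component becomes a directory.
    """
    account_dir = PurePosixPath(root).joinpath(*account.split(":"))
    filename = f"{statement_date.isoformat()}.{label}.pdf"
    return account_dir / filename


if __name__ == "__main__":
    print(statement_path("documents", "Assets:Bank:Checking", date(2020, 4, 30)))
```

Whatever ends up doing the download then only needs to move the file
to the path this returns.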

First I tried doing this through the OFX protocol itself. And maybe
there is a way? The standard would seem to indicate there might be. But
I have posted about this before, not only here but also on the ledger
mailing list, and received exactly zero replies so far (which is also
why I am not even going to bother looking those posts up in order to
link to them). So I gave up on that route (for now).

So then I got the idea to automate this drudgery using Selenium
(WebDriver).

* Arguments for Selenium WebDriver (in general)

Now, I have not actually got this working yet, and the implementation
details will of course be very bank (web site) dependent. So why bother
bringing it up now (or at all, for that matter)?

Well, for the same reason as mentioned early on: I have mostly heard
this sort of thing referred to as "too much trouble" and took that
assessment at face value. But is it? Some things I learned in my
research over the last few days started to change my mind:

1. So far, the Selenium WebDriver docs[3] seem to be very good. Simple
   and to the point.
2. There are bindings for several different languages. And the language
   bindings (I was looking at Python mostly) seem to be quite clean,
   straightforward, and easy to remember / intuitive.
3. It appears to be quite a mature and reliable thing nowadays, with
   browser vendors like Google and Mozilla (and others) actually
   maintaining their own drivers for each particular browser. No more
   "PhantomJS" and feeling like you are in some never-ending
   cat-and-mouse game with an opponent.
4. Not only that, apparently automated browser / site testing (in the
   form of WebDriver) has actually become a W3C Recommendation by
   now(!). [4]

It really appears to me to be a completely different dynamic nowadays.
Therefore I would challenge the notion that the ROI is not there. Not
only is this looking quite easy, but dare I say, /well supported/
even! :)

Of course, whether I run into some brick wall or get along swimmingly,
I will try to make some time and remember to report back. :)

Which leads me into my final point...

* Choice of tools

At some point during this whole adventure (a while back) I thought long
and hard about my choice of tools.

There are other ways to accomplish "automation", mainly online
"aggregators" like Plaid, Mint, and probably some others. I had
actually signed up for a Plaid developer account at one point, before
getting ofxclient working. Those are certainly viable, perhaps even
preferable, depending on your personal proclivities. But not for me,
and here is why.

First, it is a matter of dependence. Do I want to come to rely on some
centralized service that could change its API or "developer" terms at
any time and lock me out? Personally, no, I do not.

The second aspect is trust/security. Do I really trust a third party to
hold all my various banking credentials? Personally, no, I do not.

And finally, independence and learning new skills in general. We all
have very limited resources (mostly time). Do I want to spend my
valuable time learning one particular (likely proprietary) API? Or
should I instead spend it learning a much more general (and F/LOSS)
tool (like Selenium), which also has the benefit of being able to solve
lots of other problems in addition to the particular one I am trying to
solve right now? Personally, <s>I think</s> I know that I prefer the
latter.

So that is why I have chosen to go this particular route.

I'd love to hear anyone's thoughts on any or all of the above. Please
also chime in if you have gotten stuck at any particular point along
the way, and maybe I (or others) can help you get un-stuck. Thanks for
sticking with me if you made it this far. :)

Cheers,
TRS-80

[0] https://github.com/captin411/ofxclient
[1] https://plaintextaccounting.org/#articles-blog-posts
[2] https://yingtongli.me/blog/2020/04/29/hbs-scrape.html
[3] https://www.selenium.dev/documentation/en/webdriver
[4] https://www.w3.org/TR/webdriver1