Automation being the key to success (aka Know Thyself) featuring ofxclient and Selenium WebDriver

TRS-80 Wed, 15 Jul 2020 11:32:36 -0700

OK so this got a little long.  Go grab yourself your favorite tasty cold
delicious adult beverage and get comfortable.  I did put in some
headings at least to mitigate the wall of text.  :D

I am on my second or third go 'round with Beancount, over a period of
some years.  I have had various levels of success or failure, but for me
personally, I really felt like I was the most successful when automating
as much as possible.

This post will be about how I arrived at that conclusion, and some
things I learned along the way.  I hope it ends up being useful to
others who may have similar ideas, but perhaps have not put all the
pieces together yet.  Or maybe need some encouragement, or...?

* Know Thyself

I guess I felt the need to make this post because Martin himself,
throughout the docs, seems to put forth his more "manual" way of doing
things (as a way to keep more "in touch" with his numbers, if I am
reading him correctly).

But perhaps I read too much into that.  I can only say that generally I
do try to follow the recommendations made in the docs by the founder and
those more involved in a particular project (especially when I am just
starting out).  I suppose I figure there must be some reason for it,
even if I do not yet understand what those reason(s) might be...  So
maybe this is just me finally gaining enough experience to know what is
what, and perhaps more importantly "knowing myself" enough to recognize
what works for me, personally.

If you prefer the more "manual" approach (or any other approach, for
that matter) I encourage you to "do what works for you."  Thankfully we
have such flexible tools available to us...

* Automate as much as Possible

For me, what seems to work is "as much automation as possible."  I still
end up manually doing some stuff of course, but for me if I can get that
down to 10% or 5% (or whatever) the way I see it is I have reduced
90-95% of the work (and drudgery) involved.  I mean, this is what
computers are best at, isn't it?

* Moving from CSV to OFX import

Along those lines, I recently moved from CSV import to OFX.  It's still
early, but I am well on my way to nearly /completely/ automating my
download, import, and categorization.

Before, with CSV, I had to log on to my bank, click through stuff, save
the file (and then remember my file naming scheme), etc.  Sometimes that
just became too much friction, and sooner or later I would start falling
behind from the simple drudgery of it all.

Further complicating the issue, my bank only keeps "transactions" around
for 90 days, so if I got busy or fell behind, I would be back to
/manually/ entering any "missed" transactions (yeah, right!).

Enter OFX (via ofxclient), which solves these problems by being a
completely scriptable (and thus automatable) tool.
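
To make the 90-day problem concrete, here is a minimal sketch of the
date arithmetic a scheduled download script might use.  It is only an
illustration: the function names, the 3-day overlap, and the idea of
storing the date of the last import are assumptions, not part of
ofxclient or Beancount.  The result would be handed to whatever download
call you script.

```python
from datetime import date

# From the post: the bank only keeps about 90 days of transactions.
BANK_HISTORY_DAYS = 90


def days_to_request(last_import: date, today: date, overlap: int = 3) -> int:
    """How many days of history to ask for so no gap opens up.

    A small overlap is requested on purpose; duplicates are easier to
    deal with at import time than missing transactions.
    """
    elapsed = (today - last_import).days + overlap
    return max(1, min(elapsed, BANK_HISTORY_DAYS))


def has_gap(last_import: date, today: date) -> bool:
    """True if the last import is older than the bank's history window."""
    return (today - last_import).days > BANK_HISTORY_DAYS


if __name__ == "__main__":
    last = date(2020, 7, 1)
    now = date(2020, 7, 15)
    print(days_to_request(last, now), has_gap(last, now))
```

Run from cron (or similar) every week or two, something like this keeps
the request well inside the window, and has_gap() tells you when manual
entry is unavoidable.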

There were a couple of bugbears with ofxclient[0], however.  It is not
really being actively maintained.  However, after fixing a couple of
missing apostrophes I /finally/ got it to work.  I guess my Python must
be getting a little better, because 1-2 years ago I had already failed
once or twice at this exact same task.  :)

So, on to the next hurdle...

* No (built in) OFX "categorizer"

Anyway, so then it was a little disappointing to learn that there is no
callable "categorizer" available in the OFX importer example the same
way that there was in the CSV importer example.

Until I found a recent post titled "Categorizing transactions
automatically on import" which solved that particular part of the
problem.  I left a more fleshed out example as a reply to that thread
for anyone who is interested (search the mailing list for that or "OFX
categorizer" etc.).
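
For anyone unsure what a "categorizer" boils down to, here is a minimal
sketch of the idea: a callable that looks at a transaction description
and picks an account.  This is not the code from that thread and not
part of any Beancount importer; the RULES table, the categorize() name,
and the account names are all made up for illustration.

```python
from typing import Optional

# Hypothetical rules: lowercase keyword found in the payee/memo text,
# and the account to book the other leg of the transaction to.
RULES = [
    ("grocery", "Expenses:Food:Groceries"),
    ("shell oil", "Expenses:Car:Fuel"),
    ("netflix", "Expenses:Entertainment:Streaming"),
]


def categorize(description: str, default: Optional[str] = None) -> Optional[str]:
    """Return the account of the first rule whose keyword matches.

    Matching is a case-insensitive substring test.  If nothing matches,
    `default` is returned so the caller can leave the posting for manual
    review.
    """
    text = description.lower()
    for keyword, account in RULES:
        if keyword in text:
            return account
    return default


if __name__ == "__main__":
    print(categorize("NETFLIX.COM 866-579-7172"))
    print(categorize("SOME UNKNOWN PAYEE"))
```

An importer would call something like this once per transaction and add
a second posting only when a rule matches.  Rules are checked in order,
so put the more specific keywords first.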

* Next steps (Selenium WebDriver)

At this point I am satisfied enough in my progress (and have learned
enough) that I felt it would be worth sharing that progress with others.

But already I am looking forward to next steps.  And I am getting
excited about Beancount again.  :)

For the last few days I have been reading the Selenium WebDriver docs.
I had heard about Selenium before of course, but what I think motivated
me to really give it a try now was an article I recently came across
over at plaintextaccounting.org[1] by Lee Yingtong Li titled "Using
selenium to scrape/import bank transactions for ledger-cli."[2]  This is
a quite recent article (2020-04-29), as you can see by the link.

Anyway, he is using it to get his "transactions" but that is not what I
plan on using it for (I have OFX for that).  For me, the only remaining
piece of the puzzle left to automate is...

* Automatically downloading PDF statements

Like my "transactions", downloading these PDF "statements" was an
exercise in drudgery, for all the reasons already mentioned above
(clicking through the bank website, remembering the file naming
convention, etc.).

First I tried doing this through the OFX protocol itself.  And maybe
there is a way?  The standard would seem to indicate maybe there is.
But I made posts about this not only here but also on the ledger mailing
list before, and have received exactly zero replies so far (which is
also why I am not even going to bother looking them up in order to link
to them).  So I gave up on that way (for now).

So then I got the idea to maybe automate this drudgery using Selenium
(WebDriver).
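
As a rough idea of what such a script could look like, here is a minimal
sketch using the Python bindings with Firefox.  It has not been run
against any real bank: the element IDs, the link text, and both function
names are hypothetical, and a real site will need its own locators (and
probably extra steps such as two-factor prompts).  It assumes Selenium
and geckodriver are installed.

```python
import time
from datetime import date
from pathlib import Path


def statement_filename(institution: str, account: str, closing: date) -> str:
    """Build a predictable file name so the scheme need not be remembered."""
    return f"{closing.isoformat()}_{institution}_{account}_statement.pdf"


def fetch_statement(download_dir: Path, login_url: str, user: str, password: str) -> None:
    """Log in, click the statement link, and wait for the PDF to land."""
    # Imported here so the naming helper above works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    # Firefox preferences: save PDFs into download_dir without a dialog
    # and without opening the built-in PDF viewer.
    options.set_preference("browser.download.folderList", 2)
    options.set_preference("browser.download.dir", str(download_dir))
    options.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
    options.set_preference("pdfjs.disabled", True)

    driver = webdriver.Firefox(options=options)
    try:
        driver.get(login_url)
        # Hypothetical locators; inspect the bank's pages for the real ones.
        driver.find_element(By.ID, "username").send_keys(user)
        driver.find_element(By.ID, "password").send_keys(password)
        driver.find_element(By.ID, "login-button").click()
        link = WebDriverWait(driver, 30).until(
            EC.element_to_be_clickable((By.LINK_TEXT, "Latest statement"))
        )
        link.click()
        # Give the download up to a minute; Firefox keeps a .part file
        # next to the target while the transfer is still in progress.
        deadline = time.monotonic() + 60
        while time.monotonic() < deadline:
            finished = any(download_dir.glob("*.pdf"))
            in_progress = any(download_dir.glob("*.part"))
            if finished and not in_progress:
                break
            time.sleep(1)
    finally:
        driver.quit()


if __name__ == "__main__":
    print(statement_filename("mybank", "checking", date(2020, 6, 30)))
```

After the download, the file could be renamed to whatever
statement_filename() returns and moved into the documents directory.
Credentials are better read from a keyring or the environment than
hard-coded in the script.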

* Arguments for Selenium WebDriver (in general)

Now, I have not even got this actually working yet, and the
implementation details will of course be very bank (web site) dependent.
So why bother bringing it up now (or at all, for that matter)?

Well, for the same reason as mentioned at the start: mainly, I have
heard this sort of thing referred to as "too much trouble" and took that
assessment at face value.  But is it?  Some things I learned in my
research over the last few days started to change my mind:

1. So far, the Selenium WebDriver docs[3] seem to be very good.  Simple
   and to the point.

2. There are bindings for several different languages.  And the language
   bindings (I was looking at Python mostly) seem to be quite clean,
   straightforward, and easy to remember / intuitive.

3. It appears to be quite a mature and reliable thing nowadays, with
   browser vendors like Google and Mozilla (and others) actually
   maintaining their own drivers for each particular browser.  No more
   "PhantomJS" and feeling like you are in some never-ending cat and
   mouse game with an opponent.

4. Not only that, apparently the whole notion of automated browser /
   site testing has actually become a W3C recommendation by now(!). [4]

It really appears to me to be a completely different dynamic nowadays.

Therefore I would challenge the notion that the ROI is not there.  Not
only is this looking quite easy, but dare I say, /well supported/ even!
:)

Of course if I run into some brick wall (or get along swimmingly) I will
try and make some time and remember to report back in either case.  :)
Which leads me into my final point...

* Choice of tools

At some point during this whole adventure (a while back) I thought long
and hard about choice of tools.

There are other ways to accomplish "automation."  Mainly online
"aggregators" like Plaid, Mint, and probably some others.  I actually
had signed up for a Plaid developer account at one point, before getting
ofxclient working.  Those are certainly viable, perhaps even preferable,
depending on your personal proclivities.  But not for me, and here is
why.

First, it is a matter of dependence.  Do I want to come to rely on some
centralized service, which could change its API or "developer" terms at
any time and lock me out?  Personally, no, I do not.

The second aspect is trust/security.  Do I really trust a third party to
hold all my various banking credentials?  Personally, no, I do not.

And finally, independence and learning new skills in general.  We all
have very limited resources (mostly time).  Do I want to spend my
valuable time learning one particular (likely proprietary) API?  Or
should I instead spend it learning a much more general (and F/LOSS) tool
(like Selenium) which also has the benefit of being able to solve lots
of other problems, in addition to this particular one I am trying to
solve right now?  Personally, <s>I think</s> I know that I prefer the
latter.

So that is why I have chosen to go this particular route.

I'd love to hear anyone's thoughts on any or all of the above.  Please
also chime in if you have gotten stuck at any particular point along the
way, and maybe I (or others) can help you get un-stuck.  Thanks for
sticking with me if you made it this far.  :)

Cheers,

TRS-80

[0] https://github.com/captin411/ofxclient
[1] https://plaintextaccounting.org/#articles-blog-posts
[2] https://yingtongli.me/blog/2020/04/29/hbs-scrape.html
[3] https://www.selenium.dev/documentation/en/webdriver
[4] https://www.w3.org/TR/webdriver1
