New issue 356: same code base for fetching prices and importing transactions?
https://bitbucket.org/blais/beancount/issues/356/same-code-base-for-fetching-prices-and
Johannes Harms:
Could / should we use the same code base for fetching prices and importing
transactions? Using importers for importing downloaded prices could help reduce
duplicate code. The only missing piece is extending the importer base classes
to support fetching (of transactions as well as prices).
@blais: I would be glad to hear your thoughts on this, including the previous
design rationale to split price-fetching and transaction-importing into
separate modules.
@seltzered: I am posting this as follow-up to #329 "Bean-price: support fetch
over range of dates", because I did not want to take the other issue off-topic.
Motivation:
My personal observation is that importing prices is very similar to importing
transactions.
* Fetching: Prices are usually fetched automatically, but I found that this is
not always possible. Similarly, transactions could be fetched automatically,
but that is not always possible either, and often not worth the effort because
screen-scraping implementations break easily.
* Importing (identifying, filing, extracting): These steps are nearly identical
for prices and transactions.
Example: I wrote this importer for Yahoo prices:
1. Fetching: As mentioned above, automatic fetching is not always easy. In this
case, I found it easier to manually fetch prices by scraping the HTML table
using artoo.js.
2. Importing: The code snippet below illustrates that it makes perfect sense
to import prices using the existing import functionality.
```
#!python
"""
Imports prices from a CSV file that was scraped from Yahoo Finance.
"""
# pylint: disable=C0411,C0330
import csv
import logging
from decimal import DecimalException
from typing import Dict, Iterable, NamedTuple

from beancount.core.amount import Amount
from beancount.core.data import Price, new_metadata, sorted as sorted_entries
from beancount.core.number import D
from beancount.ingest.cache import _FileMemo
from beancount.ingest.importer import ImporterProtocol
from beancount.ingest.importers.csv import Col
from beancount.ingest.importers.mixins.identifier import IdentifyMixin
from beancount.utils.date_utils import parse_date_liberally

logger = logging.getLogger(__name__)  # pylint: disable=C0103

# One parsed CSV line, together with its origin for error reporting.
Row = NamedTuple(
    "Row", [("file_name", str), ("line_number", int), ("data", Dict)]
)


class PricesImporter(ImporterProtocol):
    """Imports prices from CSV"""

    def __init__(self, **kwargs):  # pylint: disable=R0913
        """
        Initializes the importer.
        """
        # gets required arguments:
        self.columns = kwargs.pop("columns")
        self.commodity = kwargs.pop("commodity")
        self.currency = kwargs.pop("currency")

        # gets optional arguments (popped so that they are not passed on
        # to base classes that do not expect them):
        self.debug = kwargs.pop("debug", False)
        self.csv_dialect = kwargs.pop("csv_dialect", None)
        self.dateutil_kwds = kwargs.pop("dateutil_kwds", None)

        super().__init__(**kwargs)

    def extract(self, file: _FileMemo, existing_entries=None):
        """Extracts price entries from CSV file"""
        rows = self.read_lines(file.name)
        price_entries = sorted_entries(self.get_price_entries(rows))
        return price_entries

    def read_lines(self, file_name: str) -> Iterable[Row]:
        """Parses CSV lines into Row objects"""
        with open(file_name) as file:
            reader = csv.DictReader(file, dialect=self.csv_dialect)
            for row in reader:
                yield Row(file_name, reader.line_num, row)

    def get_price_entries(self, lines: Iterable[Row]) -> Iterable[Price]:
        """Converts Row objects to beancount Price objects"""
        for line in lines:
            try:
                self.validate_line(line)
                meta = self.build_metadata(line.file_name, line.line_number)
                date = self.parse_date(line.data[self.columns[Col.DATE]])
                amount = self.parse_amount(line.data[self.columns[Col.AMOUNT]])
                amount_with_currency = Amount(amount, self.currency)
                yield Price(  # pylint: disable=E1102
                    meta, date, self.commodity, amount_with_currency
                )
            except (ValueError, DecimalException, AssertionError) as exception:
                # Invalid lines are logged and skipped, not treated as fatal.
                logger.warning(
                    "Skipped CSV line due to %s exception at %s line %d: %s",
                    exception.__class__.__name__,
                    line.file_name,
                    line.line_number,
                    line.data,
                )

    def validate_line(self, row):
        """Validates CSV rows. If invalid, an AssertionError is raised."""
        data = row.data
        assert data[self.columns[Col.AMOUNT]]

    def build_metadata(self, file_name, line_number):
        """Constructs beancount metadata"""
        line_number = str(line_number)
        return new_metadata(
            file_name,
            line_number,
            {"source_file": file_name, "source_line": line_number}
            if self.debug
            else None,
        )

    def parse_date(self, date_str):
        """Parses the date string"""
        return parse_date_liberally(date_str, self.dateutil_kwds)

    def parse_amount(self, amount_str):  # pylint: disable=R0201
        """Parses an amount string to decimal"""
        return D(amount_str)


class YahooFinancePricesImporter(IdentifyMixin, PricesImporter):
    """
    Imports CSV scraped from finance.yahoo.com

    Usage:

    Scrape historical prices using artoo.js, for example from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE

        artoo.scrapeTable('table[data-test="historical-prices"]', {
          headers: 'th',
          done: artoo.saveCsv
        })

    Then run this importer to convert the scraped csv file to beancount prices.
    """

    def __init__(self, **kwargs):
        kwargs.setdefault(
            "columns", {Col.DATE: "Date", Col.AMOUNT: "Adj Close**"}
        )
        # Picked up by IdentifyMixin to match the CSV header line.
        self.matchers = [
            ("content", r"Date,Open,High,Low,Close\*,Adj Close.*")
        ]
        super().__init__(**kwargs)


class TecdaxImporter(YahooFinancePricesImporter):
    """
    Imports CSV scraped from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE
    """

    def __init__(self, **kwargs):
        kwargs.setdefault("commodity", "TECDAX")
        kwargs.setdefault("currency", "EUR")
        super().__init__(**kwargs)
```
In my opinion, the above code illustrates that prices and transactions could
use the same import process. I would therefore like to propose: Let's use
importers for importing downloaded prices. And let's extend the importer base
classes to support fetching of not only prices, but also transactions.
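To make the proposal a bit more concrete, here is a rough sketch of what such an
extension could look like. It is only an illustration under assumptions: the
names FetchingImporter, fetch, run, PriceRow and CsvPricesImporter are
hypothetical and not part of beancount, the sketch uses only the standard
library instead of the real ImporterProtocol, and the sample CSV values are
made up.

```python
"""Hypothetical sketch: an importer base class with an optional fetch step."""
import csv
import io
from abc import ABC, abstractmethod
from typing import List, NamedTuple, Optional


class PriceRow(NamedTuple):
    """Stand-in for a beancount Price entry (illustration only)."""
    date: str
    commodity: str
    amount: str
    currency: str


class FetchingImporter(ABC):
    """Hypothetical base class: fetch (optional), then identify and extract."""

    def fetch(self) -> Optional[str]:
        """Returns downloaded content, or None if fetching must be manual."""
        return None

    @abstractmethod
    def identify(self, content: str) -> bool:
        """Returns True if this importer can handle the content."""

    @abstractmethod
    def extract(self, content: str) -> List[PriceRow]:
        """Converts the content to entries."""

    def run(self, manual_content: Optional[str] = None) -> List[PriceRow]:
        """Uses fetched content if available, else manually downloaded content."""
        content = self.fetch()
        if content is None:
            content = manual_content
        if content is None or not self.identify(content):
            return []
        return self.extract(content)


class CsvPricesImporter(FetchingImporter):
    """Imports prices from CSV text; relies on the default manual fetch."""

    def __init__(self, commodity: str, currency: str):
        self.commodity = commodity
        self.currency = currency

    def identify(self, content: str) -> bool:
        return content.startswith("Date,")

    def extract(self, content: str) -> List[PriceRow]:
        reader = csv.DictReader(io.StringIO(content))
        return [
            PriceRow(row["Date"], self.commodity, row["Adj Close**"], self.currency)
            for row in reader
            if row["Adj Close**"]  # skips rows without a price
        ]


# Made-up sample data, only to exercise the sketch.
SAMPLE = "Date,Adj Close**\n2018-12-27,23.45\n2018-12-28,\n"

if __name__ == "__main__":
    for entry in CsvPricesImporter("TECDAX", "EUR").run(manual_content=SAMPLE):
        print(entry)
```

In this sketch, an importer that can download its data would override fetch,
while importers for manually scraped files keep the default and receive the
file content from the caller. The same base class could serve transactions by
returning transaction entries from extract instead of prices.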