Issue #356: same code base for fetching prices and importing transactions? (blais/beancount)

Johannes Harms Fri, 28 Dec 2018 05:50:12 -0800

New issue 356: same code base for fetching prices and importing transactions?
https://bitbucket.org/blais/beancount/issues/356/same-code-base-for-fetching-prices-and


Johannes Harms:

Could / should we use the same code base for fetching prices and importing
transactions? Using importers to import downloaded prices could help reduce
duplicate code. The only missing piece would be extending the importer base
classes to support fetching (of both prices and transactions).

@blais: I would be glad to hear your thoughts on this, including the previous 
design rationale to split price-fetching and transaction-importing into 
separate modules.  
@seltzered: I am posting this as a follow-up to #329 "Bean-price: support fetch
over range of dates", because I did not want to take the other issue off-topic.

Motivation:
My personal observation is that importing prices is very similar to importing 
transactions.

* Fetching: Prices are usually fetched automatically, but I found that this is
not always possible. Similarly, transactions could be fetched automatically,
but that is not always possible either, and often not worth the effort, given
that screen-scraping implementations break easily.
* Importing (identifying, filing, extracting): These steps are nearly identical
for prices and transactions.

Example: I wrote this importer for Yahoo prices:
1. Fetching: As mentioned above, automatic fetching is not always easy. In this
case, I found it easier to fetch prices manually by scraping the HTML table
using artoo.js.
2. Importing: The code snippet below illustrates that it makes perfect sense
to import prices using the importer functionality.


```python

"""
Imports prices from CSV that was scraped from yahoo finance
"""
# pylint: disable=C0411,C0330


# Use the public decimal module; _pydecimal is a private implementation detail.
from decimal import DecimalException
import csv
import logging
from typing import Dict, Iterable, NamedTuple

from beancount.core.amount import Amount
from beancount.core.data import Price, new_metadata, sorted as sorted_entries
from beancount.core.number import D
from beancount.ingest.cache import _FileMemo
from beancount.ingest.importer import ImporterProtocol
from beancount.ingest.importers.csv import Col
from beancount.ingest.importers.mixins.identifier import IdentifyMixin
from beancount.utils.date_utils import parse_date_liberally

logger = logging.getLogger(__name__)  # pylint: disable=C0103

Row = NamedTuple(
    "Row", [("file_name", str), ("line_number", int), ("data", Dict)]
)


class PricesImporter(ImporterProtocol):
    """Imports prices from CSV"""

    def __init__(self, **kwargs):  # pylint: disable=R0913
        """
        Initializes the importer.
        """
        # gets required arguments:
        self.columns = kwargs.pop("columns")
        self.commodity = kwargs.pop("commodity")
        self.currency = kwargs.pop("currency")

        # gets optional arguments (popped so they are not passed on to super):
        self.debug = kwargs.pop("debug", False)
        self.csv_dialect = kwargs.pop("csv_dialect", "excel")
        self.dateutil_kwds = kwargs.pop("dateutil_kwds", None)
        super().__init__(**kwargs)

    def extract(self, file: _FileMemo, existing_entries=None):
        """Extracts price entries from CSV file"""
        rows = self.read_lines(file.name)
        price_entries = sorted_entries(self.get_price_entries(rows))
        return price_entries

    def read_lines(self, file_name: str) -> Iterable[Row]:
        """Parses CSV lines into Row objects"""
        # newline="" is what the csv module recommends for files it reads
        with open(file_name, newline="") as file:
            reader = csv.DictReader(file, dialect=self.csv_dialect)
            for row in reader:
                yield Row(file_name, reader.line_num, row)

    def get_price_entries(self, lines: Iterable[Row]) -> Iterable[Price]:
        """Converts Row objects to beancount Price objects"""
        for line in lines:
            try:
                self.validate_line(line)
                meta = self.build_metadata(line.file_name, line.line_number)
                date = self.parse_date(line.data[self.columns[Col.DATE]])
                amount = self.parse_amount(line.data[self.columns[Col.AMOUNT]])
                amount_with_currency = Amount(amount, self.currency)
                yield Price(  # pylint: disable=E1102
                    meta, date, self.commodity, amount_with_currency
                )
            except (ValueError, DecimalException, AssertionError) as exception:
                logger.warning(
                    "Skipped CSV line due to %s exception at %s line %d: %s",
                    exception.__class__.__name__,
                    line.file_name,
                    line.line_number,
                    line.data,
                )

    def validate_line(self, row):
        """Validates CSV rows. If invalid, an AssertionError is thrown."""
        data = row.data
        assert data[self.columns[Col.AMOUNT]]

    def build_metadata(self, file_name, line_number):
        """Constructs beancount metadata"""
        # Keep the line number an int for new_metadata(); only the optional
        # debug metadata stores it as a string.
        return new_metadata(
            file_name,
            line_number,
            {"source_file": file_name, "source_line": str(line_number)}
            if self.debug
            else None,
        )

    def parse_date(self, date_str):
        """Parses the date string"""
        return parse_date_liberally(date_str, self.dateutil_kwds)

    def parse_amount(self, amount_str):  # pylint: disable=R0201
        """Parses an amount string to decimal"""
        return D(amount_str)


class YahooFinancePricesImporter(IdentifyMixin, PricesImporter):
    """
    Imports CSV scraped from finance.yahoo.com

    Usage:

    Scrape historical prices using artoo.js, for example from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE

    artoo.scrapeTable('table[data-test="historical-prices"]', {
      headers: 'th',
      done: artoo.saveCsv
    })

    Then run this importer to convert the scraped csv file to beancount prices.
    """

    def __init__(self, **kwargs):
        kwargs.setdefault(
            "columns", {Col.DATE: "Date", Col.AMOUNT: "Adj Close**"}
        )
        self.matchers = [
            ("content", r"Date,Open,High,Low,Close\*,Adj Close.*")
        ]
        super().__init__(**kwargs)


class TecdaxImporter(YahooFinancePricesImporter):
    """
    Imports CSV scraped from:
    https://finance.yahoo.com/quote/EXS2.DE/history?p=EXS2.DE
    """

    def __init__(self, **kwargs):
        kwargs.setdefault("commodity", "TECDAX")
        kwargs.setdefault("currency", "EUR")
        super().__init__(**kwargs)
```

In my opinion, the above code illustrates that prices and transactions could 
use the same import process. I would therefore like to propose: Let's use 
importers for importing downloaded prices. And let's extend the importer base 
classes to support fetching of not only prices, but also transactions.
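
To make the second part of the proposal more concrete, here is a minimal
sketch of what an optional fetch step could look like. It is an illustration
only: `FetchingImporter`, `fetch()`, `run()` and the stand-in `PriceRow` are
hypothetical names, not existing beancount APIs, and the CSV contents are made
up. The idea is that an importer may implement `fetch()`, and the caller falls
back to a manually downloaded file when it does not.

```python
"""Hypothetical sketch: an importer base class with an optional fetch step.

None of these names exist in beancount; they only illustrate the proposal.
"""
import csv
import io
from typing import List, NamedTuple, Optional


class PriceRow(NamedTuple):
    """Stand-in for a beancount Price entry (hypothetical)."""
    date: str
    commodity: str
    number: str
    currency: str


class FetchingImporter:
    """Hypothetical base class: fetch() is optional, extract() is required."""

    def fetch(self) -> Optional[str]:
        """Returns downloaded file contents, or None if fetching is manual."""
        return None

    def extract(self, contents: str) -> List[PriceRow]:
        raise NotImplementedError


class ManualCsvPrices(FetchingImporter):
    """Prices from a manually downloaded CSV; fetch() is not overridden."""

    def __init__(self, commodity: str, currency: str):
        self.commodity = commodity
        self.currency = currency

    def extract(self, contents: str) -> List[PriceRow]:
        reader = csv.DictReader(io.StringIO(contents))
        return [
            PriceRow(row["Date"], self.commodity, row["Close"], self.currency)
            for row in reader
            if row["Close"]  # skip rows without a price
        ]


class CannedFetchPrices(ManualCsvPrices):
    """Same extract step, plus an automatic fetch (canned data, no network)."""

    def fetch(self) -> Optional[str]:
        # A real implementation would download here; this returns made-up data.
        return "Date,Close\n2018-12-27,1.00\n2018-12-28,\n"


def run(importer: FetchingImporter, manual_contents: Optional[str] = None):
    """Fetches if the importer can, else uses a manually supplied file."""
    contents = importer.fetch()
    if contents is None:
        contents = manual_contents
    if contents is None:
        raise ValueError("nothing fetched and no manually downloaded file given")
    return importer.extract(contents)
```

With this shape, `run(CannedFetchPrices("TECDAX", "EUR"))` and
`run(ManualCsvPrices("TECDAX", "EUR"), contents_of_downloaded_file)` go through
the same extract code, which is the duplication I would like to avoid. The
same pattern would apply to importers that yield transactions.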

