Spreadsheet::Table::Extract

Bob Rogers Mon, 06 Mar 2023 19:58:59 -0800

   I am currently finishing up a hack to extract objects from a
spreadsheet, tentatively called Spreadsheet::Table::Extract.  Here's the
summary:


    =head1 SYNOPSIS

        # say we have a checkbook with five columns that we want to
        # turn into a set of Checkbook::Entry objects:
        my $slot_from_heading = {
                # column      => object slot name
                'number'      => 'check_no',
                'date'        => 'date_days',
                'description' => 'description',
                'descr.'          => 'description',     # synonym
                'payment'     => 'payment_value',
                'deposit'     => 'deposit_value'
        };

        use Spreadsheet::Table::Extract;
        use Spreadsheet::Read;
        use Checkbook::Entry;   # for example

        my $book = ReadData('some-spreadsheet.xlsx');
        my $sheet1 = $book->[1];
        my @tables = Spreadsheet::Table::Extract->find_tables
            ($sheet, $slot_from_heading,
             validate_object => Checkbook::Entry->can('validate'));
        # assume we have only one table; if there are multiple heading
        # rows, there could be several.
        my @entries = $checkbook_table->extract($tables[0]);
        say "have ", scalar(@entries), " checkbook transactions.";

    =head1 DESCRIPTION

    C<Spreadsheet::Table::Extract> identifies subsets of spreadsheets
    and/or CSV files read with C<Spreadsheet::Read> that are conceptually
    equivalent to database tables.  Each such table is found by looking
    for its heading row(s) in the spreadsheet, and results in a
    C<Spreadsheet::Table::Extract> instance.  The instance can then
    extract the table into an arrayref of objects of a designated object
    class.  Multiple tables of different types can be combined side by
    side or above one another on the same sheet.

    The full process goes like this:

    =over

    =item 1.

    Call L</find_tables> to extract table definitions from a sheet,
    passing it a hashref that tells it how to identify the heading row.
    Do this once for each combination of sheet and table type.

    =item 2.

    Call L</make_disjoint> on all tables to adjust their boundaries for
    potential overlap.

    =item 3.

    Call L</extract> on each table to produce a list of objects of the
    desired application class.

    =back

(If you want to see it all, it's at
https://git.sr.ht/~rgrjr/Spreadsheet-Table-Extract .)

   The thing is, I'm not completely happy with the name.  It extracts
tables from spreadsheets, sure enough, but "extract" is more of a verb
than a noun, and the key method makes instances of itself that represent
table-like subsets of the spreadsheet.  Which are in fact extracts, but
it might be more accurate to make those Spreadsheet::Table instances.
But that seems overly grandiose for such a small module (it's 550 lines
all told).  Spreadsheet::ExtractedTable?  Suggestions?

                                        -- Bob Rogers
                                           http://www.rgrjr.com/

Spreadsheet::Table::Extract

Reply via email to