Interesting!

I've been working on a similar library for processing CSV files,
called Grafter.  We released version 0.3.0 about a week ago and have
been using it to ingest and clean large amounts of CSV/Excel data for
over 9 months.

http://grafter.org/

http://github.com/Swirrl/grafter

It consists of a few parts:

1) An API for expressing lazy transformations on tabular data.  We are
compatible with Incanter Datasets, but differ in that we try to be
lazy rather than eager, to avoid consuming memory on large conversion
tasks.  Our API isn't complete yet, but it's growing all the time.  We
plan to migrate from Incanter towards using core.matrix Dataset
protocols, which will broaden our compatibility even further.

This API can be used to express relatively complex transformations on
tabular data, to clean data in batches, and even to convert from CSV
-> Excel or vice versa.

2) We have a few companion APIs which are concerned with adding
Linked Data, Semantic Web and ETL support.  This is essentially our
motivation for writing Grafter - to perform long-running batch ETL
processes with high reliability and repeatability.  It's still early
stages - but we've been using Grafter to transform 100s of gigabytes
of production data with increasingly complex pipelines.

This graph data support essentially allows you to specify a graph
template to express a mapping from tabular form into Linked Data
(RDF).  At some point I'd like to add support for Datomic, Neo4j, SQL
adapters and others too, which should be a relatively small amount of
work.  We're trying our best to be extensible.

3) We have extensive documentation (see http://grafter.org/ ), a
Leiningen template and a plugin for getting started quickly and for
finding and executing pipelines.

Right now Grafter carries quite a few dependencies not directly
relevant to CSV, e.g. Excel & Linked Data support.  However, we have
started splitting it into distinct libraries which package smaller
subsets of functionality.  I was hoping to have released this with
0.3.0, but an as-yet-undiagnosed bug in lein-repack has prevented me
from doing this.

When this is done, you'll be able to request just the
grafter-tabular.csv package, for example.

I'd be very interested to hear your thoughts on Grafter, and will be
sure to take a look at semantic-csv when I return to work.

R.

On 27 January 2015 at 09:22, Christopher Small <metasoar...@gmail.com> wrote:
> Hi everyone
>
> I'm pleased to announce the release of
> [semantic-csv](https://github.com/metasoarous/semantic-csv), a humble
> library for working with CSV data.
>
> Existing Clojure libraries for working with CSV data
> ([clojure.data.csv](https://github.com/clojure/data.csv) and
> [clojure-csv](https://github.com/davidsantiago/clojure-csv) being the most
> notable) only concern themselves with the _syntax_ of CSV; they take CSV
> text, transform it into a collection of vectors of string values (or when
> writing, write from a sequence of string vectors), and that's it.  Semantic
> CSV takes the next step by giving you tools for addressing the _semantics_
> of your data, helping you put it into a form that better reflects what it
> means and makes it easier to work with.
>
> ## Features
>
> * Absorb header row as a vector of column names, and return remaining rows
> as maps of `column-name -> row-val`
> * Write from a collection of maps, given a header
> * Apply casting/formatting functions by column name, while reading or
> writing
> * Remove commented out lines (by default, those starting with `#`)
> * Compatible with any CSV parsing library returning/writing a sequence of
> row vectors
> * (SOON) A "sniffer" that reads in N lines, and uses them to guess column
> types
>
> ### Structure
>
> Semantic CSV is structured around a number of composable processing
> functions for transforming data as it comes out of or goes into a CSV file.
> This leaves room for you to use whatever parsing/formatting tools you like,
> reflecting a nice decoupling of grammar and semantics.  However, a couple of
> convenience functions are also provided which wrap these individual steps in
> an opinionated but customizable manner, helping you move quickly while
> prototyping or working at the REPL.
>
> ## Where you come in
>
> Semantic CSV is still in alpha, but I'm excited to start getting people
> using it and providing feedback.  I'm particularly interested at this phase
> in hearing what people think of the overall structure, what pain points come
> up, and what features would be nice.  And bugs of course.  Feel free to
> submit feedback via [Github
> issues](https://github.com/metasoarous/semantic-csv/issues) or the [project
> chat room](https://gitter.im/metasoarous/semantic-csv).
>
> https://github.com/metasoarous/semantic-csv
>
> Thanks for your time; I hope you find this useful.
>
> Chris Small
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
