Hi Rick

Looks like a cool project!

Grafter seems much more ambitious in scope than semantic-csv 
<http://github.com/metasoarous/semantic-csv>, aiming to be a rather 
complete framework for creating ETL pipelines, and translations between 
different formats. It feels to me like the sort of thing with which one 
could build a more extensible incarnation of Python's csvkit command line 
utilities.

In contrast, semantic-csv has a more focused scope of providing 
transformation tools for dealing with the problem of getting your tabular 
data into a more usable form within the context of a pipeline. However, 
it makes no assumptions about what the rest of that pipeline might look 
like, trying to be as modular/composable as possible. It sounds like the 
`grafter-tabular.csv` you're thinking of pulling out might be of similar 
scope, though perhaps with a different flavor and set of 
opinions/assumptions.

A more specific distinction: For "named column" functionality, semantic-csv 
offers the `mappify` function, which simply returns a lazy sequence of 
maps. This is in contrast to your approach of implementing lazy Datasets, 
which I think is a really neat idea. My suggestion would be to see if Mike 
Anderson thinks tooling around lazy Datasets would fit within the scope of 
core.matrix's Dataset implementations/API. If not, I think *that* could be 
the basis of a really valuable spin-off library. The next major version of 
Incanter will actually be switching over to core.matrix Datasets, so 
developing around/towards the latter is definitely the way to go.
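
To make the "named column" distinction concrete, here is a minimal sketch of the lazy-sequence-of-maps idea using only clojure.core. The `rows->maps` helper and the sample rows are hypothetical illustrations, not semantic-csv's actual `mappify` implementation, and the sketch assumes the first row is the header:

```clojure
;; Hypothetical helper for illustration only; not part of semantic-csv.
(defn rows->maps
  "Treats the first row as the header and returns the remaining rows
  as a lazy sequence of maps keyed by keywordized column name."
  [rows]
  (let [header (map keyword (first rows))]
    ;; `map` is lazy, so data rows are only converted as they are consumed.
    (map #(zipmap header %) (rest rows))))

;; Sample parsed CSV: a sequence of row vectors, as a CSV parser would return.
(def rows [["name" "count"] ["apples" "3"] ["pears" "5"]])

(rows->maps rows)
;; => ({:name "apples", :count "3"} {:name "pears", :count "5"})
```

Because the result is just a sequence of maps, it composes with ordinary sequence functions rather than requiring a dedicated Dataset type.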

In summary, I think the differing scopes of our two projects make them both 
valuable tools, depending on the needs. I'm looking forward to seeing how 
grafter develops.

Hope my thoughts have been helpful. Best of luck.

Chris



On Friday, February 6, 2015 at 4:35:55 PM UTC-8, Rick Moynihan wrote:
>
> Interesting! 
>
> I've been working on a similar library for processing CSV files, 
> called Grafter.  We released version 0.3.0 about a week ago and have 
> been using it to ingest and clean large amounts of CSV/Excel for over 
> 9 months. 
>
> http://grafter.org/ 
>
> http://github.com/Swirrl/grafter 
>
> It consists of a few parts: 
>
> 1) An API for expressing lazy transformations on tabular data.  We are 
> compatible with Incanter Datasets, but differ in that we try to be 
> lazy rather than eager to prevent consuming memory on large conversion 
> tasks.  Our API isn't complete yet, but it's growing all the time.  We 
> plan to migrate from Incanter towards using core.matrix Dataset 
> protocols, which will broaden our compatibility horizon even further. 
>
> This API can be used to cleanly express relatively complex 
> transformations on tabular data, batch clean data, and even convert 
> from CSV to Excel or vice versa. 
>
> 2) We have another few companion APIs which are concerned with adding 
> Linked Data, Semantic Web and ETL support.  This is essentially our 
> motivation for writing Grafter - to perform long running batch ETL 
> processes with high reliability and repeatability.  It's still early 
> stages - but we've been using Grafter to transform 100s of gigabytes 
> of production data with increasingly complex pipelines. 
>
> This graph data support essentially allows you to specify a graph 
> template to express a mapping from tabular form into Linked Data 
> (RDF).  At some point I'd like to add support for datomic, neo4j, SQL 
> adapters and others too, which should be a relatively small amount of 
> work.  We're trying our best to be extensible. 
>
> 3) We have extensive documentation (see http://grafter.org/ ), a 
> leiningen template and plugin for getting started quickly and 
> finding and executing pipelines. 
>
> Right now Grafter carries quite a few dependencies, not directly 
> relevant to CSV... e.g. Excel & Linked Data support.  However we have 
> started engineering it into distinct libraries which package smaller 
> subsets of functionality.  I was hoping to have released this with 
> 0.3.0 but an as yet undiagnosed bug in lein-repack has prevented me 
> from doing this. 
>
> However when this is done, you'll be able to request just the 
> grafter-tabular.csv package for example. 
>
> I'd be very interested to hear your thoughts on Grafter, and will be 
> sure to take a look at semantic-csv when I return to work. 
>
> R. 
>
> On 27 January 2015 at 09:22, Christopher Small <metas...@gmail.com> wrote: 
> > Hi everyone 
> > 
> > I'm pleased to announce the release of 
> > [semantic-csv](https://github.com/metasoarous/semantic-csv), a humble 
> > library for working with CSV data. 
> > 
> > Existing Clojure libraries for working with CSV data 
> > ([clojure.data.csv](https://github.com/clojure/data.csv) and 
> > [clojure-csv](https://github.com/davidsantiago/clojure-csv) being the 
> > most notable) only concern themselves with the _syntax_ of CSV; they 
> > take CSV text, transform it into a collection of vectors of string 
> > values (or when writing, write from a sequence of string vectors), and 
> > that's it.  Semantic CSV takes the next step by giving you tools for 
> > addressing the _semantics_ of your data, helping you put it into the 
> > form that better reflects what it means, and making it easier to work 
> > with. 
> > 
> > ## Features 
> > 
> > * Absorb header row as a vector of column names, and return remaining 
> > rows as maps of `column-name -> row-val` 
> > * Write from a collection of maps, given a header 
> > * Apply casting/formatting functions by column name, while reading or 
> > writing 
> > * Remove commented out lines (by default, those starting with `#`) 
> > * Compatible with any CSV parsing library returning/writing a sequence 
> > of row vectors 
> > * (SOON) A "sniffer" that reads in N lines, and uses them to guess 
> > column types 
> > 
> > ### Structure 
> > 
> > Semantic CSV is structured around a number of composable processing 
> > functions for transforming data as it comes out of or goes into a CSV 
> > file.  This leaves room for you to use whatever parsing/formatting tools 
> > you like, reflecting a nice decoupling of grammar and semantics. 
> > However, a couple of convenience functions are also provided which wrap 
> > these individual steps in an opinionated but customizable manner, 
> > helping you move quickly while prototyping or working at the REPL. 
> > 
> > ## Where you come in 
> > 
> > Semantic CSV is still in alpha, but I'm excited to start getting people 
> > using it and providing feedback.  I'm particularly interested at this 
> > phase in hearing what people think of the overall structure, what pain 
> > points come up, and what features would be nice.  And bugs of course. 
> > Feel free to submit feedback via [GitHub 
> > issues](https://github.com/metasoarous/semantic-csv/issues) or the 
> > [project chat room](https://gitter.im/metasoarous/semantic-csv). 
> > 
> > https://github.com/metasoarous/semantic-csv 
> > 
> > Thanks for your time; I hope you find this useful. 
> > 
> > Chris Small 
> > 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "Clojure" group. 
> > To post to this group, send email to clo...@googlegroups.com 
> > Note that posts from new members are moderated - please be patient with 
> > your first post. 
> > To unsubscribe from this group, send email to 
> > clojure+u...@googlegroups.com 
> > For more options, visit this group at 
> > http://groups.google.com/group/clojure?hl=en 
> > --- 
> > You received this message because you are subscribed to the Google 
> > Groups "Clojure" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> > an email to clojure+u...@googlegroups.com. 
> > For more options, visit https://groups.google.com/d/optout. 
>

