Thad,

If you are after something comparable to the Spark DataFrame / SQL experience, you may be able to build Avro and/or Parquet TableSources [1] and TableSinks [2] for Flink. The existing CsvTableSource [3] can serve as an example. With those in place, you should be able to have a comparable experience.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table_api.html#register-an-external-table-using-a-tablesource
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table_api.html#writing-tables-to-external-sinks
[3] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/sources/CsvTableSource.scala

Hope that helps.

On Wed, Jun 14, 2017 at 4:25 PM, Thad Guidry <thadgui...@gmail.com> wrote:

> Thanks Fabian and Andrew for the responses.
>
> Fabian - Yes, that is what I was afraid of. Flink seems perfect for batch
> processing a pipeline. In OpenRefine, we work with finite datasets and
> just want an easier way to have distributed data storage for when our users
> want to work with very large finite datasets.
>
> Andrew - Apache Zeppelin performs some of the same magic that OpenRefine
> does, but is more focused on exploratory analysis and leverages some of the
> same technology that we are also looking at in more detail to see where/how
> it fits with OpenRefine.
>
> I especially like the idea of Apache YARN's NodeManager and also Apache
> Spark's data access through the DataFrame API and SQL against data sources
> like Avro and Parquet, which is where both Jacky and I see perhaps the most
> alignment with OpenRefine and giving our users an alternative storage /
> compute option for handling bigger datasets than can fit in memory
> currently with OpenRefine.
>
> Any other thoughts, ideas, or pros/cons from anyone about anything I
> mentioned?
>
> -Thad
> +ThadGuidry <https://www.google.com/+ThadGuidry>

--
Thanks,
Andrew

Subscribe to my book: Streaming Data <http://manning.com/psaltis>
<https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>