Thad,
For something comparable to Spark's DataFrame API / SQL, you may be able to
build Avro and/or Parquet TableSources [1] and TableSinks [2] for Flink. For
an example, the CsvTableSource is here [3]. With those in place, you should
be able to get a comparable experience.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table_api.html#register-an-external-table-using-a-tablesource
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table_api.html#writing-tables-to-external-sinks
[3]
https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/sources/CsvTableSource.scala

Hope that helps.

On Wed, Jun 14, 2017 at 4:25 PM, Thad Guidry <thadgui...@gmail.com> wrote:

> Thanks Fabian and Andrew for the responses.
>
> Fabian - Yes, that is what I was afraid of.  Flink seems perfect for batch
> processing a pipeline.  In OpenRefine, we work with finite datasets and
> just want an easier way to have distributed data storage for when our users
> want to work with very large finite datasets.
>
> Andrew - Apache Zeppelin performs some of the same magic that OpenRefine
> does, but is more focused on exploratory analysis and leverages some of the
> same technology that we are also looking at in more detail to see where/how
> it fits with OpenRefine.
>
> I especially like the idea of Apache Hadoop YARN's NodeManager, and also
> Apache Spark's data access through the DataFrame API and SQL against data
> sources like Avro and Parquet. That is where both Jacky and I see perhaps
> the most alignment with OpenRefine: giving our users an alternative
> storage / compute option for handling datasets bigger than what currently
> fits in memory with OpenRefine.
>
> Any other thoughts, ideas, or pros/cons from anyone about anything I
> mentioned?
>
> -Thad
> +ThadGuidry <https://www.google.com/+ThadGuidry>
>



-- 
Thanks,
Andrew

Subscribe to my book: Streaming Data <http://manning.com/psaltis>
<https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
