Flink is not really well suited for interactive / ad hoc processing.
What could work is to use some local tool to identify the transformation
rules and apply them with Flink to a large data set.
But that's probably not what you are looking for, right?
Best, Fabian
2017-06-15 3:11 GMT+02:00 qi cui
Hi Andrew,
That would be great if you can come up with something to show the idea.
There are lots of wiki pages on GitHub you can refer to (including the
server-side architecture and client-side architecture). The unique feature
of OpenRefine is its ability to let the user interact with t
Thanks, Andrew!
That would be fantastic! Even if you're not successful at the trivial use
case, just having a look at our source code and providing your comments or
thoughts in our code on a forked branch as you explore and
investigate... would be tremendously useful to us!
-Thad
+ThadGuidry
Thad,
Based on your description that OpenRefine uses techniques similar to
Zeppelin's, I *think* the reading and writing will work.
The Undo/Redo I am fuzzy on.
I will try over the next couple of days and see if I can make something
like this work (at least a trivial use case). Personally I th
Andrew,
So your idea is that Flink could be used as a storage abstraction layer for
OpenRefine? Where OpenRefine would use TableSources for reading and
TableSinks for writing?
And would that still work with our concept of Undo/Redo in OpenRefine to
use Flink's Savepoints in concert with TableSou
Thad,
In the case of something comparable to the Spark DataFrame / SQL -- you may
be able to build Avro and/or Parquet TableSources [1] and TableSinks [2] for
Flink. The CsvTableSource is here [3]. Then you should be able to have a
comparable experience.
[1]
https://ci.apache.org/projects/flink/flin
Thanks Fabian and Andrew for the responses.
Fabian - Yes, that is what I was afraid of. Flink seems perfect for batch
processing a pipeline. In OpenRefine, we work with finite datasets and
just want an easier way to have distributed data storage for when our users
want to work with very large fin
Hi Thad,
I am not sure if this would work for OpenRefine, but could you follow the
model that is used for Apache Zeppelin? Granted OpenRefine does not have
the notion of an interpreter and Zeppelin is not holding all of the data in
memory. However, you may be able to take that type of idea and
Hi Thad,
I'm not familiar with the internals of OpenRefine, but I would assume that
users can apply ad-hoc / exploratory transformations on data which is
loaded in memory (please correct me if my assumption is wrong).
Flink stores data in memory for efficient processing of data in motion
(either
Hello Community!
I'm a contributor to OpenRefine. You might have known about us previously
as Google Refine. :) We are thinking of offering an alternative
compute/storage engine in addition to our already existing one developed by
Stefano Mazzocchi of Apache Cocoon fame. :) We need some insight fr