Hello Community!

I'm a contributor to OpenRefine. You might know us by our previous name,
Google Refine. :) We are thinking of offering an alternative
compute/storage engine alongside our existing one, which was developed by
Stefano Mazzocchi of Apache Cocoon fame. :) We would like some insight from
this community.

In the current engine, the data is loaded into memory, and the user's
workspace, where the data lives, is saved to disk occasionally or when a
project is closed.
https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/ProjectManager.java

We also have the concept of Undo and Redo within OpenRefine, which
seems perhaps equivalent to Flink's savepoints.
Besides Flink, we think Apache Spark might also work, but we are unsure
and are looking at what best aligns with our current model of holding
the data store in memory.

We also have a cross() function which is similar to CoGroup in Flink.
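
To make that comparison concrete, here is a rough sketch in plain Python.
It is neither OpenRefine nor Flink code: the function names, signatures,
and sample rows are made up for illustration, and it only assumes that a
cross()-style lookup returns the rows of another table whose key matches a
given value, while a co-group collects the matching rows from both sides
under each key.

```python
from collections import defaultdict


def cogroup(left, right, left_key, right_key):
    """Group two lists of row dicts by key and pair up the groups.

    Returns {key: (left_rows, right_rows)} for every key seen on either side.
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[left_key]][0].append(row)
    for row in right:
        groups[row[right_key]][1].append(row)
    return dict(groups)


def cross(value, other_rows, other_key):
    """Hypothetical cross()-style lookup: rows whose key equals value."""
    return [row for row in other_rows if row[other_key] == value]


# Made-up sample data, for illustration only.
people = [{"name": "Ann", "city": "Austin"}, {"name": "Bob", "city": "Dallas"}]
cities = [{"city": "Austin", "state": "TX"}, {"city": "Austin", "zip": "73301"}]

grouped = cogroup(people, cities, "city", "city")
```

Under these assumptions, the right-hand group for a key is the same list
that the lookup returns for that key, which is the similarity I mean. The
difference is that the co-group computes every key in one pass, while the
lookup is evaluated per cell.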

Thoughts?
-Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>
