Hello Community!

I'm a contributor to OpenRefine. You might have known us previously as Google Refine. :)

We are thinking of offering an alternative compute/storage engine in addition to our existing one, which was developed by Stefano Mazzocchi of Apache Cocoon fame. :) We need some insight from this community.

The data is loaded into memory, and the user's workspace, where the data lives, is saved to disk occasionally or when a project is closed:
https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/ProjectManager.java
We also have the concept of Undo and Redo within OpenRefine, which seems perhaps equivalent to Flink's savepoints.

Besides Flink, we are thinking that Apache Spark might also work, but we are unsure and are looking at what best aligns with our current data-store-to-memory model.

We also have a cross() function, which is similar to CoGroup in Flink.

Thoughts?

-Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>