Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/1083#issuecomment-138844139

Hey @fhueske, thanks for your comments. I was not aware this was intended to allow for recovery of failed jobs.

For reuse among different jobs in the same session, I don't see how this doesn't solve the issue. If the memory manager is alive, the results will be there for any job to use. For true across-job sharing, one possible feature would be to add an initialization method on the environment, such as `getPersistedSource(String)`, which would access the results of a persisted data set from an entirely independent job.

Further, this makes sense at the operator level. Users should have the ability to explicitly persist a data set in memory, which calls for providing a function call. I was only drawing the analogy from Spark's API. I have no idea how they implement this internally, but if an API function is to be provided, it can only be done in two ways: either return a new Operator, as a transformation on the original data set, or return the same data set [like `withBroadcastSet` does]. The former seemed easier to work with, because it doesn't interfere with the existing mechanisms. I have implemented no new internal functionality, only used the existing system.

I would've loved more discussion on this, but frankly, once I started going through the internal mechanisms, it seemed like a pretty trivial thing to implement. Of course, that was when I wasn't aware it was intended to be used for recovery.

If there is some work on persisting intermediate results for recovery, the same mechanism can be used for a persist operation, in which case this work is moot anyway. But there has to be an API call that allows users to explicitly cache results in memory. This is a major problem I'm facing in implementing a randomized splitting algorithm.
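For illustration, the two API shapes mentioned above (return a new operator versus return the same data set) can be sketched with a toy model. This is only a sketch under stated assumptions, not Flink code and not the implementation in this pull request: `ToyDataSet`, `persist()`, `persistInPlace()`, and the `evaluations` counter are hypothetical names invented here to show the difference in behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model only: none of these types are Flink classes.
class PersistSketch {

    /** Hypothetical stand-in for a lazily evaluated data set. */
    static class ToyDataSet<T> {
        private final Supplier<List<T>> compute;
        private boolean cacheResults = false;
        private List<T> cached = null;
        int evaluations = 0; // how often this data set was actually computed

        ToyDataSet(Supplier<List<T>> compute) {
            this.compute = compute;
        }

        /** Shape 1: return a NEW data set that caches the parent's result. */
        ToyDataSet<T> persist() {
            ToyDataSet<T> persisted = new ToyDataSet<>(this::collect);
            persisted.cacheResults = true;
            return persisted; // the original data set is left untouched
        }

        /** Shape 2: flag this data set and return the same instance. */
        ToyDataSet<T> persistInPlace() {
            this.cacheResults = true;
            return this;
        }

        /** Compute the result, or reuse the cached copy if one exists. */
        List<T> collect() {
            if (cacheResults && cached != null) {
                return cached;
            }
            evaluations++;
            List<T> result = new ArrayList<>(compute.get());
            if (cacheResults) {
                cached = result;
            }
            return result;
        }
    }

    public static void main(String[] args) {
        ToyDataSet<Integer> source = new ToyDataSet<>(() -> List.of(1, 2, 3));

        // Shape 1: the persisted view is a separate object.
        ToyDataSet<Integer> persisted = source.persist();
        persisted.collect();
        persisted.collect();
        System.out.println("shape 1, source evaluations: " + source.evaluations);

        // Shape 2: the same object is returned with caching switched on.
        ToyDataSet<Integer> other = new ToyDataSet<>(() -> List.of(4, 5, 6));
        ToyDataSet<Integer> same = other.persistInPlace();
        same.collect();
        same.collect();
        System.out.println("shape 2, same instance: " + (same == other));
    }
}
```

In this toy model, the first shape leaves the original data set uncached, so consumers that keep using `source` directly still recompute it, while the second shape changes the behavior of the existing object for every consumer. That is one way to read the remark that returning a new operator does not interfere with existing mechanisms.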