[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138926576 Aha. I had only searched Jira for any existing work on this. Anyways, I'm assuming you'll be rebasing the same PR after session management is in. I'm quite intere

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread mxm
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138923111 @sachingoel0101 I've opened a pull request some time ago to backtrack intermediate results on the network layer and then "backtrack" them during scheduling time: #640. At th

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138921118 Great! One step closer. Would love to see this feature soon. :) --- If your project is set up for it, you can reply to this email and have your reply appear on Gi

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138918787 @sachingoel0101 No need to apologize. It is just that you probably worked for quite a bit on this, and it is sad that this work is lost. @mxm is merging the

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread fhueske
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138884465 Hi @sachingoel0101, session management is a prerequisite for sharing persisted data sets across jobs. However, persisting and reusing data sets is not part of PR #858.

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread sachingoel0101
Github user sachingoel0101 closed the pull request at: https://github.com/apache/flink/pull/1083

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138868165 I apologize. Like I said before, the way I've implemented this, it seemed pretty trivial. Foremost, I needed this for something else and decided to take a shot at

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread StephanEwen
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138861709 I don't think this really works. It violates so many assumptions, like the fact that memory is available after a job ends. The accounting for that depends on task sl

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread fhueske
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138861596 You are certainly right that there should be an API call to explicitly persist data in memory (or transparently on disk if memory is short) and later access this data (w

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138844139 Hey @fhueske, thanks for your comments. I was not aware this was intended to allow for recovery on failed jobs. For reusing among different jobs in the sam

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-09 Thread fhueske
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1083#issuecomment-138825832 Hi @sachingoel0101, I think there was no consensus in the discussion (FLINK-1730) on how to implement this feature. @StephanEwen pointed out that persisting data se

[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

2015-09-01 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/1083 [FLINK-1730]Persist operator on Data Sets This PR introduces a `persist` operation on `DataSet` which allows persisting the data set in memory, allowing for direct access if this data set is