Re: [ANNOUNCE] Build Issues Solved

2016-05-30 Thread Chiwan Park
Thanks for the great work! :-) Regards, Chiwan Park > On May 31, 2016, at 7:47 AM, Flavio Pompermaier wrote: > > Awesome work guys! > And even more thanks for the detailed report...This troubleshooting summary > will be undoubtedly useful for all our maven projects! > > Best, > Flavio > On 30

Re: Side-effects of DataSet::count

2016-05-30 Thread Greg Hogan
Hi Stephan, Is there a design document, prior discussion, or background material on this enhancement? Am I correct in understanding that this only applies to DataSet since streams run indefinitely? Thanks, Greg On Mon, May 30, 2016 at 5:49 PM, Stephan Ewen wrote: > Hi Eron! > > Yes, the idea i

Re: Side-effects of DataSet::count

2016-05-30 Thread Greg Hogan
Hi Simone, This can be done with a map followed by a reduce. DataSet#count leverages accumulators which perform an inherent reduce. Also, DataSet#count implements RichOutputFormat as an optimization to only require a single operator. Previously the counting and accumulating was handled in a RichMa
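
A minimal sketch of the "map followed by a reduce" counting pattern mentioned above, assuming plain Java streams as a stand-in for the DataSet API so that it runs without a Flink dependency. The class name `LazyCountSketch` and the method `countByMapReduce` are hypothetical and are not part of Flink.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only: models the map-then-reduce count with
// java.util.stream instead of Flink's DataSet API.
final class LazyCountSketch {

    // Map every element to 1L, then reduce the ones by summing them.
    static <T> long countByMapReduce(Collection<T> elements) {
        return elements.stream()
                .map(element -> 1L)       // map step: each element contributes one
                .reduce(0L, Long::sum);   // reduce step: add the contributions
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not");
        System.out.println(countByMapReduce(words));
    }
}
```

In this sketch the result is a single value derived from the input, which is the shape Simone asks about (one `Long` holding the number of elements).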

Re: [ANNOUNCE] Build Issues Solved

2016-05-30 Thread Flavio Pompermaier
Awesome work guys! And even more thanks for the detailed report...This troubleshooting summary will be undoubtedly useful for all our maven projects! Best, Flavio On 30 May 2016 23:47, "Ufuk Celebi" wrote: > Thanks for the effort, Max and Stephan! Happy to see the green light again. > > On Mon,

Re: Side-effects of DataSet::count

2016-05-30 Thread Simone Robutti
On this same subject, I have a question. Is it possible to achieve a lazy count that transforms a DataSet[T] to a DataSet[Long] with a single value containing the length of the original DataSet? Otherwise what is the best way to count the elements lazily? 2016-05-30 23:49 GMT+02:00 Stephan Ewen :

Re: Side-effects of DataSet::count

2016-05-30 Thread Stephan Ewen
Hi Eron! Yes, the idea is to actually switch all executions to a backtracking scheduling mode. That simultaneously solves both fine-grained recovery and lazy execution, where later stages build on prior stages. With all the work around streaming, we have not gotten to this so far, but it is one f

Re: [ANNOUNCE] Build Issues Solved

2016-05-30 Thread Ufuk Celebi
Thanks for the effort, Max and Stephan! Happy to see the green light again. On Mon, May 30, 2016 at 11:03 PM, Stephan Ewen wrote: > Hi all! > > After a few weeks of terrible build issues, I am happy to announce that the > build works again properly, and we actually get meaningful CI results. > >

Re: Iteration Intermediate Output

2016-05-30 Thread Andrew Palumbo
Greg, We ran into this issue when implementing the Mahout bindings for Flink [1]. It ended up being the major bottleneck for Mahout on Flink, and makes iterative algorithms basically unreasonable. While it is understood that Flink's Delta-iterations are intended for use when iterating ov

[ANNOUNCE] Build Issues Solved

2016-05-30 Thread Stephan Ewen
Hi all! After a few weeks of terrible build issues, I am happy to announce that the build works again properly, and we actually get meaningful CI results. Here is a story in many acts, from builds deep red to bright green joy. Kudos to Max, who did most of this troubleshooting. This evening, Max

Re: Side-effects of DataSet::count

2016-05-30 Thread Eron Wright
Thinking out loud now… Is the job graph fully mutable? Can it be cleared? For example, shouldn’t the count method remove the sink after execution completes? Can numerous job graphs co-exist within a single driver program? How would that relate to the session concept? Seems the count met

PojoComparator question

2016-05-30 Thread Gábor Horváth
Hi! While I was working on code generation support for PojoComparators, I stumbled upon the compareSerialized method [1]. It first creates two new instances and then uses the reusing overloads of the serializer. Calling the non-reusing overload would create the instance anyway. Is there a

Re: Iteration Intermediate Output

2016-05-30 Thread Kostas Tzoumas
Thanks Greg for opening this discussion! I really really don't want to derail the discussion here, just a quick clarification regarding Suneel's last email: folks that are working at data Artisans are participating in this community as individuals, not as a corporation, and the dev list is not a s

[jira] [Created] (FLINK-3992) Remove Key interface

2016-05-30 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3992: --- Summary: Remove Key interface Key: FLINK-3992 URL: https://issues.apache.org/jira/browse/FLINK-3992 Project: Flink Issue Type: Sub-task Affects Ver

[jira] [Created] (FLINK-3991) Remove deprecated configuration keys from ConfigConstants

2016-05-30 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-3991: - Summary: Remove deprecated configuration keys from ConfigConstants Key: FLINK-3991 URL: https://issues.apache.org/jira/browse/FLINK-3991 Project: Flink Iss

Re: Iteration Intermediate Output

2016-05-30 Thread Suneel Marthi
This is a feature that was requested by the Mahout project a few months ago for the very same reasons as mentioned in previous emails on this thread, but we were snubbed by the flink folks as this being a '*WAY too specific*' request for flink to deal with and 'it's got to be done the way Flink has i

Re: Iteration Intermediate Output

2016-05-30 Thread Gábor Gévay
Hello, > Would the best way be to extend the iteration operators to support > intermediate outputs or revisit the idea of caching intermediate results > and thus allow efficient for-loop iterations? Caching intermediate results would also be a great help to projects that are targeting Flink as a backe

Re: NLP & Constraint Programming

2016-05-30 Thread Simone Robutti
I can't speak to the second, but Deep NLP is an extremely specific niche and it's out of scope for Flink to support such functionality. Deep Learning is not supported at all anyway. FlinkML, the machine learning library of Flink, like many other ML libraries for distributed environments, is focused o

NLP & Constraint Programming

2016-05-30 Thread Debusmann, Ralph
Hi, I am still a Flink newbie who'd like to contribute. There are two topics which I am most interested in: 1) Deep NLP (Syntactic/Semantic analysis) 2) Constraint Programming For both, I see no built-in support in Flink yet. Or is there (planned maybe)? Cheers, Ralph

Re: Savepoints and memory statebackend

2016-05-30 Thread Ufuk Celebi
Hey Gyula! You are right that in this case the memory snapshots go to the job manager and are part of the save point. The docs seem to be off there. The whole save point backend and pointer business should be removed though in favour of making save points self contained and always go to files. I

Re: [DISCUSS] Allowed Lateness in Flink

2016-05-30 Thread Aljoscha Krettek
Thanks for the feedback! :-) I already read the comments on the file. On Mon, 30 May 2016 at 11:10 Gyula Fóra wrote: > Thanks Aljoscha :) I added some comments that might seem relevant from the > user's point of view. > > Gyula > > Aljoscha Krettek wrote (time: May 30, 2016, > Mon, 10:33):

Re: [DISCUSS] Allowed Lateness in Flink

2016-05-30 Thread Gyula Fóra
Thanks Aljoscha :) I added some comments that might seem relevant from the user's point of view. Gyula Aljoscha Krettek wrote (time: May 30, 2016, Mon, 10:33): > Hi, > I created a new doc specifically about the interplay of lateness and > window state garbage collection: > https://docs.goo

Re: [DISCUSS] Allowed Lateness in Flink

2016-05-30 Thread Aljoscha Krettek
Hi, I created a new doc specifically about the interplay of lateness and window state garbage collection: https://docs.google.com/document/d/1vgukdDiUco0KX4f7tlDJgHWaRVIU-KorItWgnBapq_8/edit?usp=sharing There is still some stuff that needs to be figured out, both in the new doc and the existing do