Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
> Let's not get all dramatic :D Ok, sorry :D > If we don't call any methods on the empty groups we can still keep them > off-memory in a persistent storage with a lazy checkpoint/state-access > logic with practically 0 memory overhead. So you mean that whether to call notifyOnLastGlobalElement w

[ANNOUNCE] Introducing Bay Area Apache Flink meetup

2015-05-28 Thread Henry Saputra
Hi All, I would like to announce the new Apache Flink meetup in bay area: http://www.meetup.com/Bay-Area-Apache-Flink-Meetup/ We are cooking the first event for the meetup soon and will have awesome speakers to talk about Apache Flink =) Please join the Bay Area meetup to get the latest news ab

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gyula Fóra
Let's not get all dramatic :D If we don't call any methods on the empty groups we can still keep them off-memory in a persistent storage with a lazy checkpoint/state-access logic with practically 0 memory overhead. Automatically dropping everything will break a lot of programs without people noti

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
Hi, I would vote for making the default behaviour to drop all state for empty groups, and allow a configuration to set the current behaviour instead. This issue will probably have a paragraph in the documentation, but if someone overlooks this, then there is potential for a greater disaster with t

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gyula Fóra
Hi, Indeed a good catch, and a valid issue exactly because of the stateful nature of the trigger and eviction policies. I agree with the suggested approach that this should be configurable for the discretizers (and could be set through the API). As for the default behaviour, I am not 100%. It co

[jira] [Created] (FLINK-2110) Early slot release after Execution failure

2015-05-28 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2110: -- Summary: Early slot release after Execution failure Key: FLINK-2110 URL: https://issues.apache.org/jira/browse/FLINK-2110 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-2109) CancelTaskException leads to FAILED task state

2015-05-28 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2109: -- Summary: CancelTaskException leads to FAILED task state Key: FLINK-2109 URL: https://issues.apache.org/jira/browse/FLINK-2109 Project: Flink Issue Type: Bug

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Theodore Vasiloudis
+1 This separation was the idea from the start, there is trade-off between having highly configureable optimizers and ensuring that the right types of regularization can only be applied to optimization algorithms that support them. It comes down to viewing the optimization framework mostly as a b

[jira] [Created] (FLINK-2108) Add score function for Predictors

2015-05-28 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2108: -- Summary: Add score function for Predictors Key: FLINK-2108 URL: https://issues.apache.org/jira/browse/FLINK-2108 Project: Flink Issue Type: Impro

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Till Rohrmann
I think so too. Ok, I'll try to update the PR accordingly. On Thu, May 28, 2015 at 5:36 PM, Mikio Braun wrote: > Ah yeah, I see.. . > > Yes, it's right that many algorithms perform quite differently > depending on the kind of regularization... . Same holds for cutting > plane algorithms which ei

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Mikio Braun
Ah yeah, I see.. . Yes, it's right that many algorithms perform quite differently depending on the kind of regularization... . Same holds for cutting plane algorithms which either reduce to linear or quadratic programs depending on L1 or L2. Generally speaking, I think this is also not surprising

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Till Rohrmann
Yes GradientDescent == (batch-)SGD. That was also my first idea of how to implement it. However, what happens if the regularization is specific to the actually used algorithm. For example, for L-BFGS with L1 regularization you have a different `parameterUpdate` step (Orthant-wise Limited Memory Qu

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Mikio Braun
GradientDescent is the just the (batch-)SGD optimizer right? Actually I think the parameter update should be done by a RegularizationFunction. IMHO the structure should be like this: GradientDescent - collects gradient and regularization updates from - CostFunction LinearModelCostFunction - i

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Chiwan Park
I agree that avoiding name which starts with “print” is better. Regards, Chiwan Park > On May 28, 2015, at 11:35 PM, Maximilian Michels wrote: > > +1 for printOnTaskManager() > > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian > wrote: > >> Thanks, for your quick responses! >> >> I also t

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Till Rohrmann
Hey Mikio, yes you’re right. The SGD only needs to know the gradient of the loss function and some mean to update the weights in accordance with the regularization scheme. Additionally, we also need to be able to compute the loss for the convergence criterion. That’s also how it is implemented in

[jira] [Created] (FLINK-2107) Implement Hash Outer Join algorithm

2015-05-28 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2107: Summary: Implement Hash Outer Join algorithm Key: FLINK-2107 URL: https://issues.apache.org/jira/browse/FLINK-2107 Project: Flink Issue Type: Sub-task

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Maximilian Michels
+1 for printOnTaskManager() On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian wrote: > Thanks, for your quick responses! > > I also think that renaming the old print method should do the trick. As a > contribution to your brainstorming for a name, I propose logOnTaskManager() > ;) > > Cheers, >

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Mikio Braun
[Ok, so maybe this is exactly what is implemented, sorry if I'm just repeating you... ] So C(w, xys) = C regularization(w) + sum over yxs of losses Gradient is C grad reg(w) + sum grad losses(w, xy) For some regularization functions, regularization is better performed by some explicit op

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Mikio Braun
Oh wait.. continue to type. accidentally sent out the message to early. On Thu, May 28, 2015 at 4:03 PM, Mikio Braun wrote: > Hi Till and Theodore, > > I think the code is cleaned up a lot now, introducing the > mapWithBcVariable helped a lot. > > I also get that the goal was to make a cost funct

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Mikio Braun
Hi Till and Theodore, I think the code is cleaned up a lot now, introducing the mapWithBcVariable helped a lot. I also get that the goal was to make a cost function for learning linear model configurable well. My main concern was that the solver itself was already too specifically bound to the ki

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Till Rohrmann
What tweaks would that be? I mean what is required to implement L-BFGS? I guess that we won’t get rid of the case statements because we have to decide between two code paths: One with and the other without convergence criterion. But I think by pulling each branch in its own function, it becomes cl

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Márton Balassi
Thanks for debugging this Gabor, indeed a good catch. I am not so sure about surfacing it in the API though - it seems very specific for the session windowing case. I am also wondering whether maybe this should actually be the default behavior - if there are already empty windows for a group why n

Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
Hi, At Ericsson, we are implementing something similar to what the SessionWindowing example does: There are events belonging to phone calls (sessions), and every event has a call_id, which tells us which call it belongs to. At the end of every call, a large event has to be emitted that contains s

RE: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Kruse, Sebastian
Thanks, for your quick responses! I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager() ;) Cheers, Sebastian -Original Message- From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Donnersta

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Fabian Hueske
As I said, the common print prefix might indicate eager execution. I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO. 2015-05-28 14:29 GMT+02:00 Stephan Ewen : > Actually, there is a method "print(String prefix)" which still

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Stephan Ewen
Actually, there is a method "print(String prefix)" which still goes to the sysout of where the job is executed. Let's give that one the name "printOnTaskManager()" and then we should have it... On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske wrote: > I would avoid to call it printXYZ, since prin

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Fabian Hueske
I would avoid to call it printXYZ, since print()'s behavior changed to eager execution. 2015-05-28 14:10 GMT+02:00 Robert Metzger : > Okay, you are right, local is actually confusing. > I'm against introducing "worker" as a term in the API. Its still called > "TaskManager". Maybe "printOnTaskMana

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Robert Metzger
Okay, you are right, local is actually confusing. I'm against introducing "worker" as a term in the API. Its still called "TaskManager". Maybe "printOnTaskManager()" ? On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske wrote: > +1 for both. > > printLocal() might not be the best name, because "local

[jira] [Created] (FLINK-2106) Add outer joins to API, Optimizer, and Runtime

2015-05-28 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2106: Summary: Add outer joins to API, Optimizer, and Runtime Key: FLINK-2106 URL: https://issues.apache.org/jira/browse/FLINK-2106 Project: Flink Issue Type: Sub-

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Fabian Hueske
+1 for both. printLocal() might not be the best name, because "local" is not well defined and could also be understood as the local machine of the user. How about naming the method completely different (writeToWorkerStdOut()?) to make sure users are not confused with eager and lazy execution? 20

[jira] [Created] (FLINK-2105) Implement Sort-Merge Outer Join algorithm

2015-05-28 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2105: Summary: Implement Sort-Merge Outer Join algorithm Key: FLINK-2105 URL: https://issues.apache.org/jira/browse/FLINK-2105 Project: Flink Issue Type: Sub-task

Re: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Robert Metzger
Hi Sebastian, thank you for the feedback. I agree that both variants have a right to exist. I would vote for adding another method to the DataSet called "printLocal()" that has the old behavior. On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian wrote: > Hi everyone, > > I am a bit worried abou

RE: Changed the behavior of "DataSet.print()"

2015-05-28 Thread Kruse, Sebastian
Hi everyone, I am a bit worried about that recent change of the print() method. I can understand the rationale that obtaining the stdout from all the taskmanagers is cumbersome (although, for local debugging the old print() was fine). However, a major problem, I see with the new print(), is, th

[jira] [Created] (FLINK-2104) Fallback implicit values for PredictOperation and TransformOperation don't work if Nothing is inferred as the output type

2015-05-28 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2104: Summary: Fallback implicit values for PredictOperation and TransformOperation don't work if Nothing is inferred as the output type Key: FLINK-2104 URL: https://issues.apache.org/j

Re: Adding custom Tuple to a DataSet

2015-05-28 Thread Amit Pawar
Thanks Stephan :) Will try the same. Thanks and Regards Amit Pawar On Thu, May 28, 2015 at 10:41 AM, Stephan Ewen wrote: > Hi! > > If you want to have type hierarchies (like base tuples and different > classes), you cannot use tuples (they are expected to be 'exact schema'), > but you can use

Re: Adding custom Tuple to a DataSet

2015-05-28 Thread Stephan Ewen
Hi! If you want to have type hierarchies (like base tuples and different classes), you cannot use tuples (they are expected to be 'exact schema'), but you can use other classes. Create your own tuple POJO with subclasses, and it should work. Stephan On Thu, May 28, 2015 at 1:30 AM, Amit Pawar

[jira] [Created] (FLINK-2103) Expose partitionBy to the user in Stream API

2015-05-28 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2103: --- Summary: Expose partitionBy to the user in Stream API Key: FLINK-2103 URL: https://issues.apache.org/jira/browse/FLINK-2103 Project: Flink Issue Type:

Re: Storm compatibility layer currently does not support Storm's SimpleJoin example

2015-05-28 Thread Szabó Péter
Hi Matthias, Of course, here is the package that contains the example's source classes. https://github.com/mbalassi/flink/tree/storm-backup/flink-staging/flink-streaming/flink-storm-examples/src/main/java/org/apache/flink/stormcompatibility/singlejoin It is mostly a copy-paste of SimpleJoin from s