Re: [ml] Convergence Criteria

2015-07-08 Thread Till Rohrmann
I would make the convergence criterion a parameter which is not mandatory for all Predictors. If you implement an iterative Predictor, then you can define a setConvergenceCriterion method or pass the convergence criterion to the Predictor via the ParameterMap. You can also open a JIRA issue for th
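A minimal Python sketch of Till's suggestion, assuming a ParameterMap-style lookup where every parameter key carries a default. FlinkML itself is written in Scala, and every name below (`Parameter`, `ParameterMap`, `ConvergenceThreshold`, `IterativeRegressor`, `set_convergence_criterion`) is a hypothetical stand-in, not the FlinkML API. The convergence threshold defaults to `None`, so an iterative predictor only checks it when one is supplied, either through the setter or through the map passed to `fit`; non-iterative predictors never need to know about it.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Parameter:
    """Hypothetical typed parameter key with an optional default value."""
    name: str
    default: Any = None


# Hypothetical parameters. ConvergenceThreshold defaults to None, i.e. it is optional.
Iterations = Parameter("Iterations", 100)
StepSize = Parameter("StepSize", 0.1)
ConvergenceThreshold = Parameter("ConvergenceThreshold", None)


class ParameterMap:
    """Maps parameter keys to values and falls back to each key's default."""

    def __init__(self) -> None:
        self._values: Dict[Parameter, Any] = {}

    def add(self, key: Parameter, value: Any) -> "ParameterMap":
        self._values[key] = value
        return self

    def get(self, key: Parameter) -> Any:
        return self._values.get(key, key.default)

    def merged(self, other: "ParameterMap") -> "ParameterMap":
        # Values from `other` override values already set on this map.
        result = ParameterMap()
        result._values = {**self._values, **other._values}
        return result


class IterativeRegressor:
    """Hypothetical iterative predictor: fits y ~ w * x by gradient descent on the MSE."""

    def __init__(self) -> None:
        self.parameters = ParameterMap()
        self.weight: Optional[float] = None
        self.iterations_run = 0

    def set_convergence_criterion(self, threshold: float) -> "IterativeRegressor":
        # Convenience setter; equivalent to passing ConvergenceThreshold in a ParameterMap.
        self.parameters.add(ConvergenceThreshold, threshold)
        return self

    def fit(self, data: List[Tuple[float, float]],
            extra: Optional[ParameterMap] = None) -> "IterativeRegressor":
        params = self.parameters.merged(extra or ParameterMap())
        threshold = params.get(ConvergenceThreshold)
        w, previous_loss = 0.0, float("inf")
        self.iterations_run = 0
        for _ in range(params.get(Iterations)):
            gradient = sum(2 * (w * x - y) * x for x, y in data) / len(data)
            w -= params.get(StepSize) * gradient
            self.iterations_run += 1
            loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
            # The criterion is only checked when one was supplied.
            if threshold is not None and abs(previous_loss - loss) < threshold:
                break
            previous_loss = loss
        self.weight = w
        return self
```

Without a criterion the loop always runs the configured number of iterations; with one, it stops as soon as the loss change between iterations drops below the threshold.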

Re: Building several models in parallel

2015-07-08 Thread Felix Neutatz
Hi Felix, thanks for the idea. But doesn't this mean that I can only train one model per partition? The thing is, I have way more models than that :( Best regards, Felix 2015-07-07 22:37 GMT+02:00 Felix Schüler : > Hi Felix! > > We had a similar use case and I trained multiple models on partitio

Re: Building several models in parallel

2015-07-08 Thread Till Rohrmann
Hi Felix, this is currently not supported by FlinkML. The MultipleLinearRegression algorithm expects a DataSet and not a GroupedDataSet as input. What you can do is to extract each group from the original DataSet by using a filter operation. Once you have done this, you can train the linear model
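The filter-then-fit approach Till describes can be sketched in plain Python. This does not use Flink: the list comprehension stands in for a filter operation on the DataSet, and `fit_line` (ordinary least squares with a single feature) stands in for MultipleLinearRegression. `Point`, `fit_line` and `fit_per_group` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Point(NamedTuple):
    group: str
    x: float
    y: float


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x on one group."""
    n = len(points)
    mean_x = sum(p.x for p in points) / n
    mean_y = sum(p.y for p in points) / n
    sxx = sum((p.x - mean_x) ** 2 for p in points)
    sxy = sum((p.x - mean_x) * (p.y - mean_y) for p in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_group(data: List[Point]) -> Dict[str, Tuple[float, float]]:
    """Trains one model per group key; returns {group: (intercept, slope)}."""
    models = {}
    for key in sorted({p.group for p in data}):
        # Stand-in for a filter operation that extracts one group from the full data set.
        subset = [p for p in data if p.group == key]
        models[key] = fit_line(subset)
    return models
```

Each group costs one scan over the full data set, so the total work grows with the number of groups times the data size, which matters when there are many models.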

Re: Building several models in parallel

2015-07-08 Thread Felix Neutatz
Thanks for the information, Till :) So at the moment, the iteration is the only way. Best regards, Felix

Re: Building several models in parallel

2015-07-08 Thread Till Rohrmann
Yes, it is. But you can still run the calculations in parallel, because `fit` does not trigger the execution of the job graph; it simply builds the data flow. The job is executed only when you call `predict` or collect the weights. Cheers, Till
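The point about `fit` only building the data flow can be illustrated with a small deferred-execution sketch. It is plain Python, not Flink: `LazyPlan` is a hypothetical stand-in for a job graph, and `lazy_fit` records a least-squares fit through the origin instead of running it. The sketch models only the deferral; it does not show the parallelism Flink gets by running the combined data flow of all models on a cluster.

```python
from typing import Callable, Dict, List, Tuple


class LazyPlan:
    """Hypothetical stand-in for a job graph: operations are recorded, not run."""

    def __init__(self) -> None:
        self.pending: List[Tuple[str, Callable[[], float]]] = []
        self.executions = 0

    def add(self, name: str, thunk: Callable[[], float]) -> None:
        self.pending.append((name, thunk))

    def execute(self) -> Dict[str, float]:
        # Runs every recorded operation in one go, like a single job submission.
        self.executions += 1
        results = {name: thunk() for name, thunk in self.pending}
        self.pending.clear()
        return results


def lazy_fit(plan: LazyPlan, name: str, points: List[Tuple[float, float]],
             log: List[str]) -> None:
    """Records a fit of y ~ w * x (least squares through the origin); computes nothing yet."""
    def compute() -> float:
        log.append(name)  # Only happens once the plan is executed.
        return sum(x * y for x, y in points) / sum(x * x for x, _ in points)
    plan.add(name, compute)
```

Calling `lazy_fit` for several models leaves `plan.executions` at 0; a single `plan.execute()` then computes all weights together, which is the analogue of collecting the results.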

Re: Redesigned "Features" page

2015-07-08 Thread Stephan Ewen
So, what do we do with this now? On Tue, Jul 7, 2015 at 1:41 PM, Stephan Ewen wrote: > +1 to adding links > > In fact, all points should link to some documentation part. > > > > On Tue, Jul 7, 2015 at 1:33 PM, Gyula Fóra wrote: > >> I think the content is pretty good, much better than before. B

Re: Redesigned "Features" page

2015-07-08 Thread Aljoscha Krettek
I would publish it; it is definitely better than the old one.

Re: Redesigned "Features" page

2015-07-08 Thread Maximilian Michels
+1 for updating it. It is a great improvement and we can still change details subsequently.

Re: Redesigned "Features" page

2015-07-08 Thread Ufuk Celebi
+1 to publish now.

Re: How do network transmissions in Flink work?

2015-07-08 Thread Stephan Ewen
Hi! Here are a few pointers:
- The data transfer is the responsibility of the receiver. The sender cannot know ahead of time where data is sent.
- On the receiver side, you should be able to count the received bytes in the RemoteInputChannel or LocalInputChannel.
- The JobManager is notifi
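Counting on the receiver side can be sketched as a wrapper that tallies every buffer it hands out. This is a Python illustration only: `CountingInputChannel`, `next_buffer` and `drain` are hypothetical and do not reflect the Java API of Flink's RemoteInputChannel or LocalInputChannel; the example variables are merely named after them.

```python
from typing import Iterable, List, Optional


class CountingInputChannel:
    """Hypothetical receiver-side channel that counts the bytes it hands out."""

    def __init__(self, buffers: Iterable[bytes]) -> None:
        self._buffers = iter(buffers)
        self.bytes_received = 0
        self.buffers_received = 0

    def next_buffer(self) -> Optional[bytes]:
        buffer = next(self._buffers, None)
        if buffer is not None:
            # Count on the receiving side, where the data actually arrives.
            self.bytes_received += len(buffer)
            self.buffers_received += 1
        return buffer


def drain(channels: List[CountingInputChannel]) -> int:
    """Consumes all channels and returns the total number of bytes received."""
    for channel in channels:
        while channel.next_buffer() is not None:
            pass
    return sum(c.bytes_received for c in channels)
```

Usage: `drain([CountingInputChannel([b"abc", b"de"]), CountingInputChannel([b"fghij"])])` returns 10, with per-channel counts available on each instance.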

[jira] [Created] (FLINK-2327) Log the limit of open file handles at startup

2015-07-08 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2327: Summary: Log the limit of open file handles at startup Key: FLINK-2327 URL: https://issues.apache.org/jira/browse/FLINK-2327 Project: Flink Issue Type: Improve
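What logging the open file handle limit at startup might look like, sketched in Python on a Unix system with the standard `resource` module. Flink itself is Java, so this is not the implementation the issue asks for; `log_open_file_limit` is a hypothetical helper.

```python
import logging
import resource
from typing import Tuple


def log_open_file_limit(logger: logging.Logger) -> Tuple[int, int]:
    """Logs the soft and hard limits on open file descriptors (Unix only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def describe(limit: int) -> str:
        return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

    logger.info("Maximum number of open file descriptors: %s (soft limit), %s (hard limit)",
                describe(soft), describe(hard))
    return soft, hard


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_open_file_limit(logging.getLogger("startup"))
```

Logging both values helps, since a process can raise its soft limit up to the hard limit, and a low soft limit is a common cause of "too many open files" errors.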

[jira] [Created] (FLINK-2328) Applying more than one transformation on an IterativeDataStream fails

2015-07-08 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2328: Summary: Applying more than one transformation on an IterativeDataStream fails Key: FLINK-2328 URL: https://issues.apache.org/jira/browse/FLINK-2328 Project: Flink

[jira] [Created] (FLINK-2329) Refactor RPCs from within the ExecutionGraph

2015-07-08 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2329: Summary: Refactor RPCs from within the ExecutionGraph Key: FLINK-2329 URL: https://issues.apache.org/jira/browse/FLINK-2329 Project: Flink Issue Type: Sub-ta

[jira] [Created] (FLINK-2330) Make FromElementsFunction checkpointable

2015-07-08 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2330: Summary: Make FromElementsFunction checkpointable Key: FLINK-2330 URL: https://issues.apache.org/jira/browse/FLINK-2330 Project: Flink Issue Type: Improvement
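One plausible way to make an elements source checkpointable, assuming the checkpointed state is simply the number of elements already emitted: on restore, the source skips that many elements and continues. This is a single-threaded Python sketch, not Flink's FromElementsFunction; `CheckpointableElementsSource`, `snapshot_state` and `restore_state` are hypothetical names.

```python
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class CheckpointableElementsSource(Generic[T]):
    """Hypothetical source over a fixed list whose state is the number of elements emitted."""

    def __init__(self, elements: List[T]) -> None:
        self._elements = list(elements)
        self._emitted = 0

    def run(self, emit: Callable[[T], None], limit: Optional[int] = None) -> None:
        # `limit` stops the source early to simulate a failure in the middle of the stream.
        sent = 0
        while self._emitted < len(self._elements):
            if limit is not None and sent >= limit:
                return
            emit(self._elements[self._emitted])
            self._emitted += 1
            sent += 1

    def snapshot_state(self) -> int:
        # Single-threaded sketch: a real source would have to take the snapshot
        # atomically with respect to emitting records.
        return self._emitted

    def restore_state(self, emitted: int) -> None:
        self._emitted = emitted
```

A counter is enough here because the element list is fixed and replayed in the same order, so the position alone identifies where to resume.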

[jira] [Created] (FLINK-2332) Assign session IDs to JobManager and TaskManager messages

2015-07-08 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2332: Summary: Assign session IDs to JobManager and TaskManager messages Key: FLINK-2332 URL: https://issues.apache.org/jira/browse/FLINK-2332 Project: Flink Issue
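The issue summary only says that session IDs are assigned to JobManager and TaskManager messages. Assuming the goal is to tell messages from an earlier session apart (for example after a restart) and drop them, the receiving side could be sketched as below. `SessionMessage` and `SessionFilteringReceiver` are hypothetical Python names, not Flink classes.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SessionMessage:
    """Hypothetical message envelope carrying the sender's session ID."""
    session_id: uuid.UUID
    payload: str


@dataclass
class SessionFilteringReceiver:
    """Accepts only messages whose session ID matches the current session."""
    session_id: uuid.UUID
    accepted: List[str] = field(default_factory=list)
    discarded: List[str] = field(default_factory=list)

    def receive(self, message: SessionMessage) -> None:
        if message.session_id == self.session_id:
            self.accepted.append(message.payload)
        else:
            # A message from another (e.g. earlier) session is treated as stale and dropped.
            self.discarded.append(message.payload)
```

Random UUIDs make accidental collisions between sessions practically impossible, so an equality check is sufficient.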

[jira] [Created] (FLINK-2331) Whitelist some exceptions to be valid in YARN logs

2015-07-08 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2331: Summary: Whitelist some exceptions to be valid in YARN logs Key: FLINK-2331 URL: https://issues.apache.org/jira/browse/FLINK-2331 Project: Flink Issue Type: Bu
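Assuming the YARN tests scan the container logs and fail on any exception, a whitelist could work as in this Python sketch: lines mentioning an exception are reported unless a whitelisted pattern matches them. `find_unexpected_exceptions` and the example pattern are hypothetical; the issue summary does not say which exceptions are to be whitelisted.

```python
import re
from typing import Iterable, List


def find_unexpected_exceptions(lines: Iterable[str], whitelist: Iterable[str]) -> List[str]:
    """Returns log lines mentioning an exception that no whitelist pattern matches."""
    patterns = [re.compile(p) for p in whitelist]
    unexpected = []
    for line in lines:
        if "Exception" not in line:
            continue
        if any(p.search(line) for p in patterns):
            continue  # Known, tolerated exception.
        unexpected.append(line)
    return unexpected
```

A test would then assert that the returned list is empty, so only exceptions outside the whitelist fail the build.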