Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Ufuk Celebi
Hey Chiwan! Is the problem reproducible? Does it always deadlock? Can you please wait for it to deadlock and then post a stack trace (using jps and jstack) of the process? Please post it to this issue: FLINK-2183. Thanks :) – Ufuk On Monday, June 8, 2015, Chiwan Park wrote: > Hi. I have a problem r

[jira] [Created] (FLINK-2187) KMeans clustering is not present in release-0.9-rc1

2015-06-08 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2187: -- Summary: KMeans clustering is not present in release-0.9-rc1 Key: FLINK-2187 URL: https://issues.apache.org/jira/browse/FLINK-2187 Project: Flink Issue Type: Bug

Re: Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Chiwan Park
Hi. I have a problem running the `mvn clean verify` command. TaskManagerFailsWithSlotSharingITCase hangs on Oracle JDK 7 (1.7.0_80), but on Oracle JDK 8 the test case doesn’t hang. I’ve investigated this problem but could not find the bug. Regards, Chiwan Park > On Jun 9, 2015, at 2:11 AM, Má

Re: Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Márton Balassi
Added F7 Running against Kafka cluster for me in the doc. Doing it tomorrow. On Mon, Jun 8, 2015 at 7:00 PM, Chiwan Park wrote: > Hi. I’m very excited about preparing a new major release. :) > I just picked two tests. I will report status as soon as possible. > > Regards, > Chiwan Park > > > On

Re: Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Chiwan Park
Hi. I’m very excited about preparing a new major release. :) I just picked two tests. I will report status as soon as possible. Regards, Chiwan Park > On Jun 9, 2015, at 1:52 AM, Maximilian Michels wrote: > > Hi everyone! > > As previously discussed, the Flink developer community is very eager

Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Maximilian Michels
Hi everyone! As previously discussed, the Flink developer community is very eager to get out a new major release. Apache Flink 0.9.0 will contain lots of new features and many bugfixes. This time, I'll try to coordinate the release process. Feel free to correct me if I'm doing something wrong beca

[jira] [Created] (FLINK-2186) Rework SVM import to support very wide files

2015-06-08 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2186: -- Summary: Rework SVM import to support very wide files Key: FLINK-2186 URL: https://issues.apache.org/jira/browse/FLINK-2186 Project: Flink Issue

[jira] [Created] (FLINK-2185) Rework semantics for .setSeed function of SVM

2015-06-08 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2185: -- Summary: Rework semantics for .setSeed function of SVM Key: FLINK-2185 URL: https://issues.apache.org/jira/browse/FLINK-2185 Project: Flink Issue

[jira] [Created] (FLINK-2184) Cannot get last element with maxBy/minBy

2015-06-08 Thread Gábor Hermann (JIRA)
Gábor Hermann created FLINK-2184: Summary: Cannot get last element with maxBy/minBy Key: FLINK-2184 URL: https://issues.apache.org/jira/browse/FLINK-2184 Project: Flink Issue Type: Improvemen

[jira] [Created] (FLINK-2183) TaskManagerFailsWithSlotSharingITCase fails.

2015-06-08 Thread Sachin Goel (JIRA)
Sachin Goel created FLINK-2183: -- Summary: TaskManagerFailsWithSlotSharingITCase fails. Key: FLINK-2183 URL: https://issues.apache.org/jira/browse/FLINK-2183 Project: Flink Issue Type: Bug

Re: ALS implementation

2015-06-08 Thread Till Rohrmann
Hi Felix, I tried to reproduce the problem with the *Hash join exceeded maximum number of recursions, without reducing partitions enough to be memory resident.* exception. I used the same data set and the same settings for ALS. However, on my machine it runs through without this exception. Could yo

Re: Problem with ML pipeline

2015-06-08 Thread Sachin Goel
That would be better of course. My opinion had to do with not-implementing-exactly-the-same-thing-twice. Perhaps Till could weigh in here. We really do need to come up with a general mechanism for this. Testing labeled vectors has exactly the same problem. I'll look into how Spark and scikit-learn appro

Re: Problem with ML pipeline

2015-06-08 Thread Felix Neutatz
I am in favor of efficiency. Therefore I would prefer to introduce new methods in order to save memory and network traffic. This would also solve the problem of "how to come up with ids?" Best regards, Felix On 08.06.2015 at 12:52 PM, "Sachin Goel" wrote: > I think if the user doesn't prov

Re: Problem with ML pipeline

2015-06-08 Thread Sachin Goel
I think if the user doesn't provide IDs, we can safely assume that they don't need them. We can simply assign an ID of one as a temporary measure and return the result with no IDs [just to make the interface cleaner]. If the IDs are provided, we simply use those IDs. A possible te

Re: Problem with ML pipeline

2015-06-08 Thread Till Rohrmann
My gut feeling is also that a `Transformer` would be a good place to implement feature selection. Then you can reuse it across multiple algorithms by simply chaining them together. However, I don't know yet what the best way to realize the IDs is. One way would be to add an ID field to `Vect

Re: Problem with ML pipeline

2015-06-08 Thread Sachin Goel
Yes. I agree too. It makes no sense for the learning algorithm to have extra payload. Only relevant data makes sense. Further, adding ID to the predict operation type definition seems a legitimate choice. +1 from my side. Regards Sachin Goel On Mon, Jun 8, 2015 at 4:06 PM, Theodore Vasiloudis < t

Re: Problem with ML pipeline

2015-06-08 Thread Theodore Vasiloudis
I agree with Mikio; ids would be useful overall, and feature selection should not be a part of learning algorithms; all features in a LabeledVector should be assumed to be relevant by the learners. On Mon, Jun 8, 2015 at 12:00 PM, Mikio Braun wrote: > Hi all, > > I think there are a number of issu

[jira] [Created] (FLINK-2182) Add stateful Streaming Sequence Source

2015-06-08 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2182: --- Summary: Add stateful Streaming Sequence Source Key: FLINK-2182 URL: https://issues.apache.org/jira/browse/FLINK-2182 Project: Flink Issue Type: Improv

Re: Problem with ML pipeline

2015-06-08 Thread Mikio Braun
Hi all, I think there are a number of issues here: - whether or not we generally need ids for our examples. For time-series, this is a must, but I think it would also help us with many other things (like partitioning the data, or picking a consistent subset), so I would think adding (numeric) ids i

Re: Problem with ML pipeline

2015-06-08 Thread Till Rohrmann
You're right Felix. You need to provide the `FitOperation` and `PredictOperation` for the `Predictor` you want to use and the `FitOperation` and `TransformOperation` for all `Transformer`s you want to chain in front of the `Predictor`. Specifying which features to take could be a solution. However

Re: Planning the 0.9 Release

2015-06-08 Thread Márton Balassi
The problem is still there. @Aljoscha: It would be great if you could take it. On Mon, Jun 8, 2015 at 9:41 AM, Gyula Fóra wrote: > I agree with Marton. I thought Aljoscha was working on that. > > On Monday, June 8, 2015, Márton Balassi wrote: > > > FLINK-2054 is definitely a problem if it persi

Re: Memleak in the SessionWindowing example

2015-06-08 Thread Gábor Gévay
I have now created the JIRA: https://issues.apache.org/jira/browse/FLINK-2181 Best regards, Gabor 2015-06-08 0:55 GMT+02:00 Robert Metzger: > What is the status of this issue? > I think we should at least file a JIRA for it to have it around as a TODO. > > On Thu, May 28, 2015 at 10:01 PM, Gábo

[jira] [Created] (FLINK-2181) SessionWindowing example has a memleak

2015-06-08 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-2181: -- Summary: SessionWindowing example has a memleak Key: FLINK-2181 URL: https://issues.apache.org/jira/browse/FLINK-2181 Project: Flink Issue Type: Bug Co

Re: Planning the 0.9 Release

2015-06-08 Thread Gyula Fóra
I agree with Marton. I thought Aljoscha was working on that. On Monday, June 8, 2015, Márton Balassi wrote: > FLINK-2054 is definitely a problem if it persists. Sorry for missing it, > solving it asap. > > On Mon, Jun 8, 2015 at 7:18 AM, Ufuk Celebi > > wrote: > > > > > On 08 Jun 2015, at 00:22,

Re: Planning the 0.9 Release

2015-06-08 Thread Márton Balassi
FLINK-2054 is definitely a problem if it persists. Sorry for missing it, solving it asap. On Mon, Jun 8, 2015 at 7:18 AM, Ufuk Celebi wrote: > > On 08 Jun 2015, at 00:22, Robert Metzger wrote: > > > What about https://issues.apache.org/jira/browse/FLINK-2177 and > > https://issues.apache.org/ji