[DISCUSS] Behaviour of Streaming Sources

2015-05-08 Thread Aljoscha Krettek
Hi, in the process of reworking the Streaming Operator model I'm also reworking the sources in order to get rid of the loop in each source. Right now, the interface for sources (SourceFunction) has one method: run(). This is called when the source starts and can just output elements at any time usi
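For readers unfamiliar with the pattern, here is a minimal Python sketch of a push-style source whose run() method owns the loop, as the message describes. It is not Flink's SourceFunction API: the class name `CountingSource`, the `collect` callback, and the `cancel()` method are hypothetical and used only for illustration.

```python
import threading
from typing import Callable, List


class CountingSource:
    """Hypothetical push-style source: run() owns the loop and emits via a callback."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._running = threading.Event()
        self._running.set()

    def run(self, collect: Callable[[int], None]) -> None:
        # The source decides when to emit; the caller only sees run() return.
        i = 0
        while self._running.is_set() and i < self.limit:
            collect(i)
            i += 1

    def cancel(self) -> None:
        # Ask the loop to stop before its next iteration.
        self._running.clear()


emitted: List[int] = []
CountingSource(limit=3).run(emitted.append)
print(emitted)  # prints [0, 1, 2]
```

Because the loop lives inside run(), the caller cannot interleave other work between two elements; it can only request cancellation and wait for the loop to notice.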

Re: [DISCUSS] Behaviour of Streaming Sources

2015-05-08 Thread Matthias J. Sax
Did you consider the Storm way to handle this? Storm offers a method "void next()" that uses a collector object to emit new tuples. Using this interface, "next()" can loop internally as long as tuples are available and return if there is (currently) no input. From what I have seen, people tend to emit
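The pull-style contract described in this message can be sketched roughly as follows. This is a Python illustration, not Storm's actual API: `QueueSpout`, `drive`, and the `emit` callback are hypothetical names, and the in-memory queue stands in for whatever external input a real source would read.

```python
from collections import deque
from typing import Callable, Deque, List


class QueueSpout:
    """Hypothetical pull-style source: the framework calls next() repeatedly."""

    def __init__(self, pending: Deque[str]) -> None:
        self.pending = pending

    def next(self, emit: Callable[[str], None]) -> None:
        # Drain whatever is available right now, then hand control back.
        while self.pending:
            emit(self.pending.popleft())


def drive(spout: QueueSpout, emit: Callable[[str], None], polls: int) -> None:
    # Stand-in for the framework's loop; a real runtime would poll until shutdown.
    for _ in range(polls):
        spout.next(emit)


queue: Deque[str] = deque(["a", "b"])
out: List[str] = []
spout = QueueSpout(queue)
drive(spout, out.append, polls=1)  # emits "a" and "b"
queue.append("c")                  # new input arrives between polls
drive(spout, out.append, polls=2)  # second poll finds nothing and returns at once
print(out)  # prints ['a', 'b', 'c']
```

The framework regains control after every call, but nothing in this contract stops next() from looping for a long time while input keeps arriving.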

Re: [DISCUSS] Behaviour of Streaming Sources

2015-05-08 Thread Gyula Fóra
I think the problem with this void next() approach is exactly the way it works: "Using this interface, "next()" can loop internally as long as tuples are available and return if there is (currently) no input." We don't want the user to loop internally in next(), because then we have almost the sa

[DISCUSS] Naming and Functionality of Stream Operators and Tasks

2015-05-08 Thread Aljoscha Krettek
Hi, since I'm currently reworking the Stream operators I thought it's a good time to talk about the naming of some classes. We have some legacy problems with lots of Operators, OperatorBases, TwoInput, OneInput, Unary, Binary, etc. And maybe we can break things in streaming to have more consistent

[jira] [Created] (FLINK-1992) Add convergence criterion to SGD optimizer

2015-05-08 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1992: Summary: Add convergence criterion to SGD optimizer Key: FLINK-1992 URL: https://issues.apache.org/jira/browse/FLINK-1992 Project: Flink Issue Type: Improvem

[jira] [Created] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD

2015-05-08 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1993: Summary: Replace MultipleLinearRegression's custom SGD with optimization framework's SGD Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project

Re: [DISCUSS] Behaviour of Streaming Sources

2015-05-08 Thread Matthias J. Sax
You are right. That is why I already pointed this out: "You could force the UDF to return each time, by disallowing consecutive calls to Collector.out(...)." The Storm design would avoid the "NULL-Problem" Aljoscha mentioned, too. -Matthias On 05/08/2015 10:59 AM, Gyula Fóra wrote: >

[jira] [Created] (FLINK-1994) Add stepsize calculation schemes to SGD

2015-05-08 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1994: Summary: Add stepsize calculation schemes to SGD Key: FLINK-1994 URL: https://issues.apache.org/jira/browse/FLINK-1994 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-1995) The Flink project is categorized under "Incubator" in the Apache JIRA tracker

2015-05-08 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-1995: -- Summary: The Flink project is categorized under "Incubator" in the Apache JIRA tracker Key: FLINK-1995 URL: https://issues.apache.org/jira/browse/FLINK-1995

[jira] [Created] (FLINK-1996) Add output methods to Table API

2015-05-08 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1996: Summary: Add output methods to Table API Key: FLINK-1996 URL: https://issues.apache.org/jira/browse/FLINK-1996 Project: Flink Issue Type: Improvement

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

2015-05-08 Thread Gyula Fóra
Generally I am in favor of making these name changes. My only concern is regarding the one-input and multiple-input operators. There is a general problem with the n-ary operators regarding type safety; that's why we now have SingleInput and Co (two-input) operators. I think we should keep these

[jira] [Created] (FLINK-1997) Neither "<>" nor "!=" supported for non-equals predicates in .filter()

2015-05-08 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1997: Summary: Neither "<>" nor "!=" supported for non-equals predicates in .filter() Key: FLINK-1997 URL: https://issues.apache.org/jira/browse/FLINK-1997 Project: Flink

[jira] [Created] (FLINK-1998) Equality filter predicate not correctly evaluated by Table API

2015-05-08 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1998: Summary: Equality filter predicate not correctly evaluated by Table API Key: FLINK-1998 URL: https://issues.apache.org/jira/browse/FLINK-1998 Project: Flink

SparseVector.fromCOO keeps zero entries

2015-05-08 Thread Christoph Alt
Hi, Felix and I are currently working on the implementation of the FeatureHasher (Issue #1735), which in the end returns a SparseVector. When using "SparseVector.fromCOO" I’m facing some odd behaviour I didn’t expect. Assume I create a SparseVector.fromCOO(numFeatures, Map((0, 1.0), (1, 1.0
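To make the concern in the subject line concrete, here is a small Python sketch of a COO-to-sparse conversion that drops zero entries. It is not Flink's SparseVector.fromCOO: `from_coo` is a hypothetical helper, and summing duplicate coordinates is an assumption made for this illustration, since the message preview is cut off before it describes the exact behaviour.

```python
from typing import Dict, Iterable, List, Tuple


def from_coo(size: int, entries: Iterable[Tuple[int, float]]) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: build sorted (indices, values) arrays without zeros."""
    summed: Dict[int, float] = {}
    for index, value in entries:
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for size {size}")
        # Assumption: duplicate coordinates are summed.
        summed[index] = summed.get(index, 0.0) + value
    # Keep only coordinates whose final value is non-zero.
    kept = sorted((i, v) for i, v in summed.items() if v != 0.0)
    return [i for i, _ in kept], [v for _, v in kept]


indices, values = from_coo(4, [(0, 1.0), (1, 1.0), (1, -1.0), (3, 0.0)])
print(indices, values)  # prints [0] [1.0]
```

A constructor that skips the final filter would instead store indices 1 and 3 with value 0.0, which is the kind of result the subject line refers to.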

Generat DataSet gaussian distribution

2015-05-08 Thread Yi ZHOU
Hello all, while testing the AP algorithm I had a small question: how can I generate a DataSet with a Gaussian distribution? Is there an implemented function? Does anyone have a solution? Thank you, ZHOU Yi

Re: Generat DataSet gaussian distribution

2015-05-08 Thread Andra Lungu
Hi Yi, To my knowledge, there is no simple way to generate this kind of DataSet (i.e. there is no env.generateGaussianSequence()). However, if you look in flink-perf, Till used something like this there: https://github.com/project-flink/flink-perf/blob/master/flink-jobs/src/main/scala/com/github/pr
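As a language-neutral illustration of the idea, the following Python sketch draws seeded Gaussian samples with the standard library. It does not reproduce the flink-perf code linked above and uses no Flink API; `gaussian_samples` and its parameters are hypothetical names chosen for this example.

```python
import random
from typing import List


def gaussian_samples(count: int, mean: float, stddev: float, seed: int) -> List[float]:
    """Draw `count` values from N(mean, stddev^2) with a private, seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stddev) for _ in range(count)]


samples = gaussian_samples(count=10_000, mean=5.0, stddev=2.0, seed=42)
sample_mean = sum(samples) / len(samples)
print(len(samples), round(sample_mean, 1))
```

In a parallel setting the same approach would presumably need a distinct seed per worker, so that the partitions do not all produce identical values.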