Re: Type erasure problem solely on cluster execution

2016-10-19 Thread Fabian Hueske
Hi Martin, thanks for reporting the problem and providing code to reproduce it. Would you mind describing the problem with the forwarding annotations in more detail? I would be interested in the error message and how the semantic annotation is provided (@ForwardFields or withForwardedFields()).

Re: Add MapState for keyed streams

2016-10-19 Thread Till Rohrmann
Hi Xiaogang, I really like your proposal and think that this would be a valuable addition to Flink :-) For convenience we could maybe add contains(K key), too. Java's Map interface returns a Set of Entry when calling entrySet() (which is the equivalent of iterator() in our interface). The Entry

Re: Type erasure problem solely on cluster execution

2016-10-19 Thread Martin Junghanns
Hi Fabian, Thank you for the quick reply and for looking into it. Sorry, I was a bit too quick with the field reference accusation. Turns out, my TypeInformation was wrong, hence the invalid reference exception. However, the type erasure problem still holds. The actual code can be found here

Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
Agreed. contains(K key) should be provided. The iterator() method should return Iterator> instead of Iterator>. Besides, size() may also be provided. With these methods, MapStates appear very similar to Java Maps. Users will be very happy to use them. Regards, Xiaogang 2016-10-19 16:55 GMT+08
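The methods discussed so far (contains, size, iterator) can be sketched as a small interface. This is only an illustration under assumptions: SketchMapState and HeapSketchMapState are hypothetical names, not a Flink API, and the iterator's element type is assumed to be Map.Entry because the generic parameters in the archived message were lost.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical interface: method names follow the thread, not a released Flink API.
interface SketchMapState<K, V> {
    V get(K key);
    void put(K key, V value);
    void remove(K key);
    boolean contains(K key);               // proposed for convenience in the thread
    int size();                            // proposed in the thread
    Iterator<Map.Entry<K, V>> iterator();  // element type is an assumption
}

// In-memory stand-in so the sketch runs; a real implementation would sit in a state backend.
class HeapSketchMapState<K, V> implements SketchMapState<K, V> {
    private final Map<K, V> entries = new HashMap<>();

    @Override public V get(K key) { return entries.get(key); }
    @Override public void put(K key, V value) { entries.put(key, value); }
    @Override public void remove(K key) { entries.remove(key); }
    @Override public boolean contains(K key) { return entries.containsKey(key); }
    @Override public int size() { return entries.size(); }
    @Override public Iterator<Map.Entry<K, V>> iterator() { return entries.entrySet().iterator(); }
}

class MapStateSketchDemo {
    public static void main(String[] args) {
        SketchMapState<String, Integer> state = new HeapSketchMapState<>();
        state.put("a", 1);
        state.put("b", 2);
        System.out.println(state.contains("a"));
        System.out.println(state.size());
    }
}
```

With these three methods the state reads much like java.util.Map, which is the similarity the message points out.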

Re: Add MapState for keyed streams

2016-10-19 Thread Aljoscha Krettek
Hi, just making sure I understand this correctly. Would the MapState keys be the same keys as the ones provided when creating the KeyedStream, or different keys? As an example, would it be like this: DataStream> input = ...; KeyedStream keyed = input.keyBy(0) keyed.map( Tuple2 input -> mapState.pu

Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
The keys in the MapState are different from the keys in the KeyedStream. There are two types of keys: StreamKeys and UserKeys. StreamKeys are those created by the keyBy() operation. Like other states, MapStates are created under StreamKeys. The keys of the MapState are called UserKeys. U
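The two levels of keys can be pictured as a nested lookup: the stream key selects a map, and the user key selects a value inside it. The sketch below uses plain Java collections only; ScopedMapStateSketch and setCurrentStreamKey are hypothetical names for illustration and do not describe how Flink stores state.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: stream key -> (user key -> value), kept in ordinary HashMaps.
class ScopedMapStateSketch {
    private final Map<String, Map<String, Long>> byStreamKey = new HashMap<>();
    private String currentStreamKey;

    // Assumed hook: the runtime would set this before each record is processed.
    void setCurrentStreamKey(String streamKey) {
        this.currentStreamKey = streamKey;
    }

    // Writes a user-key entry into the map that belongs to the current stream key.
    void put(String userKey, long value) {
        byStreamKey.computeIfAbsent(currentStreamKey, k -> new HashMap<>()).put(userKey, value);
    }

    // Reads a user-key entry; returns null if the current stream key has no such entry.
    Long get(String userKey) {
        Map<String, Long> userEntries = byStreamKey.get(currentStreamKey);
        return userEntries == null ? null : userEntries.get(userKey);
    }

    public static void main(String[] args) {
        ScopedMapStateSketch state = new ScopedMapStateSketch();
        state.setCurrentStreamKey("user-1");
        state.put("clicks", 3L);
        state.setCurrentStreamKey("user-2");
        System.out.println(state.get("clicks")); // entry written under user-1 is not visible here
    }
}
```

The same user key can therefore hold different values under different stream keys without the two interfering.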

Re: Add MapState for keyed streams

2016-10-19 Thread Jark Wu
Hi Xiaogang, I think returning Set> may be better than Iterator>, because users can use foreach on a Set but not on an Iterator, and can still get an iterator via set.iterator(). Map.entrySet() may also be more familiar to users. - Jark Wu > On 19 Oct 2016, at 5:18 PM, SHI Xiaogang wrote: > > Agreed. > >

Re: Add MapState for keyed streams

2016-10-19 Thread Aljoscha Krettek
Perfect! Then it's pretty much what we discussed here: https://issues.apache.org/jira/browse/FLINK-3947 and I'm very much in favour of that. Just the implementation of RocksDB could be a bit tricky but it should be doable. Cheers, Aljoscha On Wed, 19 Oct 2016 at 11:43 Jark Wu wrote: > Hi Xiaoga

Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
Hi Jark If the state is very big, it may occupy a lot of memory if we return Set>. By wrapping the returned iterator, we can easily implement a method returning Iterable>. Users can use that returned Iterable in the foreach loop. Regards Xiaogang 2016-10-19 17:43 GMT+08:00 Jark Wu : > Hi Xi
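The wrapping idea can be sketched in a few lines of Java: an Iterable only has to hand out an Iterator on request, so nothing needs to be copied into a Set first. IterableWrapperSketch and asIterable are hypothetical names; the sketch assumes the state can produce a fresh iterator on each call, which is why it takes a Supplier rather than a single Iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class IterableWrapperSketch {
    // Wraps an iterator source as an Iterable without materializing the elements.
    // Each for-each loop asks the supplier for a new iterator.
    static <T> Iterable<T> asIterable(Supplier<Iterator<T>> source) {
        return () -> source.get();
    }

    public static void main(String[] args) {
        // Stand-in for state contents; a real backend would stream entries lazily.
        List<String> backing = Arrays.asList("a", "b", "c");
        Iterable<String> entries = asIterable(backing::iterator);
        for (String entry : entries) {
            System.out.println(entry);
        }
    }
}
```

Wrapping one already-created Iterator directly would also compile, but the resulting Iterable could only be traversed once, which is easy to get wrong in a second foreach loop.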

Re: Add MapState for keyed streams

2016-10-19 Thread Jark Wu
That makes sense! Maybe we can make MapState implement the Iterable interface, so that we can use foreach directly on MapState. - Jark Wu > On 19 Oct 2016, at 5:48 PM, SHI Xiaogang wrote: > > Hi Jark > > If the state is very big, it may occupy a lot of memory if we return > Set>. > > By wrapping the ret

Re: Add MapState for keyed streams

2016-10-19 Thread Jark
That makes sense! Maybe we can make MapState implement the Iterable interface. > On 19 Oct 2016, at 5:48 PM, SHI Xiaogang wrote: > > Hi Jark > > If the state is very big, it may occupy a lot of memory if we return > Set>. > > By wrapping the returned iterator, we can easily implement a method > returning I

Removing flink-contrib/flink-operator-stats

2016-10-19 Thread Ufuk Celebi
Hey devs, I would like to propose the removal of the flink-contrib/flink-operator-stats module. It is currently causing some build stability issues (https://issues.apache.org/jira/browse/FLINK-4833) and there is no active maintainer for it as far as I can tell. Are there any objections to this?

[jira] [Created] (FLINK-4857) ZooKeeperUtils have a throws exception clause without throwing exceptions

2016-10-19 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-4857: Summary: ZooKeeperUtils have a throws exception clause without throwing exceptions Key: FLINK-4857 URL: https://issues.apache.org/jira/browse/FLINK-4857 Project: Flin

FlinkML - Evaluate function should manage LabeledVector

2016-10-19 Thread Thomas FOURNIER
Hi, I'd like to improve the SVM evaluate function so that it can use LabeledVector (and not only Vector). Indeed, what is done in the test is the following (data is a DataSet[LabeledVector]): val test = data.map(l => (l.vector, l.label)) svm.evaluate(test) We would like to do: svm.evaluate(data) Adding

Re: Removing flink-contrib/flink-operator-stats

2016-10-19 Thread Robert Metzger
If there are no users and no contributors of the module, I'm +1 to remove it. If we decide to remove it from flink-contrib, but there are some contributors interested in it, I can offer to assist the contributors to add the extension to Apache Bahir. On Wed, Oct 19, 2016 at 2:00 PM, Ufuk Celebi

Re: Removing flink-contrib/flink-operator-stats

2016-10-19 Thread Maximilian Michels
+1 for removing it in case it is not widely used. Apache Bahir would be a more appropriate place for this module then. -Max On Wed, Oct 19, 2016 at 3:52 PM, Robert Metzger wrote: > If there are no users and no contributors of the module, I'm +1 to remove > it. > > If we decide to remove it from

[jira] [Created] (FLINK-4858) Remove Legacy Checkpointing Interfaces

2016-10-19 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-4858: --- Summary: Remove Legacy Checkpointing Interfaces Key: FLINK-4858 URL: https://issues.apache.org/jira/browse/FLINK-4858 Project: Flink Issue Type: Improv

[jira] [Created] (FLINK-4859) Clearly Separate Responsibilities of StreamOperator and StreamTask

2016-10-19 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-4859: --- Summary: Clearly Separate Responsibilities of StreamOperator and StreamTask Key: FLINK-4859 URL: https://issues.apache.org/jira/browse/FLINK-4859 Project: Flink

Re: FlinkML - Evaluate function should manage LabeledVector

2016-10-19 Thread Theodore Vasiloudis
Hello Thomas, since you are calling evaluate here, you should be creating an EvaluateDataSet operation that works with LabeledVector; I see you are creating a new PredictOperation. On Wed, Oct 19, 2016 at 3:05 PM, Thomas FOURNIER < thomasfournier...@gmail.com> wrote: > Hi, > > I'd like to improv

Re: Removing flink-contrib/flink-operator-stats

2016-10-19 Thread Greg Hogan
Based on a cursory reading of FLINK-1297 I would lean toward dropping the code rather than moving it to Apache Bahir. This looks to be appropriate only for batch, and this module was not integrated into the runtime. If there is a way forward to make use of this code in core Flink then that would be even

[jira] [Created] (FLINK-4860) Sort performance

2016-10-19 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-4860: - Summary: Sort performance Key: FLINK-4860 URL: https://issues.apache.org/jira/browse/FLINK-4860 Project: Flink Issue Type: Improvement Reporter: Greg H

[jira] [Created] (FLINK-4861) Package optional project artifacts

2016-10-19 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-4861: - Summary: Package optional project artifacts Key: FLINK-4861 URL: https://issues.apache.org/jira/browse/FLINK-4861 Project: Flink Issue Type: New Feature

Re: FlinkML - Evaluate function should manage LabeledVector

2016-10-19 Thread Thomas FOURNIER
Hi, Two questions: 1- I was thinking of doing this: implicit def evaluateLabeledVector[T <: LabeledVector] = { new EvaluateDataSetOperation[SVM,T,Double]() { override def evaluateDataSet(instance: SVM, evaluateParameters: ParameterMap, testing: DataSet[T]): DataSet[(Double, Double)] = {

[jira] [Created] (FLINK-4862) NPE on EventTimeSessionWindows with ContinuousEventTimeTrigger

2016-10-19 Thread Manu Zhang (JIRA)
Manu Zhang created FLINK-4862: - Summary: NPE on EventTimeSessionWindows with ContinuousEventTimeTrigger Key: FLINK-4862 URL: https://issues.apache.org/jira/browse/FLINK-4862 Project: Flink Issue