Re: [DISCUSS] Gelly planning for release 1.3 and roadmap

2017-02-28 Thread Vasiliki Kalavri
Hi Xingcan, thank you for your input! On 27 February 2017 at 14:03, Xingcan Cui wrote: > Hi Vasia and Greg, > > thanks for the discussion. I'd like to share my thoughts. > > 1) I don't think it's necessary to extend the algorithm list intentionally. > It's just like a textbook that can not cove

Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-28 Thread jincheng sun
Hi Fabian, thanks for your attention to this discussion. Let me share some ideas about this. :) 1. Yes, the solution I have proposed can indeed be extended to support multi-watermarks. A single watermark is a special case of multiple watermarks (n = 1). I agree that for the realization of the si

Re: [Dev] Issue related to using Flink DataSet methods

2017-02-28 Thread Pawan Manishka Gunarathna
Hi, So how can I read the available records of my datasource? I saw in some examples that the print() method will print the available data of that datasource (like files). Thanks, Pawan On Wed, Mar 1, 2017 at 11:30 AM, Xingcan Cui wrote: > Hi Pawan, > > in Flink, most of the methods for DataSet

Re: [Dev] Issue related to using Flink DataSet methods

2017-02-28 Thread Xingcan Cui
Hi Pawan, in Flink, most of the methods for DataSet (including print()) will just add operators to the plan but not really run it. If the DASInputFormat has no error, you can run the plan by calling environment.execute(). Best, Xingcan On Wed, Mar 1, 2017 at 12:17 PM, Pawan Manishka Gunarathna <

[Dev] Issue related to using Flink DataSet methods

2017-02-28 Thread Pawan Manishka Gunarathna
Hi, I have implemented the Flink InputFormat interface for my datasource. It uses our own data type, *Record*. My class looks as follows: public class DASInputFormat implements InputFormat { } When I executed the print() method, my console shows the Flink execution, but nothing will

Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-28 Thread Xingcan Cui
Hi all, I have a question about when `rowtime` is designated. The current design does this during the DataStream to Table conversion. Does this mean that `rowtime` is only valid for the source streams and cannot be designated after a subquery? (That's why I considered using alias to dynamically

Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-28 Thread Fabian Hueske
Hi Jincheng Sun, registering watermark functions for different attributes to allow each of them to be used in a window is an interesting idea. However, watermarks only work well if the streaming data is (almost) in timestamp order. Since it is not possible to sort a stream, all attributes that wo

Re: [DISCUSS] Side Outputs and Split/Select

2017-02-28 Thread Fabian Hueske
Hi Chen and Aljoscha, thanks for the great proposal and work. I prefer the WindowedOperator.getLateStream() variant without explicit tags. I think it is fine to start adding side output to ProcessFunction (keyed and non-keyed) and window operators and see how it is picked up by users. Best, Fabi

Re: [DISCUSS] Flink ML roadmap

2017-02-28 Thread Gábor Hermann
Hi Philipp, It's great to hear you are interested in Flink ML! Based on your description, your prototype seems like an interesting approach for combining online+offline learning. If you're interested, we might find a way to integrate your work, or at least your ideas, into Flink ML if we deci

RE: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-28 Thread Stefano Bortoli
Hi all, I have completed a first implementation that works for the SQL query SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY a RANGE BETWEEN 2 PRECEDING) AS sumB FROM MyTable I have SUM, MAX, MIN, AVG, COUNT implemented but I could test it just on simple queries such as the one above. Is there a

[jira] [Created] (FLINK-5939) Wrong version in README.md for several Apache Flink extensions

2017-02-28 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-5939: -- Summary: Wrong version in README.md for several Apache Flink extensions Key: FLINK-5939 URL: https://issues.apache.org/jira/browse/FLINK-5939 Project: Fli

[jira] [Created] (FLINK-5938) Replace ExecutionContext by Executor in Scheduler

2017-02-28 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5938: Summary: Replace ExecutionContext by Executor in Scheduler Key: FLINK-5938 URL: https://issues.apache.org/jira/browse/FLINK-5938 Project: Flink Issue Type: I

Re: [DISCUSS] Per-key event time

2017-02-28 Thread Jamie Grier
Thinking about this a bit more... I think it may be interesting to enable two modes for event-time advancement in Flink 1) The current mode which I'll call partition-based, pessimistic, event-time advancement 2) Key-based, eager, event-time advancement In this key-based eager mode it's actually
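The two modes named in this message can be contrasted with a small toy sketch. This is not Flink code: the class and method names (EventTimeModes, PartitionBasedClock, KeyBasedClock) are hypothetical, and the reading of "eager" as "each key advances to the largest timestamp seen for that key" is an assumption, since the preview is cut off before the details.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy illustration only; none of these types exist in Flink.
public class EventTimeModes {

    /** Partition-based, pessimistic: event time is the minimum watermark over all input partitions. */
    static final class PartitionBasedClock {
        private final long[] partitionWatermarks;

        PartitionBasedClock(int numPartitions) {
            partitionWatermarks = new long[numPartitions];
            Arrays.fill(partitionWatermarks, Long.MIN_VALUE);
        }

        /** Records a watermark for one partition; watermarks never move backwards. */
        void onWatermark(int partition, long watermark) {
            partitionWatermarks[partition] = Math.max(partitionWatermarks[partition], watermark);
        }

        /** The slowest partition holds everyone back. */
        long currentEventTime() {
            long min = Long.MAX_VALUE;
            for (long w : partitionWatermarks) {
                min = Math.min(min, w);
            }
            return min;
        }
    }

    /** Key-based, eager (assumed semantics): each key advances on its own timestamps only. */
    static final class KeyBasedClock {
        private final Map<String, Long> keyTime = new HashMap<>();

        /** Advances the key's event time to the largest timestamp seen for that key. */
        void onElement(String key, long timestamp) {
            keyTime.merge(key, timestamp, Long::max);
        }

        long currentEventTime(String key) {
            return keyTime.getOrDefault(key, Long.MIN_VALUE);
        }
    }

    public static void main(String[] args) {
        PartitionBasedClock partitions = new PartitionBasedClock(2);
        partitions.onWatermark(0, 100L);
        partitions.onWatermark(1, 40L);
        System.out.println("partition-based event time: " + partitions.currentEventTime());

        KeyBasedClock keys = new KeyBasedClock();
        keys.onElement("a", 100L);
        keys.onElement("b", 40L);
        System.out.println("event time for key a: " + keys.currentEventTime("a"));
        System.out.println("event time for key b: " + keys.currentEventTime("b"));
    }
}
```

In the sketch, a lagging partition caps the partition-based clock at 40, while key "a" is already at 100 in the key-based clock. The price of the key-based variant is one piece of state per key, which is the space concern raised elsewhere in this thread.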

[jira] [Created] (FLINK-5937) Add documentation about the task lifecycle.

2017-02-28 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-5937: - Summary: Add documentation about the task lifecycle. Key: FLINK-5937 URL: https://issues.apache.org/jira/browse/FLINK-5937 Project: Flink Issue Type: Bug

Re: [DISCUSS] Side Outputs and Split/Select

2017-02-28 Thread Aljoscha Krettek
Quick update: I created a branch where I make the result type of WindowedStream operations more specific: https://github.com/aljoscha/flink/blob/windowed-stream-result-specific/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/WindowedStream.java We would need this for t

Re: [DISCUSS] Per-key event time

2017-02-28 Thread Aljoscha Krettek
@Tzu-Li Yes, the deluxe stream would not allow another keyBy(). Or we could allow it but then we would exit the world of the deluxe stream and per-key watermarks and go back to the realm of normal streams and keyed streams. On Tue, 28 Feb 2017 at 10:08 Tzu-Li (Gordon) Tai wrote: > Throwing in so

Re: [DISCUSS] Code style / checkstyle

2017-02-28 Thread Aljoscha Krettek
By the way, I also don't see the benefit of doing the transition piece by piece. On Mon, 27 Feb 2017 at 22:21 Dawid Wysakowicz wrote: > I agree with adopting a custom codestyle/checkstyle for flink, but as I > understood correctly most people agree there is no point of providing an > unenforced

[jira] [Created] (FLINK-5936) Can't pass keyed vectors to KNN join algorithm

2017-02-28 Thread Alex DeCastro (JIRA)
Alex DeCastro created FLINK-5936: Summary: Can't pass keyed vectors to KNN join algorithm Key: FLINK-5936 URL: https://issues.apache.org/jira/browse/FLINK-5936 Project: Flink Issue Type: Im

[jira] [Created] (FLINK-5935) confusing/misleading error message when failing to restore savepoint

2017-02-28 Thread David Anderson (JIRA)
David Anderson created FLINK-5935: - Summary: confusing/misleading error message when failing to restore savepoint Key: FLINK-5935 URL: https://issues.apache.org/jira/browse/FLINK-5935 Project: Flink

[jira] [Created] (FLINK-5934) Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState

2017-02-28 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-5934: Summary: Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState Key: FLINK-5934 URL: https://issues.apache.org/jira/browse/FLINK-5934

Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-28 Thread jincheng sun
Hi everyone, thanks for sharing your thoughts. I really like Timo’s proposal, and I have a few thoughts I want to share. We want to keep the query the same for batch and streaming. IMO, “process time” is something special to dataStream while it is not a well defined term for batch query. So it is kind o

Re: [DISCUSS] Side Outputs and Split/Select

2017-02-28 Thread Ufuk Celebi
On Tue, Feb 28, 2017 at 11:38 AM, Aljoscha Krettek wrote: > I see the ProcessFunction as a bit of the generalised future of FlatMap, so > to me it makes sense to only allow side outputs on the ProcessFunction but > I'm open for anything. If we decide for this I'm happy with an additional > method

Re: [DISCUSS] Side Outputs and Split/Select

2017-02-28 Thread Aljoscha Krettek
About 1: We can definitely go with Jamie's proposal for the late data side output, for me this is just a name and anything that has "late" in it is perfect! Regarding 2: I agree, and I thought about implementing split/select on top of side outputs and it should be easily doable. I think side output

[jira] [Created] (FLINK-5933) Allow Evictor for merging windows

2017-02-28 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-5933: --- Summary: Allow Evictor for merging windows Key: FLINK-5933 URL: https://issues.apache.org/jira/browse/FLINK-5933 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-5932) Order of legacy vs new state initialization in the AbstractStreamOperator.

2017-02-28 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-5932: - Summary: Order of legacy vs new state initialization in the AbstractStreamOperator. Key: FLINK-5932 URL: https://issues.apache.org/jira/browse/FLINK-5932 Project: F

Re: [DISCUSS] Side Outputs and Split/Select

2017-02-28 Thread Ufuk Celebi
1. I like the variant without the explicit OutputTag for the WindowOperator: WindowedOperator windowedResult = input .keyBy(...) .window(...) .apply(...) DataStream lateData = windowedResult.getLateDataSideOutput(); I like Jamie's proposal getLateStream() a little better though. On the oth

Re: [DISCUSS] Per-key event time

2017-02-28 Thread Tzu-Li (Gordon) Tai
Throwing in some thoughts: When a source determines that no more data will come for a key (which in itself is a bit of a tricky problem) then it should signal to downstream operations to take the key out of watermark calculations, that is that we can release some space. I don’t think this is p