Re: Applying the same operator twice on a windowed stream

2017-01-27 Thread Abdul Salam Shaikh
Yes, your assumption is right. My TrafficWindow is emitting multiple records and I am looking for a way to iterate over these values and emit another set of multiple records (which would be the computed values from the previous stream). Thanks a lot for your input, Mr. Hueske :) On Fri, Jan 27, 201…

Re: Applying the same operator twice on a windowed stream

2017-01-27 Thread Fabian Hueske
Hi, the window operation is completed after you call apply the first time. The result is a regular DataStream. I assume your TrafficWindow emits multiple records. Otherwise, you'd probably apply a simple MapFunction after the window. So you are looking for a way to iterate over all values retur…
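
A minimal sketch of the pattern Fabian describes, assuming a DataStream<Tuple2<String, Integer>> named input and a WindowFunction named TrafficWindow; the names, types, and the placeholder computation below are assumptions for illustration, not taken from the thread:

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    // First pass: the window's apply() already yields a plain DataStream ...
    DataStream<Tuple2<String, Integer>> windowed = input
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .apply(new TrafficWindow());   // emits multiple records per window

    // ... so a "second pass" over those records is just another transformation,
    // e.g. a FlatMapFunction that derives new records from the window output.
    DataStream<Tuple2<String, Integer>> derived = windowed
            .flatMap(new FlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(Tuple2<String, Integer> value,
                                    Collector<Tuple2<String, Integer>> out) {
                    out.collect(new Tuple2<>(value.f0, value.f1 * 2)); // placeholder computation
                }
            });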

Applying the same operator twice on a windowed stream

2017-01-27 Thread Abdul Salam Shaikh
Hi everyone, I have a window definition like this at the moment in snapshot version 1.2.0: final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); env.setParallelism(1); Da…

Re: start-cluster.sh issue

2017-01-27 Thread Greg Hogan
Hi Lior, Try adding this to your flink-conf.yaml: env.ssh.opts: FLINK_CONF_DIR=/tmp/parallelmachine/lior/flink/conf I think this is expected and not a bug (setting FLINK_CONF_DIR from the environment is supported for YARN). Please do file a JIRA for this feature as I think it would be a nice im…

Re: Queryable State

2017-01-27 Thread Dawid Wysakowicz
Hi Nico, No problem at all, I've already presented my showcase with ValueStateDescriptor. Anyway, if I can help you somehow with QueryableState, let me know. I will be happy to contribute some code. 2017-01-25 14:47 GMT+01:00 Nico Kruber : > Hi Dawid, > sorry for the late reply, I was fixi…
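
As a point of reference, a minimal sketch of how a ValueStateDescriptor can be exposed as queryable state with the 1.2 API; the state names, key position, default value, and the keyed tuple stream counts are invented placeholders, not Dawid's actual showcase:

    import org.apache.flink.api.common.state.ValueStateDescriptor;

    // Option 1: mark the descriptor used inside a rich function as queryable,
    // so its contents can be looked up by key from outside the job.
    ValueStateDescriptor<Long> descriptor =
            new ValueStateDescriptor<>("count", Long.class, 0L);
    descriptor.setQueryable("count-query");

    // Option 2: expose a keyed stream directly as queryable value state.
    counts.keyBy(0)
          .asQueryableState("counts-by-key");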

TaskManager randomly dies

2017-01-27 Thread Malte Schwarzer
Hi all, when running a Flink batch job, from time to time a TaskManager dies randomly, which causes the whole job to fail. All other nodes then throw the following exception: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: Connection unexpectedly…

Re: Events are assigned to wrong window

2017-01-27 Thread Aljoscha Krettek
Yes, that's true. On Fri, 27 Jan 2017 at 13:16 Nico wrote: > Hi Aljoscha, > > got it!!! :) Thank you. So, in order to retain the "original" timestamps, > it would be necessary to assign the timestamps after the MapFunction > instead of the Kafka source? At least, this solves the issue in the exam…

Re: Datastream - writeAsCsv creates empty File

2017-01-27 Thread Timo Walther
Hi Nico, writeAsCsv has limited functionality in this case. I recommend using the Bucketing File Sink [1], where you can specify an interval and a batch size for when to flush. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.html#bucketing-file-sink T…
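
A rough sketch of the bucketing sink Timo links to, assuming a DataStream<String> named csvLines that already contains formatted CSV lines; the output path, batch size, and interval values are placeholders, and the linked documentation is the authoritative reference for the available options:

    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

    // Requires the flink-connector-filesystem dependency.
    BucketingSink<String> sink = new BucketingSink<>("/tmp/flink-output");
    sink.setBatchSize(64L * 1024 * 1024);          // roll the part file after ~64 MB
    sink.setInactiveBucketCheckInterval(60_000L);  // how often to check for inactive buckets
    sink.setInactiveBucketThreshold(60_000L);      // flush/close buckets idle for one minute

    csvLines.addSink(sink);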

Re: Events are assigned to wrong window

2017-01-27 Thread Nico
Hi Aljoscha, got it!!! :) Thank you. So, in order to retain the "original" timestamps, it would be necessary to assign the timestamps after the MapFunction instead of the Kafka source? At least, this solves the issue in the example. Best, Nico 2017-01-27 11:49 GMT+01:00 Aljoscha Krettek : > Now…
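
In code, the change Nico describes would look roughly like this; the event type, its accessor, the kafkaSource variable, and the out-of-orderness bound are placeholder assumptions, not taken from the thread:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // No timestamp assigner on the Kafka source; run the stateful MapFunction first ...
    DataStream<MyEvent> mapped = env
            .addSource(kafkaSource)
            .map(new MyStatefulMapFunction());

    // ... and only then assign timestamps, so the windows see the "original" event time.
    DataStream<MyEvent> withTimestamps = mapped
            .assignTimestampsAndWatermarks(
                    new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(5)) {
                        @Override
                        public long extractTimestamp(MyEvent event) {
                            return event.getEventTime();   // hypothetical accessor
                        }
                    });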

Re: Events are assigned to wrong window

2017-01-27 Thread Aljoscha Krettek
Now I see. What you're doing in this example is basically reassigning timestamps to other elements in your stateful MapFunction. Flink internally keeps track of the timestamp of an element. This normally cannot be changed, except by using a TimestampAssigner, which you're doing. Now, the output fr…

setParallelism() for addSource() in streaming

2017-01-27 Thread Sendoh
Hi Flink users, I'd like to set up different parallelisms for different sources. I found we cannot: DataStream bigStream = env.addSource( new FooSource(properties, bigTopic) ).setParallelism(5).rebalance(); DataStream smallStream = env.addSource( new FooSou…
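
For reference, per-source parallelism is set on the value returned by addSource(); a sketch modelled on the snippet above, with the caveat that a parallelism greater than 1 requires the source to implement ParallelSourceFunction (FooSource, the topics, and the properties are placeholders from the question):

    import org.apache.flink.streaming.api.datastream.DataStream;

    // FooSource must implement ParallelSourceFunction (or extend RichParallelSourceFunction)
    // for a parallelism greater than 1 to be accepted.
    DataStream<String> bigStream = env
            .addSource(new FooSource(properties, bigTopic))
            .setParallelism(5)       // applies to this source only
            .rebalance();            // spread records evenly across downstream tasks

    DataStream<String> smallStream = env
            .addSource(new FooSource(properties, smallTopic))
            .setParallelism(1);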

Datastream - writeAsCsv creates empty File

2017-01-27 Thread Nico
Hi, I am running my Flink job in the local IDE and want to write the results to a CSV file using: stream.writeAsCsv("...", FileSystem.WriteMode.OVERWRITE).setParallelism(1) While the file is created, it is empty. However, writeAsText works. I have checked the CsvOutputFormat and I think t…

Re: parallelism for window operations

2017-01-27 Thread Ovidiu-Cristian MARCU
Now I see (the documentation is clear), just a correction: because I set PInput as the slot sharing group for flatMap, the source and flatMap are in different slots. That also means S6 and S7 are in the same slot, as expected, because they share the same slot sharing group, output. Best, Ovidiu > On 27 Jan 2017, at 10:43…
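
For readers following along, slot sharing groups are assigned per operator roughly as below; the group names PInput and output follow Ovidiu's description, while the source and function classes are placeholders:

    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<String> parsed = env
            .addSource(new MySource())          // stays in the "default" slot sharing group
            .flatMap(new MyFlatMap())
            .slotSharingGroup("PInput");        // flatMap moves to "PInput", so source and
                                                // flatMap can no longer share a slot

    DataStream<String> result = parsed
            .map(new MyMap())
            .slotSharingGroup("output");        // downstream operators inherit "output" unless reset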

RE: Dummy DataStream

2017-01-27 Thread Radu Tudoran
Hi Duck, I am not 100% sure I understand your exact scenario, but I will try to give you some pointers; maybe they will help. Typically, when you do the split, you have some knowledge of the criterion for the split. For example, if you follow the example from the website https://ci.apache.org/pr…
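
As a generic illustration of the split-by-criterion idea Radu refers to, assuming a DataStream<Integer> named numbers; the even/odd criterion and the output names mirror the style of the documentation example rather than Duck's actual scenario:

    import java.util.Collections;

    import org.apache.flink.streaming.api.collector.selector.OutputSelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SplitStream;

    SplitStream<Integer> split = numbers.split(new OutputSelector<Integer>() {
        @Override
        public Iterable<String> select(Integer value) {
            // the splitting criterion decides which named output(s) each record goes to
            return Collections.singletonList(value % 2 == 0 ? "even" : "odd");
        }
    });

    DataStream<Integer> even = split.select("even");
    DataStream<Integer> odd  = split.select("odd");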

Re: .keyBy() on ConnectedStream

2017-01-27 Thread Timo Walther
Hi Matt, the keyBy() on ConnectedStream has two parameters to specify the key of the left and of the right stream. Identical keys end up in the same CoMapFunction/CoFlatMapFunction. If you want to group both streams on a common key, you can use .union() instead of .connect(). I hope that hel…
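
Roughly, the two options Timo describes, with stream1/stream2, the field name userId, and the event types as placeholder assumptions:

    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    // Option 1: connect and key both inputs, so equal keys meet in the same CoFlatMapFunction
    stream1.connect(stream2)
           .keyBy("userId", "userId")          // key of the left stream, key of the right stream
           .flatMap(new CoFlatMapFunction<EventA, EventB, Result>() {
               @Override
               public void flatMap1(EventA left, Collector<Result> out) { /* left stream */ }
               @Override
               public void flatMap2(EventB right, Collector<Result> out) { /* right stream */ }
           });

    // Option 2: if both streams have the same element type, union them and key once
    stream1.union(otherStreamOfSameType)
           .keyBy("userId")
           .flatMap(new MyFlatMap());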