Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Tzu-Li (Gordon) Tai
Good to know! On January 10, 2017 at 1:06:29 PM, Renjie Liu (liurenjie2...@gmail.com) wrote: Hi, all: I used kafka connector 0.10 and the problem is fixed. I think this maybe caused by incompatible between consumer 0.9 and broker 0.10. Thanks Henri and Gordon. On Tue, Jan 10, 2017 at 4:46 AM H

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Renjie Liu
Hi, all: I used kafka connector 0.10 and the problem is fixed. I think this maybe caused by incompatible between consumer 0.9 and broker 0.10. Thanks Henri and Gordon. On Tue, Jan 10, 2017 at 4:46 AM Henri Heiskanen wrote: > Hi, > > We had the same problem when running 0.9 consumer against 0.10

Re: access to key in sink

2017-01-09 Thread Jark Wu
Hi Telco, What do you mean about the “keyBy value” ? Is it the string parameter value, i.e. “partition” in your case , or the real key value of an actual element being processed ? If you mean the string parameter value, it seems that currently it doesn’t support. If you mean the latter one,

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
Hi, I found the root cause of the problem : the listEligibleFiles method in ContinuousFileMonitoringFunction scans only the topmost files and ignores the nested files. By fixing that I was able to get the expected output. I created Jira issue: https://issues.apache.org/jira/browse/FLINK-5432. @Ko

Re: Flink streaming questions

2017-01-09 Thread Henri Heiskanen
Hi, Unfortunately I can not use reduce function. I am now going with WindowFunction and see how it works on our production load. Br, Henkka On Wed, Jan 4, 2017 at 2:46 PM, Fabian Hueske wrote: > Hi Henri, > > can you express the logic of your FoldFunction (or WindowFunction) as a > combinatio

Re: Are heterogeneous DataStreams possible?

2017-01-09 Thread Henri Heiskanen
Hi, We have been using HashMap and has been working fine so far. Br, Henkka On Mon, Jan 9, 2017 at 5:35 PM, Aljoscha Krettek wrote: > You could try using JSON for all your data, this might me slow, however. > The other route, which I would suggest, is to have your own custom > TypeSerializers

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Henri Heiskanen
Hi, We had the same problem when running 0.9 consumer against 0.10 Kafka. Upgrading Flink Kafka connector to 0.10 fixed our issue. Br, Henkka On Mon, Jan 9, 2017 at 5:39 PM, Tzu-Li (Gordon) Tai wrote: > Hi, > > Not sure what might be going on here. I’m pretty certain that for > FlinkKafkaConsu

Re: Joining two kafka streams

2017-01-09 Thread Igor Berman
Hi Tzu-Li, Huge thanks for the input, I'll try to implement prototype of your idea and see if it answers my requirements On 9 January 2017 at 08:02, Tzu-Li (Gordon) Tai wrote: > Hi Igor! > > What you can actually do is let a single FlinkKafkaConsumer consume from > both topics, producing a sing

Re: keyBy called twice. Second time, INetAddress and Array[Byte] are empty

2017-01-09 Thread Jonas
So I created a minimal working example where this behaviour can still be seen. It is 15 LOC and can be downloaded here: https://github.com/JonasGroeger/flink-inetaddress-zeroed To run it, use sbt: If you don't want to do the above fear not, here is the code: For some reason, java.net.InetAddress

Sliding Event Time Window Processing: Window Function inconsistent behavior

2017-01-09 Thread Sujit Sakre
Hi, We are using Sliding Event Time Window with Kafka Consumer. The window size is 6 minutes, and slide is 2 minutes. We have written a window function to select a particular window out of multiple windows for a keyed stream, e.g. we select about 16 windows out of multiple windows for the keyed st

Sliding Event Time Window Processing: Window Function inconsistent behavior

2017-01-09 Thread Sujit Sakre
Hi, We are using Sliding Event Time Window with Kafka Consumer. The window size is 6 minutes, and slide is 2 minutes. We have written a window function to select a particular window out of multiple windows for a keyed stream, e.g. we select about 16 windows out of multiple windows for the keyed st

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
Hi Kostas, I debugged the code and the nestedFileEnumeration parameter was always true during the execution. I noticed however that in the following loop in ContinuousFileMonitoringFunction, for some reason, the fileStatus was null for files in nested folders, and non null for files directly under

Re: Regarding ordering of events

2017-01-09 Thread Abdul Salam Shaikh
Thanks a lot Aljoshca, this was a perfect answer to my vague question. On 09-Jan-2017 4:52 pm, "Aljoscha Krettek" wrote: Hi, to clarify what Kostas said. A "single window" in this case is a window for a given key and time period so the window for "key1" in time t1 to t2 can be processed on a dif

Re: failure-rate restart strategy not working?

2017-01-09 Thread Aljoscha Krettek
Hi, did you create a Jira issue for this? (I'm just getting up to speed after vacation so sorry if you already did this, I didn't yet read the Jira mail.) Cheers, Aljoscah On Fri, 6 Jan 2017 at 19:08 Stephan Ewen wrote: > I think you are right, enabling checkpointing should not override the > c

Re: Regarding ordering of events

2017-01-09 Thread Aljoscha Krettek
Hi, to clarify what Kostas said. A "single window" in this case is a window for a given key and time period so the window for "key1" in time t1 to t2 can be processed on a different machine from the window for "key2" in time t1 to t2. Cheers, Aljoscha On Thu, 5 Jan 2017 at 21:56 Kostas Kloudas w

Re: Are heterogeneous DataStreams possible?

2017-01-09 Thread Aljoscha Krettek
You could try using JSON for all your data, this might me slow, however. The other route, which I would suggest, is to have your own custom TypeSerializers than can efficiently deal with different types and dynamic schemas. Cheers, Aljoscha On Thu, 5 Jan 2017 at 07:02 ljwagerfield wrote: > I sh

Re: Processing-Time Timers and Checkpoint State

2017-01-09 Thread Scott Kidder
I too was able to confirm that the issue was fixed in the 1.2 release-candidate 0 (zero) tag. I was unable to get it working with 1.1.4, but with 1.2 it worked without any additional modifications. Best, --Scott Kidder On Mon, Jan 9, 2017 at 7:03 AM, Aljoscha Krettek wrote: > I just verified t

Re: Caching collected objects in .apply()

2017-01-09 Thread Aljoscha Krettek
Hi, I think your approach with two window() operations is fine. There is no way to retrieve the result from a previous window because it is not strictly defined what the previous window is. Also, keeping data inside your user functions (in fields) is problematic because these function instances are

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Tzu-Li (Gordon) Tai
Hi, Not sure what might be going on here. I’m pretty certain that for FlinkKafkaConsumer09 when checkpointing is turned off, the internally used KafkaConsumer client will auto commit offsets back to Kafka at a default interval of 5000ms (the default value for “auto.commit.interval.ms”). Could

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Kostas Kloudas
Yes, thanks for the effort. I will look into it. Kostas > On Jan 9, 2017, at 4:24 PM, Lukas Kircher > wrote: > > Thanks for your suggestions: > > @Timo > 1) Regarding the recursive.file.enumeration parameter: I think what counts > here is the enumerateNestedFiles parameter in FileInputFormat

Re: Shared Object Instance over different RichMapFunctions

2017-01-09 Thread Aljoscha Krettek
Hi, Flink will serialise uses functions when distributing work across the cluster. Therefore your shared objects will not be shared objects anymore once your program executes. You will still get object sharing because only one instance of your function is used to process data on one parallel instan

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Lukas Kircher
Thanks for your suggestions: @Timo 1) Regarding the recursive.file.enumeration parameter: I think what counts here is the enumerateNestedFiles parameter in FileInputFormat.java. Calling the setter for enumerateNestedFiles is expected to overwrite recursive.file.enumeration. Not literally - I th

Re: Events are assigned to wrong window

2017-01-09 Thread Aljoscha Krettek
Hi, I'm assuming you also have the call to assignTimestampsAndWatermarks() somewhere in there as well, as in: stream .assignTimestampsAndWatermarks(new TimestampGenerator()) // or somewhere else in the pipeline .keyBy("id") .map(...) .filter(...) .map(...) .keyB

Re: Processing-Time Timers and Checkpoint State

2017-01-09 Thread Aljoscha Krettek
I just verified that this does indeed work on master and the Flink 1.2 branch. I'm also pushing a test that verifies this. Stephan is right that this doesn't work on Flink 1.1? Is that critical for you, if yes, then we should think about pushing out another Flink 1.x bugfix release. Cheers, Aljos

Re: access to key in sink

2017-01-09 Thread Aljoscha Krettek
Hi, it's not possible to access the key in the open method because without an element that is being processed there is no key. The user function is being used to produce elements of different keys that are being processed on the same shard (instance of a parallel operator). You can get the key manu

Re: window function outputs two different values

2017-01-09 Thread tao xiao
Hi team, any suggestions on below topic? I have a requirement that wants to output two different values from a time window reduce function. Here is basic workflow 1. fetch data from Kafka 2. flow the data to a event session window. kafka source -> keyBy -> session window -> reduce 3. inside the

Re: Efficiently splitting a stream 3 ways

2017-01-09 Thread C B
On Jan 9, 2017 3:41 PM, "Aljoscha Krettek" wrote: > I think the split/select variant should be a bit faster because it creates > less object copies internally. It should also be more future proof because > it will benefit from improvements (if any) in the way split/select works. > > There is also

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Kostas Kloudas
Hi Yassine, I suspect that the problem is in the way the input format (and not the reader) scans nested files, but could you see if in the code that is executed by the tasks, the nestedFileEnumeration parameter is still true? I am asking in order to pin down if the problem is in the way we shi

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Kostas Kloudas
Hi Lukas, Are you sure that the tempFile.deleteOnExit() does not remove the files before the test completes. I am just asking to be sure. Also from the code, I suppose that you run it locally. I suspect that the problem is in the way the input format scans nested files, but could you see if in

Re: Finding the address of the jobmanager cluster leader

2017-01-09 Thread Aljoscha Krettek
+Till Off the top of your head, do you know how this could be done? On Fri, 23 Dec 2016 at 11:21 Jim Raney wrote: > Hello all, > > Is there any simple way to get the current address of the jobmanager > cluster leader from any of the jobmanagers? I know it gives you a > redirect on the webui, bu

Re: Efficiently splitting a stream 3 ways

2017-01-09 Thread Aljoscha Krettek
I think the split/select variant should be a bit faster because it creates less object copies internally. It should also be more future proof because it will benefit from improvements (if any) in the way split/select works. There is also some ongoing work in adding support for side outputs which a

Re: Fwd: Continuous File monitoring not reading nested files

2017-01-09 Thread Timo Walther
Hi Lukas, have you tried to set the parameter " recursive.file.enumeration" to true? |// create a configuration object Configuration parameters = new Configuration(); // set the recursive enumeration parameter parameters.setBoolean("recursive.file.enumeration", true); | If this also does not

Fwd: Continuous File monitoring not reading nested files

2017-01-09 Thread Lukas Kircher
Hi all, this is probably related to the problem that I reported in December. In case it helps you can find a self contained example below. I haven't looked deeply into the problem but it seems like the correct file splits are determined but somehow not processed. If I read from HDFS nested file

Re: Continuous File monitoring not reading nested files

2017-01-09 Thread Yassine MARZOUGUI
Hi, Any updates on this issue? Thank you. Best, Yassine On Dec 20, 2016 6:15 PM, "Aljoscha Krettek" wrote: +kostas, who probably has the most experience with this by now. Do you have an idea what might be going on? On Fri, 16 Dec 2016 at 15:45 Yassine MARZOUGUI wrote: > Looks like this is

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Timo Walther
I'm not a Kafka expert but maybe Gordon (in CC) knows more. Timo Am 09/01/17 um 11:51 schrieb Renjie Liu: Hi, all: I'm using flink 1.1.3 and kafka consumer 09. I read its code and it says that the kafka consumer will turn on auto offset commit if checkpoint is not enabled. I've turned off ch

Re: Increasing parallelism skews/increases overall job processing time linearly

2017-01-09 Thread Chakravarthy varaga
Anything that I could check or collect for you for investigation ? On Sat, Jan 7, 2017 at 1:35 PM, Chakravarthy varaga < chakravarth...@gmail.com> wrote: > Hi Stephen > > . Kafka version is: 0.9.0.1 the connector is flinkconsumer09 > . The flatmap n coflatmap are connected by keyBy > . No data is

Kafka 09 consumer does not commit offsets

2017-01-09 Thread Renjie Liu
Hi, all: I'm using flink 1.1.3 and kafka consumer 09. I read its code and it says that the kafka consumer will turn on auto offset commit if checkpoint is not enabled. I've turned off checkpoint and it seems that kafka client is not committing to offsets to kafka? The offset is important for helpin

Re: Queryable State

2017-01-09 Thread Ufuk Celebi
Hey Dawid! Thanks for reporting this. I will try to have a look over the course of the day. From a first impression, this seems like a bug to me. On Sun, Jan 8, 2017 at 4:43 PM, Dawid Wysakowicz wrote: > Hi I was experimenting with the Query State feature and I have some problems > querying the s

Passing App configuration file while running in YARN mode

2017-01-09 Thread dinesh kumar
Hi, I am running Flink on YARN mode. My app consumes data from Kafka and then does some internal processing and then indexes the data to Solr cloud. I have a big Solr cloud (one collection for each month of a language and the collection name is decided from the date of the message read from kafka

Re: Kafka KeyedStream source

2017-01-09 Thread Tzu-Li (Gordon) Tai
Hi Niels, Thank you for bringing this up. I recall there was some previous discussion related to this before: [1]. I don’t think this is possible at the moment, mainly because of how the API is designed. On the other hand, a KeyedStream in Flink is basically just a DataStream with a hash part