Re: Flink Kafka Consumer Behaviour

2016-08-02 Thread Janardhan Reddy
Checkpointing wasn't enabled in the streaming job, but the offsets should still have been committed to ZooKeeper. However, we don't see the offsets being written to ZooKeeper. On Tue, Aug 2, 2016 at 7:41 PM, Till Rohrmann wrote: …
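
A quick way to verify this from the ZooKeeper side - a minimal sketch assuming the standard Kafka 0.8 offset layout (/consumers/<group>/offsets/<topic>/<partition>); the group, topic and partition below are placeholders:

    # placeholders: my-group / my-topic / partition 0
    ./bin/zkCli.sh -server <zookeeper-host>:2181 \
      get /consumers/my-group/offsets/my-topic/0

If the node does not exist, no offset has been committed for that group yet.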

What is output from DataSet.print()?

2016-08-02 Thread Jon Yeargers
Topology snippet: datastream = some_stream.keyBy(keySelector).timeWindow(Time.seconds(60)).reduce(new some_KeyReduce()); If I have a KeySelector that's pretty 'loose' (i.e. lots of matches), the 'some_KeyReduce' function gets hit frequently and some set of values is printed out via 'datastream.print()' …
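
For reference, a fleshed-out version of that snippet as a sketch - Event, getCategory() and merge() are placeholders standing in for the poster's types and for some_KeyReduce:

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    DataStream<Event> datastream = some_stream
        .keyBy(new KeySelector<Event, String>() {
            @Override
            public String getKey(Event e) {
                // a 'loose' selector: many events map to the same key
                return e.getCategory();
            }
        })
        .timeWindow(Time.seconds(60))
        .reduce(new ReduceFunction<Event>() {
            @Override
            public Event reduce(Event a, Event b) {
                // stand-in for some_KeyReduce: combine two events per key
                return a.merge(b);
            }
        });

    // one reduced value per key is emitted when each 60s window fires
    datastream.print();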

Programmatically Creating a Flink Cluster On YARN

2016-08-02 Thread Bostow, Ben
I'm currently trying to programmatically create a Flink cluster on a given YARN cluster. I'm using the FlinkYarnClientBase class to do this, with some limitations (Flink version 1.0.3). I want to pass in my own YARN configuration so that I can deploy Flink on different YARN clusters …

Re: Reprocessing data in Flink / rebuilding Flink state

2016-08-02 Thread Gyula Fóra
Hi, I think it would probably be a good idea to make these tunable from the command line. Otherwise we might run into the problem of accidentally restoring a job that should fail like it does now. Gyula. Stephan Ewen wrote (on Tue, 2 Aug 2016, 17:17): …
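
For readers coming to this later: Flink did eventually add exactly such a command-line switch. A sketch of the usage, with a placeholder savepoint path and jar:

    # -n / --allowNonRestoredState skips savepoint state that no longer
    # matches any operator in the new program (flag added in a later release)
    bin/flink run -s hdfs:///flink/savepoints/savepoint-xyz \
        --allowNonRestoredState my-job.jar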

Re: Container running beyond physical memory limits when processing DataStream

2016-08-02 Thread Jack Huang
Hi Max, Is there a way to limit the JVM memory usage (something like the -Xmx flag) for the task manager so that it won't go over the YARN limit but will just run GC until there is memory to use? Trying to allocate "enough" memory for this stream task is not ideal because I could have indefinitely …

Re: Reprocessing data in Flink / rebuilding Flink state

2016-08-02 Thread Stephan Ewen
+1 to ignore unmatched state. Also +1 to allow programs that resume partially (add some new state that starts empty). Both are quite important for program evolution. On Tue, Aug 2, 2016 at 2:58 PM, Ufuk Celebi wrote: …

Re: How to read AVRO data from Kafka using Flink

2016-08-02 Thread Shannon Carey
I can tell you that we are reading Avro data from Kafka on Flink without problems. It seems like you have a mistake somewhere in your system. If I were you, I would try your serialization & deserialization code in a simple program within the same JVM, then gradually add the other components in order …
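
In that spirit, a minimal single-JVM round-trip sketch; the schema file name and field are placeholders for the poster's schema:

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class AvroRoundTrip {
        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(new File("user.avsc"));

            GenericRecord record = new GenericData.Record(schema);
            record.put("name", "test"); // placeholder field from the schema

            // serialize the way the producer side would
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();

            // deserialize the way the consumer side would
            BinaryDecoder decoder =
                DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord back =
                new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
            System.out.println(back); // should print the record written above
        }
    }

Once this round trip works, the consumer-side half is what goes into the DeserializationSchema.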

Re: CEP and Within Clause

2016-08-02 Thread Sameer W
Thank you - it is very clear now. Sameer. On Tue, Aug 2, 2016 at 10:29 AM, Till Rohrmann wrote: …

RE: What is the recommended way to read AVRO data from Kafka using flink.

2016-08-02 Thread Alam, Zeeshan
Hi Stephan, I went through one of the old mail threads: http://mail-archives.apache.org/mod_mbox/flink-user/201510.mbox/%3CCANC1h_vq-TVjTNhXyYLoVso7GRGkdGWioM5Ppg%3DGoQPjvigqYg%40mail.gmail.com%3E There it is mentioned that when reading from Kafka you are expected to define a DeserializationSchema …
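
For what it's worth, such a schema for generic Avro records could look roughly like this - a sketch against the Flink 1.0-era DeserializationSchema interface. The schema is passed as a string because org.apache.avro.Schema itself is not Serializable; the class name is hypothetical:

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.TypeExtractor;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;

    public class AvroRecordSchema implements DeserializationSchema<GenericRecord> {
        private final String schemaString;
        private transient GenericDatumReader<GenericRecord> reader;

        public AvroRecordSchema(String schemaString) {
            this.schemaString = schemaString;
        }

        @Override
        public GenericRecord deserialize(byte[] message) throws IOException {
            if (reader == null) {
                // parse lazily on the task manager after deserialization
                Schema schema = new Schema.Parser().parse(schemaString);
                reader = new GenericDatumReader<GenericRecord>(schema);
            }
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(message, null);
            return reader.read(null, decoder);
        }

        @Override
        public boolean isEndOfStream(GenericRecord nextElement) {
            return false; // Kafka streams are unbounded
        }

        @Override
        public TypeInformation<GenericRecord> getProducedType() {
            return TypeExtractor.getForClass(GenericRecord.class);
        }
    }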

Re: CEP and Within Clause

2016-08-02 Thread Till Rohrmann
The CEP operator maintains a window length for each pattern. This means that every starting event will set its own timeout value. So if T=51 arrives in the 11th minute, then it depends whether the second T=31 arrived sometime between the 1st and 11th minute. If that's the case, then you should also …

Re: partial savepoints/combining savepoints

2016-08-02 Thread Till Rohrmann
Hi Claudia, 1) At the moment the offset information is written both to the ZooKeeper quorum used by Kafka and to the savepoint. Reading the savepoint is not so easy to do, since you would need to know the internal representation of the savepoint. But you could try to read the Kafka offsets from …
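
Reading those offsets out of ZooKeeper programmatically is straightforward - a minimal sketch assuming the Kafka 0.8 layout (/consumers/<group>/offsets/<topic>/<partition>); connect string, group, topic and partition are placeholders:

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.ZooKeeper;

    public class ReadOffset {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk-host:2181", 10000, null);
            byte[] data =
                zk.getData("/consumers/my-group/offsets/my-topic/0", false, null);
            // the znode payload is the offset as a plain string
            System.out.println(
                "partition 0 offset: " + new String(data, StandardCharsets.UTF_8));
            zk.close();
        }
    }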

Re: What is the recommended way to read AVRO data from Kafka using flink.

2016-08-02 Thread Stephan Ewen
Hi! I think this is a known limitation of Flink 1.0, and it is fixed in Flink 1.1. Here is the JIRA ticket: https://issues.apache.org/jira/browse/FLINK-3691 Here is the mail thread: http://mail-archives.apache.org/mod_mbox/flink-user/201604.mbox/%3CCAOFSxKtJXfxRKm2=bplu+xvpwqrwd3c8ynuk3iwk9aqvgrc …

Re: CEP and Within Clause

2016-08-02 Thread Sameer W
Thanks Till. In that case, if I have a pattern - First = T > 30, Followed By = T > 50, Within 10 minutes - and I get the sequence of events T=31, T=51, T=31, T=51 within 10 minutes, I assume the alert will now fire twice. But what happens if the last T=51 arrives in the 11th minute? If the p…
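
For concreteness, that pattern against the 1.0/1.1-era CEP API; TempEvent, getTemp() and the input stream are placeholder names:

    import java.util.Map;
    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternSelectFunction;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.streaming.api.windowing.time.Time;

    Pattern<TempEvent, ?> pattern = Pattern.<TempEvent>begin("first")
        .where(new FilterFunction<TempEvent>() {
            @Override
            public boolean filter(TempEvent e) { return e.getTemp() > 30; }
        })
        .followedBy("second")
        .where(new FilterFunction<TempEvent>() {
            @Override
            public boolean filter(TempEvent e) { return e.getTemp() > 50; }
        })
        // the 10-minute clock starts separately for every matching "first"
        .within(Time.minutes(10));

    CEP.pattern(input, pattern)
        .select(new PatternSelectFunction<TempEvent, String>() {
            @Override
            public String select(Map<String, TempEvent> match) {
                return "alert: " + match.get("first") + " -> " + match.get("second");
            }
        })
        .print();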

Re: Flink Kafka Consumer Behaviour

2016-08-02 Thread Till Rohrmann
Hi Janardhan, Flink should commit the current offsets to ZooKeeper whenever a checkpoint has been completed. If you have disabled checkpointing, then the offsets will be committed periodically to ZooKeeper; the default interval is 60s. Could it be that there wasn't yet a completed checkpoint? W…
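
A sketch of the two modes being described; the ZooKeeper address, group id and intervals below are placeholders:

    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    // with checkpointing, offsets go to ZooKeeper on each completed checkpoint
    env.enableCheckpointing(60000);

    // without checkpointing, the 0.8 consumer falls back to Kafka's periodic
    // auto-commit; the interval is an ordinary Kafka 0.8 consumer property
    Properties props = new Properties();
    props.setProperty("zookeeper.connect", "zk-host:2181");
    props.setProperty("group.id", "my-group");
    props.setProperty("auto.commit.interval.ms", "60000"); // the 60s default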

What is the recommended way to read AVRO data from Kafka using flink.

2016-08-02 Thread Alam, Zeeshan
Hi, I am using Flink 1.0.3 and FlinkKafkaConsumer08 to read AVRO data from Kafka. I have the AVRO schema file that was used to write the data to Kafka. Here https://ci.apache.org/projects/flink/flink-docs-release-0.8/example_connectors.html it is mentioned that using the GenericDa…
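
Wiring this together would look roughly like the sketch below, where AvroRecordSchema stands for a DeserializationSchema implementation like the one sketched under the reply above, and topic and properties are placeholders:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;

    DataStream<GenericRecord> stream = env.addSource(
        new FlinkKafkaConsumer08<GenericRecord>(
            "my-topic",                          // placeholder topic
            new AvroRecordSchema(schemaString),  // hypothetical schema class
            props));                             // zookeeper.connect, group.id, ...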

Re: CEP and Within Clause

2016-08-02 Thread Till Rohrmann
Hi Sameer, the within clause of CEP uses neither tumbling nor sliding windows. It is more like a session window, which is started whenever an element that matches the starting condition arrives. As long as new events which fulfill the pattern definition arrive within the length of the window, they …

Re: Reprocessing data in Flink / rebuilding Flink state

2016-08-02 Thread Ufuk Celebi
No, unfortunately this is the same for 1.1. The idea was to be explicit about what works and what does not. I see that this is actually a pain for this use case (which is very nice and reasonable ;)). I think we can either always ignore state that does not match the new job, or, if that is too aggressi…

Re: Window Functions with Incremental Aggregation

2016-08-02 Thread David B. Ciar
Hello again, Having had another go at this today, I clearly see that I cannot pass a certain type into the fold/window function and expect to be able to return a DataStream of another type from the window function. I have tried a different approach and am now receiving a run-time exception, caused …
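
For reference, the constraint on the 1.0/1.1-era API is that this apply() variant is typed WindowFunction<R, R, KEY, W>: the fold accumulator type is also the window function's output type. The usual workaround is therefore to fold directly into the type you want to emit. A sketch with placeholder Tuple2 elements:

    import org.apache.flink.api.common.functions.FoldFunction;
    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    DataStream<Tuple2<String, Long>> counts = input
        .keyBy(0)
        .timeWindow(Time.seconds(60))
        .apply(
            new Tuple2<String, Long>("", 0L),   // initial accumulator
            new FoldFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
                @Override
                public Tuple2<String, Long> fold(
                        Tuple2<String, Long> acc, Tuple2<String, Long> value) {
                    // incremental aggregation: one accumulator per key/window
                    return new Tuple2<String, Long>(value.f0, acc.f1 + value.f1);
                }
            },
            new WindowFunction<Tuple2<String, Long>, Tuple2<String, Long>,
                               Tuple, TimeWindow>() {
                @Override
                public void apply(Tuple key, TimeWindow window,
                                  Iterable<Tuple2<String, Long>> folded,
                                  Collector<Tuple2<String, Long>> out) {
                    // exactly one folded value per key and window
                    out.collect(folded.iterator().next());
                }
            });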

Re: Container running beyond physical memory limits when processing DataStream

2016-08-02 Thread Maximilian Michels
Your job creates a lot of String objects which need to be garbage collected. It could be that the JVM cannot reclaim them fast enough and YARN kills the container for consuming too much memory. You can try two things, shown in the sketch below: 1) Give the task manager more memory 2) Increase the YARN heap cutoff ratio (e.g. yarn.heap-cutoff-ratio …)
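
Concretely, the two knobs in flink-conf.yaml; the values here are only illustrative:

    # flink-conf.yaml
    taskmanager.heap.mb: 4096      # 1) more task manager memory
    yarn.heap-cutoff-ratio: 0.3    # 2) reserve a larger share of the container
                                   #    for non-heap memory (default is 0.25)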

Flink Kafka Consumer Behaviour

2016-08-02 Thread Janardhan Reddy
Hi, When I run the following command, I am getting that no topic is available for that consumer group. I am using flink-connector-kafka-0.8_${scala.version} (2.11). ./bin/kafka-consumer-groups.sh --zookeeper <> --group <> --describe "No topic available for consumer group provided" Does the Kafk…

Re: OutOfMemoryError

2016-08-02 Thread Stephan Ewen
My guess would be that you have a thread leak in the user code. More memory will not solve the problem, only push it a bit further away. On Mon, Aug 1, 2016 at 9:15 PM, Paulo Cezar wrote: > Hi folks, I'm trying to run a DataSet program, but after around 200k records are processed a "java…
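
One quick way to confirm a thread leak before reaching for more memory; <pid> is a placeholder for the TaskManager process id:

    # every entry in a JVM thread dump starts with a quoted thread name
    jstack <pid> | grep '^"' | wc -l
    # run it a few minutes apart; a steadily growing count points to the leak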

How to read AVRO data from Kafka using Flink

2016-08-02 Thread Alam, Zeeshan
Hi All, I am trying to read AVRO data from Kafka using Flink 1.0.3, but I am getting an error. I have posted this issue on Stack Overflow: http://stackoverflow.com/questions/38715286/how-to-decode-kafka-messages-using-avro-and-flink . Is there any mistake we can try to look into, or is there a better w…