Re: Fink: KafkaProducer Data Loss

2017-07-31 Thread Tzu-Li (Gordon) Tai
Hi! Thanks a lot for providing this. I'll try to find some time this week to look into this using your example code. Cheers, Gordon On 29 July 2017 at 4:46:57 AM, ninad (nni...@gmail.com) wrote: Hi Gordon, I was able to reproduce the data loss on standalone flink cluster also. I have stripped

Re: Fink: KafkaProducer Data Loss

2017-07-28 Thread ninad
Hi Gordon, I was able to reproduce the data loss on standalone flink cluster also. I have stripped down version of our code with here: Environment: Flink standalone 1.3.0 Kafka 0.9 *What the code is doing:* -consume messages from kafka topic ('event.filter.topic' property in application.propertie

Re: Fink: KafkaProducer Data Loss

2017-07-13 Thread Piotr Nowojski
Ops, sorry, I forgot that this issue was relevant to FlinkKafkaProducer010 only. Piotrek > On Jul 13, 2017, at 9:33 AM, Tzu-Li (Gordon) Tai wrote: > > Hi Ninad & Piotr, > > AFAIK, when this issue was reported, Ninad was using 09. > FLINK-6996 only affects Flink Kafka Producer 010, so I don’t t

Re: Fink: KafkaProducer Data Loss

2017-07-13 Thread Tzu-Li (Gordon) Tai
Hi Ninad & Piotr, AFAIK, when this issue was reported, Ninad was using 09. FLINK-6996 only affects Flink Kafka Producer 010, so I don’t think that’s the cause here. @Ninad Code to reproduce this would definitely be helpful here, thanks. If you prefer to provide that privately, that would also b

Re: Fink: KafkaProducer Data Loss

2017-07-13 Thread Piotr Nowojski
Hi, I’m not sure how relevant is this, but recently I have found and fixed a bug, that in certain conditions was causing data losses for all of the FlinkKafkaProducers in Flink: https://issues.apache.org/jira/browse/FLINK-6996 Namely on check

Re: Fink: KafkaProducer Data Loss

2017-07-12 Thread ninad
Hey guys, any update on this? If needed I can attach our code. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p14224.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabb

Re: Fink: KafkaProducer Data Loss

2017-06-16 Thread ninad
Hi Aljoscha, I gather you guys aren't able to reproduce this. Here are the answers to your questions: How do you ensure that you only shut down the brokers once Flink has read all the data that you expect it to read Ninad: I am able to see the number of messages received on the Flink Job UI.

Re: Fink: KafkaProducer Data Loss

2017-06-15 Thread Aljoscha Krettek
ra583.log> >> >> >> BTW, I posted my test results and logs for regular Flink v1.3.0 on Jun 6, >> but don't see that post here. I did receive an email thought. Hope you guys >> saw that. >> >> Thanks for your patience guys. >> >> >> >> -- >> View this message in context: >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p13597.html >> >> Sent from the Apache Flink User Mailing List archive. mailing list archive >> at Nabble.com. > > > If you reply to this email, your message will be added to the discussion > below: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p13620.html > To unsubscribe from Fink: KafkaProducer Data Loss, click here. > NAML > > View this message in context: Re: Fink: KafkaProducer Data Loss > Sent from the Apache Flink User Mailing List archive. mailing list archive at > Nabble.com.

Re: Fink: KafkaProducer Data Loss

2017-06-11 Thread ninad
Thanks Gordon. On Jun 11, 2017 9:11 AM, "Tzu-Li (Gordon) Tai [via Apache Flink User Mailing List archive.]" wrote: > Hi Ninad, > > Thanks for the logs! > Just to let you know, I’ll continue to investigate this early next week. > > Cheers, > Gordon > > On 8 June 2017 at 7:08:23 PM, ninad ([hidden

Re: Fink: KafkaProducer Data Loss

2017-06-11 Thread Tzu-Li (Gordon) Tai
Hi Ninad, Thanks for the logs! Just to let you know, I’ll continue to investigate this early next week. Cheers, Gordon On 8 June 2017 at 7:08:23 PM, ninad (nni...@gmail.com) wrote: I built Flink v1.3.0 with cloudera hadoop and am able to see the data loss. Here are the details: *tmOneClou

Re: Fink: KafkaProducer Data Loss

2017-06-08 Thread ninad
I built Flink v1.3.0 with cloudera hadoop and am able to see the data loss. Here are the details: *tmOneCloudera583.log* Received session window task: *2017-06-08 15:10:46,131 INFO org.apache.flink.runtime.taskmanager.Task - TriggerWindow(ProcessingTimeSessionWindows(3

Re: Fink: KafkaProducer Data Loss

2017-06-06 Thread ninad
I tried to build Flink with hadoop cdc 5.8.3 and test it, but it was unable to come up. When I ran the yarn-session script, it exited with something like "Can't get the url for job manager". I didn't spend much time figuring out what's wrong, and went straight to Flink 1.3.0. I ran several tests,

Re: Fink: KafkaProducer Data Loss

2017-06-06 Thread ninad
Not yet. Planning to do that today. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p13544.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Fink: KafkaProducer Data Loss

2017-06-06 Thread Tzu-Li (Gordon) Tai
Hi, From the logs and the description of your test scenarios where data loss is observed and not observed, it seems like the differentiating factor here is whether or not the session windows trigger was first fired when the checkpoint occurred. It doesn’t however explain the case where your te

Re: Fink: KafkaProducer Data Loss

2017-06-06 Thread ninad
Hi Till, Attaching the logs to this post again. Thanks. jobManager.log tmOne.log tmTwo.log

Re: Fink: KafkaProducer Data Loss

2017-06-06 Thread Till Rohrmann
.n4. >> nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p13479.html >> To unsubscribe from Fink: KafkaProducer Data Loss, click here. >> NAML >> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Ae

Re: Fink: KafkaProducer Data Loss

2017-06-04 Thread ninad
Yeah, this seems like a problem with flink check-pointing. The fact that flink thinks that a checkpoint was successful, but in fact it wasn't. On Jun 4, 2017 7:37 AM, "Tzu-Li (Gordon) Tai [via Apache Flink User Mailing List archive.]" wrote: > Thanks for the updates and testing efforts on this!

Re: Fink: KafkaProducer Data Loss

2017-06-04 Thread Tzu-Li (Gordon) Tai
Thanks for the updates and testing efforts on this! I’m sorry that I currently haven’t found the change to look closely into the testing scenarios you’ve listed, yet. But please keep us updated on this thread after testing it out also with the Cloudera build. One other suggestion for your test

Re: Fink: KafkaProducer Data Loss

2017-06-04 Thread ninad
I tested this with the standalone cluster, and I don't see this problem. So, the problem could be that we haven't built Flink against cloudera Hadoop? I will test it out. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data

Re: Fink: KafkaProducer Data Loss

2017-06-02 Thread ninad
I ran a few tests and was able to find the case where there won't be a data loss. And here's how the two tests are different. *The case where data loss is observed:* 1) Kafka source receives data. (Session window trigger hasn't been fired yet.) 2) Bring all Kafka brokers down. 3) Flink trigge

Re: Fink: KafkaProducer Data Loss

2017-06-02 Thread ninad
Thanks Till. The log files I have attached are the complete logs. They are DEBUG level. There are three files: jobManger.log, tmOne.log and tmTwo.log. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p13463.

Re: Fink: KafkaProducer Data Loss

2017-06-02 Thread Till Rohrmann
Hi Ninad, After recovery, the job should continue from where the last checkpoint was taken. Thus, it should output all remaining messages at least once to Kafka. Could you share the complete JobManager and TaskManager logs with us? Maybe they contain some information which could be helpful to get

Re: Fink: KafkaProducer Data Loss

2017-06-02 Thread ninad
Thanks Gordon. *2017-06-01 20:22:44,400 WARN org.apache.kafka.clients.producer.internals.Sender - Got error produce response with correlation id 4 on topic-partit ion topic.http.stream.event.processor-0, retrying (9 attempts left). Error: NOT_ENOUGH_REPLICAS , not sure if this may be rel

Re: Fink: KafkaProducer Data Loss

2017-06-02 Thread Tzu-Li (Gordon) Tai
Hi Ninad, Unfortunately I don’t think the provided logs shed any light here. It does complain about: 2017-06-01 20:22:44,400 WARN  org.apache.kafka.clients.producer.internals.Sender - Got error  produce response with correlation id 4 on topic-partit  ion topic.http.stream.event.processor-0, retr

Re: Fink: KafkaProducer Data Loss

2017-06-01 Thread ninad
Thanks Gordon and Kostas. Gordon, "When a failure occurs in the job, Flink uses the last completed checkpoint to restart the job. In the case of the Flink Kafka producer, this essentially makes sure that records which did not make it into Kafka and caused the last run to fail are reprocessed and

Re: Fink: KafkaProducer Data Loss

2017-06-01 Thread Tzu-Li (Gordon) Tai
Hi Ninad, This exception you’re seeing does not cause data loss. As a matter of fact, its preventing data loss based on how Flink’s checkpoints / fault-tolerance works. So, a recap of what the problem was when this “uncaught exception leak” issue was first reported: Prior to the fix, on checkpo

Re: Fink: KafkaProducer Data Loss

2017-06-01 Thread Kostas Kloudas
Hi Ninad, I think that Gordon could shed some more light on this but I suggest you should update your Flink version to at least the 1.2. The reason is that we are already in the process of releasing Flink 1.3 (which will come probably today) and a lot of things have changed/fixed/improved sin

Re: Fink: KafkaProducer Data Loss

2017-05-31 Thread ninad
Thanks for the fix guys. I am trying to test this with 1.1.5, but still seeing a data loss. I am not able to get much from logs except this: Here's our use case: 1) Consume from Kafka 2) Apply session window 3) Send messages of window to Kafka If there's a failure in step 3, because all kafka br

Re: Fink: KafkaProducer Data Loss

2017-02-03 Thread ninad
Thanks, Gordon and Till. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Fink-KafkaProducer-Data-Loss-tp11413p11431.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Fink: KafkaProducer Data Loss

2017-02-02 Thread Tzu-Li (Gordon) Tai
Hi Ninad and Till, Thank you for looking into the issue! This is actually a bug. Till’s suggestion is correct: The producer holds a `pendingRecords` value that is incremented on each invoke() and decremented on each callback, used to check if the producer needs to sync on pending callbacks on c

Re: Fink: KafkaProducer Data Loss

2017-02-02 Thread Till Rohrmann
Hi Ninad, thanks for reporting the issue. For me it looks also as if exceptions might go under certain circumstances unnoticed. So for example you have a write operation which fails this will set the asyncException field which is not checked before the next invoke call happens. If now a checkpoint