Thanks for reporting back Naveen!
2016-05-17 18:55 GMT+02:00 Madhire, Naveen <naveen.madh...@capitalone.com>: > Hi Robert, With the use of manual save points, I was able to obtain > exactly-once output with Kafka and HDFS rolling sink. > > Thanks to you and Fabian for the help. > > > From: Robert Metzger <rmetz...@apache.org> > Reply-To: "user@flink.apache.org" <user@flink.apache.org> > Date: Tuesday, May 17, 2016 at 10:02 AM > > To: "user@flink.apache.org" <user@flink.apache.org> > Subject: Re: Flink recovery > > Hi, > > Savepoints are exactly for that use case: > https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html > > http://data-artisans.com/how-apache-flink-enables-new-streaming-applications/ > > Regards, > Robert > > On Tue, May 17, 2016 at 4:25 PM, Madhire, Naveen < > naveen.madh...@capitalone.com> wrote: > >> Hey Robert, >> >> What is the best way to stop the streaming job in production if I want to >> upgrade the application without loosing messages and causing duplicates. >> How can I test this scenario? >> We are testing few recovery mechanisms like job failure, application >> upgrade and node failure. >> >> >> >> Thanks, >> Naveen >> >> From: Robert Metzger <rmetz...@apache.org> >> Reply-To: "user@flink.apache.org" <user@flink.apache.org> >> Date: Tuesday, May 17, 2016 at 6:58 AM >> To: "user@flink.apache.org" <user@flink.apache.org> >> Subject: Re: Flink recovery >> >> Hi Naveen, >> >> I think cancelling a job is not the right approach for testing our >> exactly-once guarantees. By cancelling a job, you are discarding the state >> of your job. Restarting from scratch (without using a savepoint) will cause >> duplicates. >> What you can do to validate the behavior is randomly killing a task >> manager running your job. Then, the job should restart on the remaining >> machines (make sure that enough slots are available even after the failure) >> and you shouldn't have any duplicates in HDFS. >> >> Regards, >> Robert >> >> >> >> >> >> On Tue, May 17, 2016 at 11:27 AM, Stephan Ewen <se...@apache.org> wrote: >> >>> Hi Naveen! >>> >>> I assume you are using Hadoop 2.7+? Then you should not see the >>> ".valid-length" file. >>> >>> The fix you mentioned is part of later Flink releases (like 1.0.3) >>> >>> Stephan >>> >>> >>> On Mon, May 16, 2016 at 11:46 PM, Madhire, Naveen < >>> naveen.madh...@capitalone.com> wrote: >>> >>>> Thanks Fabian. Actually I don’t see a .valid-length suffix file in the >>>> output HDFS folder. >>>> Can you please tell me how would I debug this issue or do you suggest >>>> anything else to solve this duplicates problem. >>>> >>>> >>>> Thank you. >>>> >>>> From: Fabian Hueske <fhue...@gmail.com> >>>> Reply-To: "user@flink.apache.org" <user@flink.apache.org> >>>> Date: Saturday, May 14, 2016 at 4:10 AM >>>> To: "user@flink.apache.org" <user@flink.apache.org> >>>> Subject: Re: Flink recovery >>>> >>>> The behavior of the RollingFileSink depends on the capabilities of the >>>> file system. >>>> If the file system does not support to truncate files such as older >>>> HDFS versions, an additional file with a .valid-length suffix is written to >>>> indicate how much of the file is valid. >>>> All records / data that come after the valid-length are duplicates. >>>> Please refer to the JavaDocs of the RollingFileSink for details [1]. >>>> >>>> If the .valid-length file does not solve the problem, you might have >>>> found a bug and we should have a closer look at the problem. >>>> >>>> Best, Fabian >>>> >>>> [1] >>>> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/connectors/fs/RollingSink.html >>>> >>>> 2016-05-14 4:17 GMT+02:00 Madhire, Naveen < >>>> naveen.madh...@capitalone.com>: >>>> >>>>> Thanks Fabian. Yes, I am seeing few records more than once in the >>>>> output. >>>>> I am running the job and canceling it from the dashboard, and running >>>>> again. And using different HDFS file outputs both the times. I was >>>>> thinking >>>>> when I cancel the job, it’s not doing a clean cancel. >>>>> Is there anything else which I have to use to make it exactly once in >>>>> the output? >>>>> >>>>> I am using a simple read from kafka, transformations and rolling file >>>>> sink pipeline. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> Naveen >>>>> >>>>> From: Fabian Hueske <fhue...@gmail.com> >>>>> Reply-To: "user@flink.apache.org" <user@flink.apache.org> >>>>> Date: Friday, May 13, 2016 at 4:26 PM >>>>> >>>>> To: "user@flink.apache.org" <user@flink.apache.org> >>>>> Subject: Re: Flink recovery >>>>> >>>>> Hi Naveen, >>>>> >>>>> the RollingFileSink supports exactly-once output. So you should be >>>>> good. >>>>> >>>>> Did you see events being emitted multiple times (should not happen >>>>> with the RollingFileSink) or being processed multiple times within the >>>>> Flink program (might happen as explained before)? >>>>> >>>>> Best, Fabian >>>>> >>>>> 2016-05-13 23:19 GMT+02:00 Madhire, Naveen < >>>>> naveen.madh...@capitalone.com>: >>>>> >>>>>> Thank you Fabian. >>>>>> >>>>>> I am using HDFS rolling sink. This should support the exactly once >>>>>> output in case of failures, isn’t it? I am following the below >>>>>> documentation, >>>>>> >>>>>> >>>>>> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/fault_tolerance.html#fault-tolerance-guarantees-of-data-sources-and-sinks >>>>>> >>>>>> If not what other Sinks can I use to have the exactly once output >>>>>> since getting exactly once output is critical for our use case. >>>>>> >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Naveen >>>>>> >>>>>> From: Fabian Hueske <fhue...@gmail.com> >>>>>> Reply-To: "user@flink.apache.org" <user@flink.apache.org> >>>>>> Date: Friday, May 13, 2016 at 4:13 PM >>>>>> To: "user@flink.apache.org" <user@flink.apache.org> >>>>>> Subject: Re: Flink recovery >>>>>> >>>>>> Hi, >>>>>> >>>>>> Flink's exactly-once semantics do not mean that events are processed >>>>>> exactly-once but that events will contribute exactly-once to the state of >>>>>> an operator such as a counter. >>>>>> Roughly, the mechanism works as follows: >>>>>> - Flink peridically injects checkpoint markers into the data stream. >>>>>> This happens synchronously across all sources and markers. >>>>>> - When an operator receives a checkpoint marker from all its sources, >>>>>> it checkpoints its state and forwards the marker >>>>>> - When the marker was received by all sinks, the distributed >>>>>> checkpoint is noted as successful. >>>>>> >>>>>> In case of a failure, the state of all operators is reset to the last >>>>>> successful checkpoint and the sources are reset to the point when the >>>>>> marker was injected. >>>>>> Hence, some events are sent a second time to the operators but the >>>>>> state of the operators was reset as well. So the repeated events >>>>>> contribute >>>>>> exactly once to the state of an operator. >>>>>> >>>>>> Note, you need a SinkFunction that supports Flink's checkpointing >>>>>> mechanism to achieve exactly-once output. Otherwise, it might happen that >>>>>> results are emitted multiple times. >>>>>> >>>>>> Cheers, Fabian >>>>>> >>>>>> 2016-05-13 22:58 GMT+02:00 Madhire, Naveen < >>>>>> naveen.madh...@capitalone.com>: >>>>>> >>>>>>> I checked the JIRA and looks like FLINK-2111 should address the >>>>>>> issue which I am facing. I am canceling the job from dashboard. >>>>>>> >>>>>>> I am using kafka source and HDFS rolling sink. >>>>>>> >>>>>>> https://issues.apache.org/jira/browse/FLINK-2111 >>>>>>> >>>>>>> Is this JIRA part of Flink 1.0.0? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Naveen >>>>>>> >>>>>>> From: "Madhire, Venkat Naveen Kumar Reddy" < >>>>>>> naveen.madh...@capitalone.com> >>>>>>> Reply-To: "user@flink.apache.org" <user@flink.apache.org> >>>>>>> Date: Friday, May 13, 2016 at 10:58 AM >>>>>>> To: "user@flink.apache.org" <user@flink.apache.org> >>>>>>> Subject: Flink recovery >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> We are trying to test the recovery mechanism of Flink with Kafka and >>>>>>> HDFS sink during failures. >>>>>>> >>>>>>> I’ve killed the job after processing some messages and restarted the >>>>>>> same job again. Some of the messages I am seeing are processed more than >>>>>>> once and not following the exactly once semantics. >>>>>>> >>>>>>> >>>>>>> Also, using the checkpointing mechanism and saving the state >>>>>>> checkpoints into HDFS. >>>>>>> Below is the checkpoint code, >>>>>>> >>>>>>> envStream.enableCheckpointing(11); >>>>>>> envStream.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE); >>>>>>> envStream.getCheckpointConfig().setCheckpointTimeout(60000); >>>>>>> envStream.getCheckpointConfig().setMaxConcurrentCheckpoints(4); >>>>>>> >>>>>>> envStream.setStateBackend(new >>>>>>> FsStateBackend("hdfs://ipaddr/mount/cp/checkpoint/")); >>>>>>> >>>>>>> >>>>>>> One thing I’ve noticed is lowering the time to checkpointing is >>>>>>> actually lowering the number of messages processed more than once and >>>>>>> 11ms >>>>>>> is the lowest I can use. >>>>>>> >>>>>>> Is there anything else I should try to have exactly once message >>>>>>> processing functionality. >>>>>>> >>>>>>> I am using Flink 1.0.0 and kafka 0.8 >>>>>>> >>>>>>> >>>>>>> Thank you. >>>>>>> >>>>>>> ------------------------------ >>>>>>> >>>>>>> The information contained in this e-mail is confidential and/or >>>>>>> proprietary to Capital One and/or its affiliates and may only be used >>>>>>> solely in performance of work or services for Capital One. The >>>>>>> information >>>>>>> transmitted herewith is intended only for use by the individual or >>>>>>> entity >>>>>>> to which it is addressed. If the reader of this message is not the >>>>>>> intended >>>>>>> recipient, you are hereby notified that any review, retransmission, >>>>>>> dissemination, distribution, copying or other use of, or taking of any >>>>>>> action in reliance upon this information is strictly prohibited. If you >>>>>>> have received this communication in error, please contact the sender and >>>>>>> delete the material from your computer. >>>>>>> >>>>>>> ------------------------------ >>>>>>> >>>>>>> The information contained in this e-mail is confidential and/or >>>>>>> proprietary to Capital One and/or its affiliates and may only be used >>>>>>> solely in performance of work or services for Capital One. The >>>>>>> information >>>>>>> transmitted herewith is intended only for use by the individual or >>>>>>> entity >>>>>>> to which it is addressed. If the reader of this message is not the >>>>>>> intended >>>>>>> recipient, you are hereby notified that any review, retransmission, >>>>>>> dissemination, distribution, copying or other use of, or taking of any >>>>>>> action in reliance upon this information is strictly prohibited. If you >>>>>>> have received this communication in error, please contact the sender and >>>>>>> delete the material from your computer. >>>>>>> >>>>>> >>>>>> >>>>>> ------------------------------ >>>>>> >>>>>> The information contained in this e-mail is confidential and/or >>>>>> proprietary to Capital One and/or its affiliates and may only be used >>>>>> solely in performance of work or services for Capital One. The >>>>>> information >>>>>> transmitted herewith is intended only for use by the individual or entity >>>>>> to which it is addressed. If the reader of this message is not the >>>>>> intended >>>>>> recipient, you are hereby notified that any review, retransmission, >>>>>> dissemination, distribution, copying or other use of, or taking of any >>>>>> action in reliance upon this information is strictly prohibited. If you >>>>>> have received this communication in error, please contact the sender and >>>>>> delete the material from your computer. >>>>>> >>>>> >>>>> >>>>> ------------------------------ >>>>> >>>>> The information contained in this e-mail is confidential and/or >>>>> proprietary to Capital One and/or its affiliates and may only be used >>>>> solely in performance of work or services for Capital One. The information >>>>> transmitted herewith is intended only for use by the individual or entity >>>>> to which it is addressed. If the reader of this message is not the >>>>> intended >>>>> recipient, you are hereby notified that any review, retransmission, >>>>> dissemination, distribution, copying or other use of, or taking of any >>>>> action in reliance upon this information is strictly prohibited. If you >>>>> have received this communication in error, please contact the sender and >>>>> delete the material from your computer. >>>>> >>>> >>>> >>>> ------------------------------ >>>> >>>> The information contained in this e-mail is confidential and/or >>>> proprietary to Capital One and/or its affiliates and may only be used >>>> solely in performance of work or services for Capital One. The information >>>> transmitted herewith is intended only for use by the individual or entity >>>> to which it is addressed. If the reader of this message is not the intended >>>> recipient, you are hereby notified that any review, retransmission, >>>> dissemination, distribution, copying or other use of, or taking of any >>>> action in reliance upon this information is strictly prohibited. If you >>>> have received this communication in error, please contact the sender and >>>> delete the material from your computer. >>>> >>> >>> >> >> ------------------------------ >> >> The information contained in this e-mail is confidential and/or >> proprietary to Capital One and/or its affiliates and may only be used >> solely in performance of work or services for Capital One. The information >> transmitted herewith is intended only for use by the individual or entity >> to which it is addressed. If the reader of this message is not the intended >> recipient, you are hereby notified that any review, retransmission, >> dissemination, distribution, copying or other use of, or taking of any >> action in reliance upon this information is strictly prohibited. If you >> have received this communication in error, please contact the sender and >> delete the material from your computer. >> > > > ------------------------------ > > The information contained in this e-mail is confidential and/or > proprietary to Capital One and/or its affiliates and may only be used > solely in performance of work or services for Capital One. The information > transmitted herewith is intended only for use by the individual or entity > to which it is addressed. If the reader of this message is not the intended > recipient, you are hereby notified that any review, retransmission, > dissemination, distribution, copying or other use of, or taking of any > action in reliance upon this information is strictly prohibited. If you > have received this communication in error, please contact the sender and > delete the material from your computer. >