Hi Fabian,

The program runs until I stop it manually. The trigger is also firing as expected: after each firing I read the entire captured data to check what was emitted, and I pass that data to GroupByKey as input. Since I use a global window, I accumulate the entire data each time the trigger fires. So I doubt the triggers are causing the issue.
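To make it concrete, the windowing setup looks roughly like this (a simplified sketch, not the exact job: the bootstrap servers, topic name, trigger delay, MyPojo, and ParseToPojoFn are placeholders):

import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

// Read one topic (step 1 in my earlier mail).
PCollection<KV<String, String>> input = pipeline.apply(
    KafkaIO.<String, String>read()
        .withBootstrapServers("kafka:9092")             // placeholder
        .withTopic("topic-a")                           // placeholder
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withoutMetadata());

// Global window with a repeated processing-time trigger (step 2);
// accumulating panes, so each firing contains all data seen so far.
PCollection<KV<String, String>> windowed = input.apply(
    Window.<KV<String, String>>into(new GlobalWindows())
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(30))))
        .accumulatingFiredPanes()
        .withAllowedLateness(Duration.ZERO));

// Parse into POJOs keyed by the join key (step 3), then group (step 4).
PCollection<KV<String, MyPojo>> parsed =
    windowed.apply(ParDo.of(new ParseToPojoFn()));      // placeholder DoFn
PCollection<KV<String, Iterable<MyPojo>>> grouped =
    parsed.apply(GroupByKey.create());

It is in this last GroupByKey step that some keys go missing.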
Thanks & regards, Nishu On Thu, Dec 7, 2017 at 11:47 PM, Fabian Hueske <fhue...@gmail.com> wrote: > Hi Nishu, > > the data loss might be caused by the fact that processing time triggers do > not fire when the program terminates. > So, if your program has records stored in a window and program terminates > because the input was fully consumed, the window operator won't process the > remaining windows but simply be canceled. > > Best, Fabian > > 2017-12-07 23:13 GMT+01:00 Nishu <nishuta...@gmail.com>: > >> Hi, >> >> Thanks for your inputs. >> I am reading Kafka topics in Global windows and have defined some >> ProcessingTime triggers. Hence there is no late records. >> >> Program is performing join between multiple kafka topics. It consists >> following types of Transformation sequence is something like : >> 1. Read Kafka topic >> 2. Apply Window and trigger on kafka topic >> 3. Parse the data into POJO objects >> 4. Group the POJO objects by their keys >> 5. Read other topics and perform same steps >> 6. Join the Grouped Output with other topic Grouped records. >> >> I get all records until 3rd point as expected. But in point 4, few keys >> are dropped with inconsistent behavior in each run. >> I have tried the pipeline with different-2 setup i.e 1 task slot, 1 >> parallel thread, or multiple task slot n multiple thread. >> >> It looks like BeamFlink runner has some bug in the pipeline translation >> in streaming pipeline scenario. >> >> Thanks, >> Nishu >> >> >> On Thu, Dec 7, 2017 at 7:13 PM, Chen Qin <qinnc...@gmail.com> wrote: >> >>> Nishu >>> >>> You might consider sideouput with metrics at least after window. I would >>> suggest having that to catch data screw or partition screw in all flink >>> jobs and amend if needed. >>> >>> Chen >>> >>> On Thu, Dec 7, 2017 at 9:48 AM Fabian Hueske <fhue...@gmail.com> wrote: >>> >>>> Is it possible that the data is dropped due to being late, i.e., >>>> records with timestamps behind the current watemark? >>>> What kind of operations does your program consist of? >>>> >>>> Best, Fabian >>>> >>>> 2017-12-07 10:20 GMT+01:00 Sendoh <unicorn.bana...@gmail.com>: >>>> >>>>> I would recommend to also print the count of input and output of each >>>>> operator by using Accumulator. >>>>> >>>>> Cheers, >>>>> >>>>> Sendoh >>>>> >>>>> >>>>> >>>>> -- >>>>> Sent from: http://apache-flink-user-maili >>>>> ng-list-archive.2336050.n4.nabble.com/ >>>>> >>>> >>>> -- >>> Chen >>> Software Eng, Facebook >>> >> >> >> >> -- >> Thanks & Regards, >> Nishu Tayal >> > > -- Thanks & Regards, Nishu Tayal