Hi Fabian,

The program runs until I stop it manually. The trigger is also firing as
expected: I read the entire data after the trigger fires to see what is
captured, and pass that data to GroupByKey as input.
It uses a global window, so I accumulate the entire data set each time
the trigger fires.
So I doubt that the triggers are causing the issue.
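
For reference, the windowing setup looks roughly like this (Beam Java;
the trigger interval and element types are illustrative, not my exact
code):

    import org.apache.beam.sdk.transforms.GroupByKey;
    import org.apache.beam.sdk.transforms.windowing.*;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    // input: a keyed PCollection read from Kafka (hypothetical).
    // Global window with a repeated processing-time trigger; panes
    // accumulate, so each firing sees all data received so far.
    PCollection<KV<String, String>> windowed = input
        .apply(Window.<KV<String, String>>into(new GlobalWindows())
            .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(30))))
            .accumulatingFiredPanes()
            .withAllowedLateness(Duration.ZERO));

    // The fired panes are what GroupByKey receives as input.
    PCollection<KV<String, Iterable<String>>> grouped =
        windowed.apply(GroupByKey.<String, String>create());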

Thanks & regards,
Nishu

On Thu, Dec 7, 2017 at 11:47 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Nishu,
>
> the data loss might be caused by the fact that processing-time triggers do
> not fire when the program terminates.
> So, if your program has records stored in a window and the program
> terminates because the input was fully consumed, the window operator won't
> process the remaining windows but will simply be canceled.
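>
> For example, in the Beam API a final event-time firing can be added so
> the last pane is still emitted when a bounded input is exhausted (a
> sketch, not necessarily matching your setup):
>
>     Window.<KV<String, String>>into(new GlobalWindows())
>         // Early firings on processing time, plus a final firing when
>         // the watermark passes the end of the (global) window, i.e.
>         // when a bounded input is fully consumed.
>         .triggering(AfterWatermark.pastEndOfWindow()
>             .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
>                 .plusDelayOf(Duration.standardSeconds(30))))
>         .accumulatingFiredPanes()
>         .withAllowedLateness(Duration.ZERO)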
>
> Best, Fabian
>
> 2017-12-07 23:13 GMT+01:00 Nishu <nishuta...@gmail.com>:
>
>> Hi,
>>
>> Thanks for your inputs.
>> I am reading Kafka topics in global windows and have defined some
>> processing-time triggers. Hence there are no late records.
>>
>> The program performs a join between multiple Kafka topics. The
>> transformation sequence is as follows (a condensed sketch follows the list):
>> 1. Read a Kafka topic
>> 2. Apply a window and trigger to the topic
>> 3. Parse the data into POJO objects
>> 4. Group the POJO objects by their keys
>> 5. Read the other topics and perform the same steps
>> 6. Join the grouped output with the grouped records of the other topics.
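>>
>> In Beam Java the steps look roughly like this (the broker address, topic
>> names, the MyPojo type, and the ParseToKeyedPojoFn parser are
>> placeholders, not my actual code):
>>
>>     PCollection<KV<String, MyPojo>> keyedA = pipeline
>>         .apply("ReadTopicA", KafkaIO.<String, String>read()        // step 1
>>             .withBootstrapServers("broker:9092")
>>             .withTopic("topicA")
>>             .withKeyDeserializer(StringDeserializer.class)
>>             .withValueDeserializer(StringDeserializer.class)
>>             .withoutMetadata())
>>         .apply(Values.<String>create())
>>         .apply("WindowA", Window.<String>into(new GlobalWindows()) // step 2
>>             .triggering(Repeatedly.forever(AfterProcessingTime
>>                 .pastFirstElementInPane()
>>                 .plusDelayOf(Duration.standardSeconds(30))))
>>             .accumulatingFiredPanes()
>>             .withAllowedLateness(Duration.ZERO))
>>         .apply("ParseA", ParDo.of(new ParseToKeyedPojoFn()));      // step 3
>>
>>     PCollection<KV<String, Iterable<MyPojo>>> groupedA =
>>         keyedA.apply(GroupByKey.<String, MyPojo>create());         // step 4
>>
>>     // keyedB is built the same way from another topic (step 5); the
>>     // join (step 6) uses CoGroupByKey over the keyed collections:
>>     final TupleTag<MyPojo> tagA = new TupleTag<>();
>>     final TupleTag<MyPojo> tagB = new TupleTag<>();
>>     PCollection<KV<String, CoGbkResult>> joined = KeyedPCollectionTuple
>>         .of(tagA, keyedA).and(tagB, keyedB)
>>         .apply(CoGroupByKey.<String>create());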
>>
>> I get all the records through step 3 as expected. But in step 4, a few
>> keys are dropped, with inconsistent behavior in each run.
>> I have tried the pipeline with different setups, i.e., 1 task slot with 1
>> parallel thread, or multiple task slots with multiple threads.
>>
>> It looks like the Beam Flink runner has a bug in the pipeline translation
>> for the streaming scenario.
>>
>> Thanks,
>> Nishu
>>
>>
>> On Thu, Dec 7, 2017 at 7:13 PM, Chen Qin <qinnc...@gmail.com> wrote:
>>
>>> Nishu
>>>
>>> You might consider a side output with metrics, at least after the window.
>>> I would suggest having that to catch data skew or partition skew in all
>>> Flink jobs and amend if needed.
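>>>
>>> For example (Beam Java; a sketch with illustrative names): a
>>> pass-through ParDo placed after the window that counts elements with a
>>> metric and routes the keys it sees to a side output for inspection:
>>>
>>>     final TupleTag<KV<String, MyPojo>> mainTag =
>>>         new TupleTag<KV<String, MyPojo>>() {};
>>>     final TupleTag<String> debugTag = new TupleTag<String>() {};
>>>
>>>     PCollectionTuple tagged = keyedA.apply(ParDo.of(
>>>         new DoFn<KV<String, MyPojo>, KV<String, MyPojo>>() {
>>>           private final Counter seen =
>>>               Metrics.counter("debug", "afterWindow");
>>>           @ProcessElement
>>>           public void process(ProcessContext c) {
>>>             seen.inc();                               // count each element
>>>             c.output(c.element());                    // main output as-is
>>>             c.output(debugTag, c.element().getKey()); // key audit trail
>>>           }
>>>         }).withOutputTags(mainTag, TupleTagList.of(debugTag)));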
>>>
>>> Chen
>>>
>>> On Thu, Dec 7, 2017 at 9:48 AM Fabian Hueske <fhue...@gmail.com> wrote:
>>>
>>>> Is it possible that the data is dropped due to being late, i.e.,
>>>> records with timestamps behind the current watermark?
>>>> What kind of operations does your program consist of?
>>>>
>>>> Best, Fabian
>>>>
>>>> 2017-12-07 10:20 GMT+01:00 Sendoh <unicorn.bana...@gmail.com>:
>>>>
>>>>> I would recommend also printing the input and output counts of each
>>>>> operator by using an Accumulator.
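>>>>>
>>>>> A minimal pass-through counter for that (Flink Java; the class and
>>>>> accumulator names are illustrative):
>>>>>
>>>>>     import org.apache.flink.api.common.accumulators.IntCounter;
>>>>>     import org.apache.flink.api.common.functions.RichMapFunction;
>>>>>     import org.apache.flink.configuration.Configuration;
>>>>>
>>>>>     // Wrap any stream in .map(new CountingMap<>("afterGroupBy"))
>>>>>     // and read the totals from the web UI or JobExecutionResult.
>>>>>     public class CountingMap<T> extends RichMapFunction<T, T> {
>>>>>       private final String name;
>>>>>       private final IntCounter counter = new IntCounter();
>>>>>       public CountingMap(String name) { this.name = name; }
>>>>>       @Override
>>>>>       public void open(Configuration parameters) {
>>>>>         getRuntimeContext().addAccumulator(name, counter);
>>>>>       }
>>>>>       @Override
>>>>>       public T map(T value) {
>>>>>         counter.add(1);  // count each record passing through
>>>>>         return value;
>>>>>       }
>>>>>     }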
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Sendoh
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>>>>
>>>>
>>> --
>>> Chen
>>> Software Eng, Facebook
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Nishu Tayal
>>
>
>


-- 
Thanks & Regards,
Nishu Tayal
