Hi Nishu,

the data loss might be caused by the fact that processing-time triggers do
not fire when the program terminates.
So, if your program has records stored in a window and the program
terminates because the input was fully consumed, the window operator won't
process the remaining windows but will simply be canceled.
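
For illustration, here is a minimal Beam Java sketch of the pattern that
can lose data this way (the "events" collection and the 30s delay are just
assumptions, not your code): the pane buffered in the global window is only
emitted when the processing-time timer fires, so a pane still waiting when
the job shuts down is discarded.

import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// "events" is some upstream PCollection<String> (assumption).
PCollection<String> windowed = events.apply(
    Window.<String>into(new GlobalWindows())
        // Fires 30s of processing time after the first element of a pane.
        // If the job terminates before the timer fires, the buffered pane
        // is dropped instead of emitted.
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardSeconds(30))))
        .withAllowedLateness(Duration.ZERO)
        .discardingFiredPanes());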

Best, Fabian

2017-12-07 23:13 GMT+01:00 Nishu <nishuta...@gmail.com>:

> Hi,
>
> Thanks for your inputs.
> I am reading Kafka topics into global windows and have defined some
> processing-time triggers, so there are no late records.
>
> The program performs a join between multiple Kafka topics. The
> transformation sequence is as follows (a rough code sketch follows the
> list):
> 1. Read a Kafka topic
> 2. Apply a window and trigger on the topic
> 3. Parse the data into POJO objects
> 4. Group the POJO objects by their keys
> 5. Read the other topics and perform the same steps
> 6. Join the grouped output with the grouped records of the other topics
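>
> To make that concrete, the pipeline shape is roughly the following Beam
> Java sketch (topic name, broker address, the Order POJO, and ParseOrderFn
> are placeholders, not the real code):
>
> import org.apache.beam.sdk.io.kafka.KafkaIO;
> import org.apache.beam.sdk.transforms.GroupByKey;
> import org.apache.beam.sdk.transforms.ParDo;
> import org.apache.beam.sdk.transforms.windowing.*;
> import org.apache.beam.sdk.values.KV;
> import org.apache.beam.sdk.values.PCollection;
> import org.apache.kafka.common.serialization.StringDeserializer;
> import org.joda.time.Duration;
>
> // 1. Read a Kafka topic.
> PCollection<KV<String, String>> raw = pipeline.apply("ReadOrders",
>     KafkaIO.<String, String>read()
>         .withBootstrapServers("broker:9092")
>         .withTopic("orders")
>         .withKeyDeserializer(StringDeserializer.class)
>         .withValueDeserializer(StringDeserializer.class)
>         .withoutMetadata());
>
> // 2. Global window with a processing-time trigger.
> PCollection<KV<String, String>> windowed = raw.apply(
>     Window.<KV<String, String>>into(new GlobalWindows())
>         .triggering(Repeatedly.forever(
>             AfterProcessingTime.pastFirstElementInPane()
>                 .plusDelayOf(Duration.standardSeconds(10))))
>         .withAllowedLateness(Duration.ZERO)
>         .discardingFiredPanes());
>
> // 3. Parse into keyed POJOs (ParseOrderFn is a placeholder DoFn).
> PCollection<KV<String, Order>> parsed =
>     windowed.apply(ParDo.of(new ParseOrderFn()));
>
> // 4. Group by key -- the step where keys go missing.
> PCollection<KV<String, Iterable<Order>>> grouped =
>     parsed.apply(GroupByKey.<String, Order>create());
>
> // 5./6. The same steps run for the other topics; the grouped collections
> // are then joined, e.g. via CoGroupByKey.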
>
> I get all records through step 3 as expected, but in step 4 a few keys
> are dropped, with inconsistent behavior in each run.
> I have tried the pipeline with different setups, i.e., 1 task slot and 1
> parallel thread, or multiple task slots and multiple threads.
>
> It looks like the Beam Flink runner has a bug in the pipeline translation
> for the streaming scenario.
>
> Thanks,
> Nishu
>
>
> On Thu, Dec 7, 2017 at 7:13 PM, Chen Qin <qinnc...@gmail.com> wrote:
>
>> Nishu
>>
>> You might consider a side output with metrics, at least after the window.
>> I would suggest having that in all Flink jobs to catch data skew or
>> partition skew, and amending the job if needed.
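>>
>> A rough sketch of the idea (Beam Java; "parsed", the Order POJO, and the
>> tag/counter names are hypothetical):
>>
>> import org.apache.beam.sdk.metrics.Counter;
>> import org.apache.beam.sdk.metrics.Metrics;
>> import org.apache.beam.sdk.transforms.DoFn;
>> import org.apache.beam.sdk.transforms.ParDo;
>> import org.apache.beam.sdk.values.KV;
>> import org.apache.beam.sdk.values.PCollectionTuple;
>> import org.apache.beam.sdk.values.TupleTag;
>> import org.apache.beam.sdk.values.TupleTagList;
>>
>> final TupleTag<KV<String, Order>> mainTag =
>>     new TupleTag<KV<String, Order>>() {};
>> final TupleTag<KV<String, Order>> debugTag =
>>     new TupleTag<KV<String, Order>>() {};
>>
>> PCollectionTuple tagged = parsed.apply(ParDo.of(
>>     new DoFn<KV<String, Order>, KV<String, Order>>() {
>>       // The counter is aggregated by the runner, so per-step record
>>       // counts can be compared across the pipeline.
>>       private final Counter afterWindow =
>>           Metrics.counter("debug", "afterWindow");
>>
>>       @ProcessElement
>>       public void process(ProcessContext c) {
>>         afterWindow.inc();
>>         c.output(c.element());            // main flow, unchanged
>>         c.output(debugTag, c.element());  // mirror to the side output
>>       }
>>     }).withOutputTags(mainTag, TupleTagList.of(debugTag)));
>>
>> // tagged.get(debugTag) can then be logged or written somewhere
>> // inspectable to spot missing keys or skewed partitions.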
>>
>> Chen
>>
>> On Thu, Dec 7, 2017 at 9:48 AM Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Is it possible that the data is dropped due to being late, i.e., records
>>> with timestamps behind the current watermark?
>>> What kind of operations does your program consist of?
>>>
>>> Best, Fabian
>>>
>>> 2017-12-07 10:20 GMT+01:00 Sendoh <unicorn.bana...@gmail.com>:
>>>
>>>> I would also recommend printing the input and output counts of each
>>>> operator by using an Accumulator.
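>>>>
>>>> A minimal sketch (Flink Java; CountingMapper is only an example name):
>>>>
>>>> import org.apache.flink.api.common.accumulators.IntCounter;
>>>> import org.apache.flink.api.common.functions.RichMapFunction;
>>>> import org.apache.flink.configuration.Configuration;
>>>>
>>>> public class CountingMapper extends RichMapFunction<String, String> {
>>>>   private final IntCounter recordCount = new IntCounter();
>>>>
>>>>   @Override
>>>>   public void open(Configuration parameters) {
>>>>     // Registered accumulators are aggregated across subtasks and
>>>>     // shown in the web UI and the final job result.
>>>>     getRuntimeContext().addAccumulator("recordCount", recordCount);
>>>>   }
>>>>
>>>>   @Override
>>>>   public String map(String value) {
>>>>     recordCount.add(1);  // count every record passing through
>>>>     return value;
>>>>   }
>>>> }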
>>>>
>>>> Cheers,
>>>>
>>>> Sendoh
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>> Chen
>> Software Eng, Facebook
>>
>
>
>
> --
> Thanks & Regards,
> Nishu Tayal
>
