Hi Eugene,

In my use case, I use the GlobalWindow (
https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions
) and specify a trigger. In a global window, all data seen so far is
accumulated and re-emitted every time the trigger fires, so we can avoid the
late data issue: the window never closes, so no element is dropped as late.
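
For reference, my windowing setup looks roughly like the sketch below (Beam
Java SDK; the key/value types and the 10-second delay are placeholder values):

    // imports: org.apache.beam.sdk.transforms.GroupByKey,
    //          org.apache.beam.sdk.transforms.windowing.*, org.joda.time.Duration
    PCollection<KV<String, String>> grouped = keyed
        .apply(Window.<KV<String, String>>into(new GlobalWindows())
            // re-fire periodically as new elements arrive
            .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardSeconds(10))))
            // lateness is moot in the global window, but Beam requires it
            // to be set explicitly once a trigger is specified
            .withAllowedLateness(Duration.ZERO)
            // each firing emits all data seen so far, not just the newest pane
            .accumulatingFiredPanes())
        .apply(GroupByKey.create());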

I found a JIRA issue (https://issues.apache.org/jira/browse/BEAM-3225)
describing the same problem in Beam.

Today I am going to try to write a similar implementation in Flink.
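
A first sketch of what I plan to try on the Flink side (the input stream,
key selector, and reduce logic are placeholders):

    // imports: org.apache.flink.streaming.api.windowing.assigners.GlobalWindows,
    //          org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger,
    //          org.apache.flink.streaming.api.windowing.time.Time
    events
        .keyBy(event -> event.getId())            // group on the common key
        .window(GlobalWindows.create())           // a single window that never closes
        // GlobalWindows' default trigger never fires, so a custom trigger is needed
        .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(10)))
        .reduce((a, b) -> b);                     // placeholder: keep the latest element per key

ContinuousProcessingTimeTrigger fires without purging the window state, so
every firing sees all elements accumulated so far, similar to Beam's
accumulating panes.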

Thanks,
Nishu


On Fri, Dec 8, 2017 at 12:08 AM, Eugene Kirpichov <kirpic...@google.com>
wrote:

> Most likely this is late data:
> https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data
> Try configuring a trigger with a late-data behavior that is more appropriate
> for your particular use case.
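>
> A minimal sketch of such a configuration (the window size, lateness, and
> element types here are only illustrative):
>
>     input.apply(Window.<KV<String, String>>into(
>             FixedWindows.of(Duration.standardMinutes(1)))
>         .triggering(AfterWatermark.pastEndOfWindow()
>             // re-fire for each element that arrives after the watermark
>             .withLateFirings(AfterPane.elementCountAtLeast(1)))
>         .withAllowedLateness(Duration.standardMinutes(30))
>         .accumulatingFiredPanes());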
>
> On Thu, Dec 7, 2017 at 3:03 PM Nishu <nishuta...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running a streaming pipeline with the Flink runner.
>> *Operator sequence* is: read the JSON data, parse the JSON string into an
>> object, and group the objects by a common key. I noticed that the
>> GroupByKey operator throws away some data in between, so I don't get
>> all the keys in the output.
>>
>> In the screenshot below, 1001 records are read from a Kafka topic, and
>> each record has a unique ID. After grouping, only 857 unique IDs are
>> returned. Ideally the GroupByKey operator should return 1001 records.
>>
>>
>> [image: Inline image 3]
>>
>> Any idea, what can be the issue? Thanks in advance!
>>
>> --
>> Thanks & Regards,
>> Nishu Tayal
>>
>


-- 
Thanks & Regards,
Nishu Tayal
