Accumulation mode doesn't affect late data dropping. I see your pipeline specifies an allowed lateness of zero, which means any data arriving behind the watermark is dropped: is this intentional?
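For comparison, here is a minimal sketch of the same window configuration from the quoted code, but with a nonzero allowed lateness so that elements behind the watermark are kept rather than dropped (the 10-minute bound and delayDuration are illustrative placeholders, not recommendations):

    // Same processing-time trigger as in the quoted snippet below.
    Trigger trigger = Repeatedly.forever(
        AfterProcessingTime.pastFirstElementInPane()
            .plusDelayOf(Duration.standardSeconds(delayDuration)));

    // Identical windowing, except for the nonzero allowed lateness.
    // customerInput is the PCollection<KV<String, String>> from the quoted code.
    PCollection<KV<String, String>> customerSubset = customerInput.apply(
        "CustomerWindowParDo",
        Window.<KV<String, String>>into(new GlobalWindows())
            .triggering(trigger)
            .accumulatingFiredPanes()
            .withAllowedLateness(Duration.standardMinutes(10)));  // illustrative bound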
On Fri, Dec 8, 2017, 1:23 AM Nishu <nishuta...@gmail.com> wrote:

> Below is the code I use for windows and triggers.
>
>     Trigger trigger = Repeatedly.forever(
>         AfterProcessingTime.pastFirstElementInPane()
>             .plusDelayOf(Duration.standardSeconds(delayDuration)));
>
>     // Define windows for Customer Topic
>     PCollection<KV<String, String>> customerSubset = customerInput.apply(
>         "CustomerWindowParDo",
>         Window.<KV<String, String>>into(new GlobalWindows())
>             .triggering(trigger)
>             .accumulatingFiredPanes()
>             .withAllowedLateness(Duration.standardMinutes(0)));
>
> Thanks & regards,
> Nishu
>
> On Fri, Dec 8, 2017 at 10:19 AM, Nishu <nishuta...@gmail.com> wrote:
>
>> Hi Eugene,
>>
>> In my use case, I use the GlobalWindow
>> (https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions)
>> and specify the trigger. In the global window, the entire data set is
>> accumulated every time the trigger fires, so that we can avoid the late
>> data issue.
>>
>> I found a JIRA issue (https://issues.apache.org/jira/browse/BEAM-3225)
>> for the same problem in Beam.
>>
>> Today I am going to try to write a similar implementation in Flink.
>>
>> Thanks,
>> Nishu
>>
>> On Fri, Dec 8, 2017 at 12:08 AM, Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Most likely this is late data:
>>> https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data
>>> Try configuring a trigger with a late data behavior that is more
>>> appropriate for your particular use case.
>>>
>>> On Thu, Dec 7, 2017 at 3:03 PM Nishu <nishuta...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a streaming pipeline with the Flink runner.
>>>> The operator sequence is: read the JSON data, parse the JSON string
>>>> into an object, and group the objects by a common key. I noticed that
>>>> the GroupByKey operator drops some data along the way, so I don't get
>>>> all the keys in the output.
>>>>
>>>> In the screenshot below, 1001 records are read from the Kafka topic,
>>>> each with a unique ID. After grouping, only 857 unique IDs are
>>>> returned. Ideally the GroupByKey operator should return 1001 records.
>>>>
>>>> [image: Inline image 3]
>>>>
>>>> Any idea what the issue could be? Thanks in advance!
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Nishu Tayal
>>
>> --
>> Thanks & Regards,
>> Nishu Tayal
>
> --
> Thanks & Regards,
> Nishu Tayal
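For reference, a hedged end-to-end sketch of the pipeline shape described in this thread (Kafka read, JSON parse, GroupByKey under the GlobalWindows trigger). The bootstrap servers, topic name, parse step, and trigger delay are placeholders, not values taken from the original messages:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.GroupByKey;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
    import org.apache.beam.sdk.transforms.windowing.Repeatedly;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.joda.time.Duration;

    public class GroupByKeySketch {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        PCollection<KV<String, String>> parsed = p
            .apply("ReadFromKafka", KafkaIO.<String, String>read()
                .withBootstrapServers("localhost:9092")   // placeholder
                .withTopic("customer-topic")              // placeholder
                .withKeyDeserializer(StringDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withoutMetadata())
            .apply("ParseJson", ParDo.of(new DoFn<KV<String, String>, KV<String, String>>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                // Placeholder parse step: in the real pipeline the JSON value is
                // parsed into an object and keyed by its unique ID.
                c.output(KV.of(c.element().getKey(), c.element().getValue()));
              }
            }));

        parsed
            .apply("WindowAndTrigger", Window.<KV<String, String>>into(new GlobalWindows())
                .triggering(Repeatedly.forever(
                    AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.standardSeconds(30))))  // illustrative delay
                .accumulatingFiredPanes()
                // Zero allowed lateness, as in the quoted code; see the discussion
                // above about whether this should be nonzero.
                .withAllowedLateness(Duration.ZERO))
            .apply("GroupById", GroupByKey.<String, String>create());

        p.run().waitUntilFinish();
      }
    }

With GlobalWindows plus Repeatedly.forever, GroupByKey emits a pane each time the trigger fires, and accumulatingFiredPanes() means each firing's pane contains everything seen so far for that key.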