Yep, it sure was, now that I've removed the PEBKAC, numbers match up perfectly :)
(A somewhat sheepish) thank you, Liam On Mon, 20 Apr. 2020, 2:51 pm John Roesler, <vvcep...@apache.org> wrote: > Hey Liam, > > Hah! Tell me about it... > > Well, let’s hope that was it. > Thanks, > John > > On Sun, Apr 19, 2020, at 18:43, Liam Clarke wrote: > > Hi John, > > > > Thanks for the reply - yep, that was a dumb copy and paste error, which > is > > what I get for coding while surrounded by kids. >_< I'm deploying a fixed > > version of it as we speak. Thanks for the reply though :) > > > > Kind regards, > > > > Liam Clarke > > > > > > > > On Mon, 20 Apr. 2020, 2:08 am John Roesler, <vvcep...@apache.org> wrote: > > > > > Hi Liam, > > > > > > I took a quick look. On the output side, it looks like you’re adding > the > > > count to the prior count. Should that just set the outbound vale to > the new > > > count? Maybe I misunderstood the situation. > > > > > > What I mean is, suppose you get two events for the same window: > > > > > > Inbound map := 0+1 = 1 > > > Count = 1 > > > Outbound map := 0+1 = 1 > > > (Proposed outbound := 1) > > > > > > Then, > > > > > > Inbound map := 1+1 = 2 > > > Count = 2 > > > Outbound map := 1+2 = 3 > > > (Proposed outbound := 2) > > > > > > Does that make sense? > > > -John > > > > > > On Sun, Apr 19, 2020, at 03:08, Liam Clarke wrote: > > > > Hello all, > > > > > > > > I have been running this code against production data, and I'm > emitting > > > > counts/sums for a sentinel record id to stdout so I can observe the > > > > behaviour: > > > > > > > > > https://gist.github.com/LiamClarkeNZ/b101ce6a42a2e5e1efddfe3a98c5805f > > > > > > > > When this code is run, the window duration is 2 minutes, grace > period is > > > 20 > > > > seconds, and retention time is 20 minutes. > > > > > > > > I am endeavouring to use event time as the timestamp basis for this > > > process: > > > > > https://gist.github.com/LiamClarkeNZ/8265cec02e21f5969e0fedb8281a2180 > > > > > > > > So, my sentinel debugging output shows a surprising behaviour in > that the > > > > outbound counts for the key always sum higher than the inbound > count. For > > > > example: > > > > > > > > Sample: 2020-04-19T07:31:37.492Z > > > > > > > > Inbound > > > > { > > > > 2020-04-19T03:00:00Z=4563, > > > > 2020-04-19T04:00:00Z=5629, > > > > 2020-04-19T05:00:00Z=8489, > > > > 2020-04-19T06:00:00Z=13599 > > > > } > > > > > > > > Outbound > > > > { > > > > 2020-04-19T03:00:00Z=4717, > > > > 2020-04-19T04:00:00Z=5890, > > > > 2020-04-19T05:00:00Z=8826, > > > > 2020-04-19T06:00:00Z=13951 > > > > } > > > > > > > > This makes me suspect that either I'm not using the window I thought > I > > > was > > > > (e.g., I'm somehow using a sliding window instead of a tumbling > window) > > > or > > > > that I have made a rookie error somewhere in my aggregations, or I've > > > just > > > > misunderstood something about this. Does it matter that the window > size > > > in > > > > the persistent window store doesn't match the windowing time + grace > time > > > > in the windowing clause? > > > > > > > > Any pointers gratefully welcome. > > > > > > > > Kind regards, > > > > > > > > Liam Clarke-Hutchinson > > > > > > > > > >