Re: Beam SQL found limitations

2023-07-10 Thread Kenneth Knowles
It would be interesting to see a design for this. You'll need to partition or it won't scale because SQL "OVER" clause is linear and sorted in this case. Other than that, it should be a pretty straightforward implementation using state + timers + @RequiresTimeSortedInput. Sorting in any other way w

Re: Beam SQL found limitations

2023-05-31 Thread Kenneth Knowles
1. Yes, using state is a better fit than Beam windowing. You will want to use the new feature of annotating the DoFn with @RequiresTimeSortedInput. This will make it so you can be sure you are actually getting the "previous" event. They can arrive in any order without this annotation. You won't be

Re: Beam SQL found limitations

2023-05-31 Thread Wiśniowski Piotr
Hi Kenn, Thanks for clarification. 1. Just to put an example in front - for every event that comes in I need to find corresponding previous event of same user_id and pass previous_event_timestamp in the current event payload down (and also current event becomes previous event for future event

Re: Beam SQL found limitations

2023-05-26 Thread Kenneth Knowles
Just want to clarify that Beam's concept of windowing is really an event-time based key, and they are all processed logically simultaneously. SQL's concept of windowing function is to sort rows and process them linearly. They are actually totally different. From your queries it seems you are intere

Re: Beam SQL found limitations

2023-05-26 Thread Wiśniowski Piotr
Hi Alexey, Thank You for reference to that discussion I do actually have pretty similar thoughts on what Beam SQL needs. Update from my side: Actually did find a workaround for issue with windowing function on stream. It basically boils down to using sliding window to collect and aggregate

Re: Beam SQL found limitations

2023-05-22 Thread Alexey Romanenko
Hi Piotr, Thanks for details! I cross-post this to dev@ as well since, I guess, people there can provide more insights on this. A while ago, I faced the similar issues trying to run Beam SQL against TPC-DS benchmark. We had a discussion around that [1], please, take a look since it can be hel