Hi All,

I was under the assumption that one needs to run groupBy(window(...)) to
perform any stateful operation, but it looks like that is not the case,
since any aggregation query such as

"select count(*) from some_view" is also stateful, since it carries over
the count from the previous batch. Likewise, if I run

"select collect_list(*) from some_view" with, say, maxOffsetsPerTrigger set
to 1, I can see the rows from previous batches at every trigger.

So is it fair to say that aggregations are stateful by default?

I am looking for something more like the DStream approach (stateless),
where I collect a bunch of records in each batch, run some aggregation such
as a count, and emit the result. The next batch should then count only its
own records, not those from previous batches.
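
To make the difference concrete, here is a plain-Python sketch (no Spark
involved) of the two behaviours I am describing. The function names and the
sample batches are made up for illustration; this only models the semantics
and says nothing about how Spark implements them.

```python
from typing import Iterable, List


def stateful_counts(batches: Iterable[List[str]]) -> List[int]:
    """Running count: each trigger reports all rows seen so far."""
    total = 0
    results = []
    for batch in batches:
        total += len(batch)  # state carried over from earlier batches
        results.append(total)
    return results


def per_batch_counts(batches: Iterable[List[str]]) -> List[int]:
    """Stateless count: each trigger reports only its own rows."""
    return [len(batch) for batch in batches]


# Hypothetical micro-batches, e.g. what three triggers might deliver.
batches = [["a", "b"], ["c"], ["d", "e", "f"]]

print(stateful_counts(batches))   # [2, 3, 6] -> what I observe today
print(per_batch_counts(batches))  # [2, 1, 3] -> what I want instead
```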

So if I run "select collect_list(*) from some_view", I want to collect
whatever rows are available in each batch/trigger, but none from previous
batches. How do I do that?

Thanks!
