Can you explain more about what exactly you are trying to do?

On Thu, Dec 10, 2020 at 2:51 PM Tao Li <t...@zillow.com> wrote:
> Hi Beam community,
>
> I got a quick question about the GroupByKey operator. According to this doc
> <https://beam.apache.org/documentation/programming-guide/#groupbykey>, if
> we are using an unbounded PCollection, it's required to specify either non-global
> windowing
> <https://beam.apache.org/documentation/programming-guide/#setting-your-pcollections-windowing-function>
> or an aggregation trigger
> <https://beam.apache.org/documentation/programming-guide/#triggers>
> in order to perform a GroupByKey operation.
>
> In comparison, the KeyBy
> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/>
> operator from Flink does not have such a hard requirement for streamed data.
>
> In our use case, we do need to query all historical streamed data and
> group by keys. KeyBy from Flink satisfies our need, but Beam's GroupByKey
> does not. I thought about applying a sliding window with a very large size
> (say 1 year), so we could query the past year's data, but I'm not sure
> whether that is feasible or a good practice.
>
> So what would the Beam solution be to implement this business logic? Is
> there support in Beam for processing a relatively long history of an
> unbounded PCollection?
>
> Thanks so much!