Can you explain more about what exactly you are trying to do?

On Thu, Dec 10, 2020 at 2:51 PM Tao Li <t...@zillow.com> wrote:
> Hi Beam community,
>
> I got a quick question about the GroupByKey operator. According to this doc
> <https://beam.apache.org/documentation/programming-guide/#groupbykey>, if
> we are using an unbounded PCollection, it's required to specify either non-global
> windowing
> <https://beam.apache.org/documentation/programming-guide/#setting-your-pcollections-windowing-function>
> or an aggregation trigger
> <https://beam.apache.org/documentation/programming-guide/#triggers>
> in order to perform a GroupByKey operation.
>
> In comparison, the KeyBy
> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/>
> operator from Flink does not have such a hard requirement for streamed data.
>
> In our use case, we do need to query all historical streamed data and
> group by keys. KeyBy from Flink satisfies our need, but Beam's GroupByKey
> does not. I thought about applying a sliding window with a very large size
> (say 1 year), so we could query the past year's data, but I'm not sure
> whether that is feasible or a good practice.
>
> So what would the Beam solution be to implement this business logic? Is
> there support in Beam for processing a relatively long history of an
> unbounded PCollection?
>
> Thanks so much!