Hi,

Thanks for the reply. So I have 2 cases:

1. timeWindowAll(length, slide).reduce(...) (with parallelism = 1)
2. groupby(someField).timeWindow(length, slide).reduce(...)

Let's call case 1 the global window and case 2 the partitioned window. If
I have only one key (for case 2) and set parallelism = 1 for case 1, I
would expect both cases to perform similarly in terms of both latency and
throughput. However, the partitioned window outperforms the global one by
orders of magnitude in throughput.
I am using Flink 1.1.3.
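
To make the single-key assumption concrete, here is a minimal sketch in
plain Java. It does not use Flink and is not Flink's actual key assignment;
it assumes a simplified "hash of the key modulo parallelism" partitioner,
and the class and method names are hypothetical. It only shows why one
distinct key should always land on one partition, however many parallel
instances exist.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SingleKeyPartitionSketch {
    // Assumption: simplified stand-in for hash partitioning
    // (key hash modulo parallelism), not Flink's real implementation.
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Collects the distinct partitions that a list of record keys maps to.
    static Set<Integer> partitionsFor(List<?> keys, int parallelism) {
        Set<Integer> used = new HashSet<>();
        for (Object key : keys) {
            used.add(partitionFor(key, parallelism));
        }
        return used;
    }

    public static void main(String[] args) {
        // 1000 records that all carry the same key, 128 parallel instances.
        List<String> keys = Collections.nCopies(1000, "onlyKey");
        System.out.println(partitionsFor(keys, 128).size()); // prints 1
    }
}
```

Under that assumption the other 127 instances receive no records, which is
why I expected case 2 to behave like case 1.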


Thanks,
Adrienne




On Mon, May 8, 2017 at 3:55 PM, Stefan Richter <s.rich...@data-artisans.com>
wrote:

> Hi,
>
> to answer this question, we would first need to know what you mean by
> „global windows“: using „windowAll()“ or „GlobalWindows“? Also, the answer
> might depend on the Flink version that you are using.
>
> Best,
> Stefan
>
> > On 07.05.2017 at 23:23, Adrienne Kole <adrienneko...@gmail.com> wrote:
> >
> > Hi,
> >
> > I am doing a simple aggregation with keyed and global windows in Flink.
> > When I compare the keyed window aggregation with 1 key against the
> > global window (which has parallelism 1), I would expect both of them to
> > have similar performance.
> >
> > However, the keyed stream with 1 key achieves 2x the throughput of the
> > global window.
> > My configuration is an 8-node cluster with 16 cores per node and
> > parallelism = 128.
> >
> > AFAIK, Flink doesn't handle skew by default and uses a hash function to
> > assign keys to partitions. So if I have only 1 key, it should always go
> > to the same partition, which is semantically similar to global windows
> > in Flink.
> >
> > What can be the reason behind this difference in performance?
> >
> > Thanks,
> > Adrienne
>
>
