Hi Senthil, since your records are so big, I recommend taking the time to evaluate a few different serializers [1].
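As a starting point, here is a minimal sketch of how such an evaluation could begin (MyRecord is a placeholder for your own record class, not something from your code): disabling generic types makes the job fail fast wherever a type silently falls back to the generic Kryo path, which is usually the slowest option for large records, and the commented settings are candidates to benchmark one at a time.

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializerEvaluation {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        ExecutionConfig config = env.getConfig();

        // Fail fast wherever a record type silently falls back to the
        // generic Kryo path, so you can see exactly which types to tune.
        config.disableGenericTypes();

        // Candidate serializer settings to benchmark one at a time
        // (MyRecord is a placeholder for your own record class):
        // config.registerPojoType(MyRecord.class);  // PojoSerializer
        // config.registerKryoType(MyRecord.class);  // Kryo, class pre-registered
        // config.enableForceAvro();                 // Avro for POJO types
    }
}

The post in [1] walks through the trade-offs between these serializers in detail.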
[1] https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html

On Wed, May 13, 2020 at 5:40 PM Senthil Kumar <senthi...@vmware.com> wrote:

> Zhijiang,
>
> Thanks for your suggestions. We will keep them in mind!
>
> Kumar
>
> From: Zhijiang <wangzhijiang...@aliyun.com>
> Reply-To: Zhijiang <wangzhijiang...@aliyun.com>
> Date: Tuesday, May 12, 2020 at 10:10 PM
> To: Senthil Kumar <senthi...@vmware.com>, "user@flink.apache.org" <user@flink.apache.org>
> Subject: Re: Flink Streaming Job Tuning help
>
> Hi Kumar,
>
> I can give some general ideas for further analysis.
>
> > We are finding that Flink lags seriously behind when we introduce the keyBy (presumably because of shuffle across the network)
>
> The `keyBy` breaks the operator chain, so it can have a noticeable performance impact in practice. If your previous pipeline without keyBy made use of the chaining mechanism, each follow-up operator consumed the emitted records directly from the preceding operator, without any buffer serialization -> network shuffle -> buffer deserialization steps in between; this matters especially since your 10 KB record size is fairly large.
>
> If the keyBy is necessary in your case, the next step is to find the current bottleneck. E.g., check in the web UI whether there is backpressure. If so, identify which task causes the backpressure, and trace it further via the network-related metrics.
>
> Also check whether there is data skew in your case, i.e., whether some tasks process more records than others. If so, increasing the parallelism may help balance the load.
>
> Best,
> Zhijiang
>
> ------------------------------------------------------------------
> From: Senthil Kumar <senthi...@vmware.com>
> Send Time: May 13, 2020 (Wednesday) 00:49
> To: user@flink.apache.org <user@flink.apache.org>
> Subject: Re: Flink Streaming Job Tuning help
>
> I forgot to mention: we are consuming said records from AWS Kinesis and writing them out to S3.
>
> From: Senthil Kumar <senthi...@vmware.com>
> Date: Tuesday, May 12, 2020 at 10:47 AM
> To: "user@flink.apache.org" <user@flink.apache.org>
> Subject: Flink Streaming Job Tuning help
>
> Hello Flink Community!
>
> We have a fairly intensive Flink streaming application, processing 8-9 million records a minute, with each record being 10 KB. One of our steps is a keyBy operation. We are finding that Flink lags seriously behind when we introduce the keyBy (presumably because of the shuffle across the network).
>
> We are trying to tune it ourselves (size of nodes, memory, network buffers, etc.), but before we spend way too much time on this: would it be better to hire some "Flink tuning expert" to get us through?
>
> If so, what resources are recommended on this list?
>
> Cheers,
> Kumar

--
Arvid Heise | Senior Java Developer
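P.S. To make the chaining point from Zhijiang's mail concrete, here is a minimal sketch; the stream contents and the key are placeholders, not your actual pipeline. The filter and map chain into one task and hand records over directly, while keyBy ends the chain and pushes every record through serialization and a network shuffle to the task that owns its key.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByChainingSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> records = env.fromElements("a", "bb", "ccc");

        // filter and map are chained into a single task: records are
        // handed over directly, with no serialization in between.
        DataStream<String> cleaned = records
                .filter(r -> !r.isEmpty())
                .map(r -> r.trim());

        // keyBy ends the chain: every record is serialized into network
        // buffers and shuffled to whichever parallel task owns its key.
        cleaned.keyBy(r -> r.length())
                .map(r -> r.toUpperCase())
                .print();

        env.execute("keyBy chaining sketch");
    }
}

As a back-of-the-envelope check: at 8-9 million records a minute and 10 KB each, that shuffle alone moves roughly 1.3-1.5 GB/s across the network, which is another reason the serializer choice above matters.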