Hi Robert,

Thanks for the information. Payloads so far are 400 KB (each record). To achieve high parallelism at the downstream operator, do I rebalance the Kafka stream? Could you give me an example, please?
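From the docs, I am guessing something like the sketch below: read at the topic's partition count, then rebalance() before the heavy operator so it can run at a higher parallelism. This is untested; the topic name, bootstrap servers, group id, and parallelism values are placeholders, and the map is a stand-in for my real processing. Is this the right idea?

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class RebalanceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-consumer-group");       // placeholder

        // Source parallelism capped at the topic's partition count (e.g. 4);
        // extra source subtasks beyond that would sit idle.
        DataStream<String> source = env
                .addSource(new FlinkKafkaConsumer<>(
                        "my-topic", new SimpleStringSchema(), props))
                .setParallelism(4);

        // rebalance() redistributes records round-robin across all downstream
        // subtasks, so the expensive operator can run at a higher parallelism
        // than the number of Kafka partitions.
        source.rebalance()
                .map(record -> record.toUpperCase()) // stand-in for real work
                .setParallelism(16)
                .print();

        env.execute("kafka-rebalance-example");
    }
}
```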
Regards,
Vijay

On Fri, Aug 14, 2020 at 12:50 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
>
>> Also, can we increase parallel processing beyond the number of
>> Kafka partitions that we have, without causing any overhead?
>
> Yes, the Kafka sources produce a tiny bit of overhead, but the potential
> benefit of having downstream operators at a high parallelism might be much
> bigger.
>
> How large is a large payload in your case?
>
> Best practices:
> Try to understand what's causing the performance slowdown: Kafka or S3?
> You can do a test where you read from Kafka and write into a
> discarding sink.
> Likewise, use a data generator source and write into S3.
>
> Do the math on your job: what are its theoretical limits?
> https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
>
> Hope this helps,
> Robert
>
>
> On Thu, Aug 13, 2020 at 11:25 PM Vijayendra Yadav <contact....@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I am trying to increase the throughput of my Flink streaming job, which
>> reads from a Kafka source and sinks to S3. Currently it runs fine for
>> small event records, but records with large payloads are processed
>> extremely slowly, at a rate of about 2 TPS.
>>
>> Could you provide some best practices for tuning?
>> Also, can we increase parallel processing beyond the number of
>> Kafka partitions that we have, without causing any overhead?
>>
>> Regards,
>> Vijay