Hi,

> Also, can we increase parallel processing beyond the number of Kafka
> partitions that we have, without causing any overhead?

Yes. The Kafka source adds only a tiny bit of overhead when its parallelism
exceeds the partition count (the extra source subtasks simply sit idle), and
the potential benefit of running the downstream operators at a higher
parallelism might be much bigger.
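As a rough, untested sketch of what that can look like (the broker address,
topic name, partition count of 4, and the parallelism values are all made up
for illustration): keep the source parallelism at the partition count, then
rebalance() into a wider transformation:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class RescaleAfterSource {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder broker
        props.setProperty("group.id", "payload-job");          // placeholder group id

        // Source parallelism matches the (assumed) 4 Kafka partitions; source
        // subtasks beyond the partition count would simply sit idle.
        DataStream<String> events = env
                .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
                .setParallelism(4);

        // rebalance() redistributes records round-robin, so the expensive
        // transformation can run at a higher parallelism than the source.
        events.rebalance()
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String value) {
                        return value.toUpperCase(); // stand-in for the real, heavy per-record work
                    }
                })
                .setParallelism(16)
                .print(); // stand-in for the real S3 sink

        env.execute("rescale-after-source");
    }
}

Note that the round-robin shuffle itself adds serialization and network cost,
so this mainly pays off when the per-record work downstream is much more
expensive than the shuffle.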
How large is a "large" payload in your case?

Best practices: first try to understand what is causing the slowdown, Kafka or
S3. You can isolate the two sides:
- Read from Kafka and write into a discarding sink (tests the read side).
- Use a data generator source and write into S3 (tests the write side).
A rough sketch of both tests is at the very bottom of this mail, below the
quoted message.

Also do the math on your job to understand its theoretical limits:
https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
For example, at 1 MB per record, even a few hundred records per second already
means hundreds of MB/s flowing through the network and the S3 upload path.

Hope this helps,
Robert

On Thu, Aug 13, 2020 at 11:25 PM Vijayendra Yadav <contact....@gmail.com> wrote:
> Hi Team,
>
> I am trying to increase the throughput of my Flink streaming job, which
> reads from a Kafka source and sinks to S3. It currently runs fine for
> records with small payloads, but records with large payloads run extremely
> slowly, at around 2 TPS.
>
> Could you provide some best practices for tuning?
> Also, can we increase parallel processing beyond the number of Kafka
> partitions that we have, without causing any overhead?
>
> Regards,
> Vijay
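Sketch of the two isolation tests mentioned above. This is untested; the
broker, topic, bucket, record count, and the ~1 MB synthetic payload are
placeholders, and in practice you would submit each test as its own job
rather than both pipelines in one:

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ThroughputIsolationTests {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder broker
        props.setProperty("group.id", "throughput-test");      // placeholder group id

        // Test A: read from Kafka and throw everything away.
        // If this is already slow, the bottleneck is the Kafka/read side.
        env.addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
           .addSink(new DiscardingSink<>());

        // Test B: generate synthetic ~1 MB records and write them to S3.
        // If this is slow, the bottleneck is the S3/write side.
        env.generateSequence(0, 100_000)
           .map(new MapFunction<Long, String>() {
               @Override
               public String map(Long i) {
                   char[] payload = new char[1_000_000]; // ~1 MB of ASCII dummy data
                   java.util.Arrays.fill(payload, 'x');
                   return i + ":" + new String(payload);
               }
           })
           .addSink(StreamingFileSink
               .forRowFormat(new Path("s3://my-bucket/throughput-test/"), // placeholder bucket
                             new SimpleStringEncoder<String>("UTF-8"))
               .build());

        env.execute("throughput-isolation-tests");
    }
}

Comparing the sustained rates of the two tests against your real job should
tell you which side to tune first.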