Hi,

> Also, can we increase parallel processing beyond the number of
> Kafka partitions that we have, without causing any overhead?


Yes. The extra, idle Kafka source subtasks produce a tiny bit of overhead,
but the potential benefit of running the downstream operators at a higher
parallelism might be much bigger.
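
For illustration, a minimal, untested sketch of that setup (assuming the
DataStream API with the flink-connector-kafka dependency; the broker
address, topic name, partition counts, and map function are placeholders):

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ParallelismSketch {
    public static void main(String[] args) throws Exception {
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        kafkaProps.setProperty("group.id", "parallelism-sketch");   // placeholder

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.addSource(new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), kafkaProps))
           .setParallelism(8)   // e.g. 8 Kafka partitions; extra source subtasks would sit idle
           .rebalance()         // redistribute records from 8 source subtasks to 32 map subtasks
           .map(new MapFunction<String, String>() {  // stand-in for a heavier transformation
               @Override
               public String map(String value) {
                   return value.toUpperCase();
               }
           })
           .setParallelism(32)  // downstream parallelism above the partition count
           .addSink(new DiscardingSink<>());

        env.execute("parallelism-sketch");
    }
}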

How large is a large payload in your case?

Best practices:
Try to understand what's causing the performance slowdown: Kafka or S3?
You can do a test where you read from Kafka and write into a discarding
sink. Likewise, use a data generator source and write into S3 (see the
sketch below).
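
As a rough, untested sketch of those two isolation tests (assuming the
DataStream API with the flink-connector-kafka dependency and the S3
filesystem plugin; topic name, bucket path, and payload size are
placeholders):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ThroughputIsolationTests {

    // Test 1: Kafka -> discarding sink. If this runs fast, Kafka is not the bottleneck.
    static void kafkaToDiscardingSink(StreamExecutionEnvironment env, Properties kafkaProps) {
        env.addSource(new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), kafkaProps))
           .addSink(new DiscardingSink<>());
    }

    // Test 2: synthetic generator -> S3. If this runs slowly, the S3 sink is the bottleneck.
    static void generatorToS3(StreamExecutionEnvironment env) {
        env.addSource(new SyntheticPayloadSource(1024 * 1024)) // ~1 MB payloads (placeholder)
           .addSink(StreamingFileSink
                   .forRowFormat(new Path("s3://my-bucket/throughput-test/"),
                                 new SimpleStringEncoder<String>())
                   .build());
    }

    // Simple parallel source that keeps emitting fixed-size strings.
    static class SyntheticPayloadSource extends RichParallelSourceFunction<String> {
        private final int payloadSize;
        private volatile boolean running = true;

        SyntheticPayloadSource(int payloadSize) {
            this.payloadSize = payloadSize;
        }

        @Override
        public void run(SourceContext<String> ctx) {
            String payload = new String(new char[payloadSize]).replace('\0', 'x');
            while (running) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(payload);
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}

Comparing each job's throughput with the full pipeline should make it
clear which side needs tuning.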

Do the math on your job: what are its theoretical limits?
https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
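
For example (with made-up numbers): if the topic has 8 partitions and each
partition delivers roughly 10 MB/s, the source tops out at about 80 MB/s;
with ~1 MB payloads that is only around 80 records per second, no matter
how much parallelism you add downstream.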

Hope this helps,
Robert


On Thu, Aug 13, 2020 at 11:25 PM Vijayendra Yadav <contact....@gmail.com>
wrote:

> Hi Team,
>
> I am trying to increase the throughput of my Flink streaming job that
> reads from a Kafka source and sinks to S3. Currently it is running fine
> for records with small payloads, but records with large payloads are
> running extremely slowly, at a rate of about 2 TPS.
>
> Could you provide some best practices for tuning?
> Also, can we increase parallel processing beyond the number of
> Kafka partitions that we have, without causing any overhead?
>
> Regards,
> Vijay
>
