Hi Robert,

Thanks for the information. Payloads so far are 400KB (each record).
To achieve high parallelism at the downstream operator, do I rebalance the
Kafka stream? Could you give me an example, please?
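In case it helps, here is roughly what I had in mind (just a sketch; the topic name, properties, parallelism values, and the map/sink are placeholders, so please correct me if this is the wrong approach):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToS3Job {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        kafkaProps.setProperty("group.id", "my-group");                // placeholder

        // Source parallelism is capped by the number of Kafka partitions
        // (say 4 in this sketch); extra source subtasks would sit idle.
        DataStream<String> kafkaStream = env
                .addSource(new FlinkKafkaConsumer<>(
                        "my-topic", new SimpleStringSchema(), kafkaProps))
                .setParallelism(4);

        kafkaStream
                // rebalance() redistributes records round-robin across all
                // subtasks of the next operator, so the downstream work can
                // run at a parallelism higher than the partition count.
                .rebalance()
                .map(record -> record.toUpperCase())  // placeholder transformation
                .setParallelism(16)
                .print()                              // placeholder for the S3 sink
                .setParallelism(16);

        env.execute("kafka-to-s3");
    }
}
```

My understanding is that without rebalance() the downstream operator would inherit the source's partitioning, so only 4 of the 16 subtasks would receive data. Is that correct?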

Regards,
Vijay


On Fri, Aug 14, 2020 at 12:50 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
>
> Also, can we increase parallel processing, beyond the number of
>> kafka partitions that we have, without causing any overhead ?
>
>
> Yes, the Kafka sources produce a tiny bit of overhead, but the potential
> benefit of having downstream operators at a high parallelism might be much
> bigger.
>
> How large is a large payload in your case?
>
> Best practices:
> Try to understand what's causing the performance slowdown: Kafka or S3?
> You can run a test where you read from Kafka and write to a
> discarding sink.
> Likewise, use a data-generator source and write to S3.
>
> Do the math on your job: What's the theoretical limits of your job:
> https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
>
> Hope this helps,
> Robert
>
>
> On Thu, Aug 13, 2020 at 11:25 PM Vijayendra Yadav <contact....@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> I am trying to increase the throughput of my Flink streaming job, which
>> reads from a Kafka source and sinks to S3. It currently runs fine for small
>> event records, but records with large payloads are extremely slow, at a
>> rate of about 2 TPS.
>>
>> Could you provide some best practices to tune?
>> Also, can we increase parallel processing, beyond the number of
>> kafka partitions that we have, without causing any overhead ?
>>
>> Regards,
>> Vijay
>>
>
