Thanks for your explanation. I'll give it a try. :)

Sachin Mittal <sjmit...@gmail.com> wrote on Wed, May 15, 2024 at 10:39:
> Each separate job would have its own consumer group, hence they will read
> independently from the same topic, and when checkpointing they will commit
> their own offsets. So if any job fails, it will not affect the progress of
> the other jobs when reading from Kafka.
>
> I am not sure of the impact on network load when multiple consumer groups
> are requesting data from the same topic.
>
> Multiple small jobs ensure that each job is scaled and monitored in an
> isolated way.
>
> Having an efficient serde can help a lot with the data we store in state,
> the data forwarded to the next steps, and overall state management.
>
> Another thing you can look into: if your job step is keyed by some key,
> make sure it is keyed by a String or another Java primitive type, since
> Object keys are much slower when reading from and writing to a state
> store.
>
> Thanks
> Sachin
>
>
> On Wed, May 15, 2024 at 7:58 AM longfeng Xu <xulongfeng2...@gmail.com>
> wrote:
>
>> Thank you. We will try.
>>
>> I'm still confused about multiple jobs on a cluster (Flink session on
>> YARN) reading the same topic from a Kafka cluster. My understanding is
>> that in this mode the number of reads of the topic does not decrease;
>> the jobs just share the TCP channels of the task managers, reducing the
>> network load. Is my understanding correct?
>>
>> Or are there any other advantages to it? Please advise. Thank you.
>>
>> Sachin Mittal <sjmit...@gmail.com> wrote on Wed, May 15, 2024 at 09:24:
>>
>>> We have the same scenario.
>>> We thought of having one big job with multiple branches, but this
>>> leads to a single point of failure: any issue with any branch would
>>> cause the whole job to fail, and all the sub-branches would stop
>>> processing.
>>>
>>> Hence running multiple jobs on a cluster, say on YARN, is better.
>>>
>>> Now, to overcome the serde issue, try to use some of the more
>>> efficient serialization schemes recommended by Flink. We are using
>>> POJOs and that has yielded good results for us.
>>>
>>>
>>> On Wed, 15 May 2024 at 5:59 AM, longfeng Xu <xulongfeng2...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> In this scenario many Flink jobs read the same Kafka topic, so CPU
>>>> is wasted on serialization/deserialization and the network load is
>>>> too heavy. Can you recommend a solution to avoid this situation?
>>>> For example, would it be more efficient to use one large streaming
>>>> job with multiple branches?
>>>>
>>>> Best regards,
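Sachin's point about each job having its own consumer group can be sketched with a toy model. This is plain Java, not the real Kafka client API: the shared "topic" is a list of records, and each group tracks its own committed offset (the checkpoint analogue), so one job's progress or failure does not affect another group's read position. The group IDs and record names are made up for illustration.

```java
import java.util.*;

public class IndependentGroups {
    // Toy model of Kafka offset tracking: one shared "topic" (a list of
    // records) and per-consumer-group committed offsets. Not the real
    // Kafka client API -- just the isolation idea from the thread.
    static final List<String> topic = List.of("r0", "r1", "r2", "r3");
    static final Map<String, Integer> committed = new HashMap<>();

    // Each group reads from its own committed offset, independently
    // of every other group reading the same topic.
    static List<String> poll(String groupId, int maxRecords) {
        int from = committed.getOrDefault(groupId, 0);
        int to = Math.min(from + maxRecords, topic.size());
        committed.put(groupId, to); // "commit" after reading
        return topic.subList(from, to);
    }

    public static void main(String[] args) {
        System.out.println(poll("job-a", 2)); // [r0, r1]
        System.out.println(poll("job-b", 3)); // [r0, r1, r2] -- unaffected by job-a
        System.out.println(poll("job-a", 2)); // [r2, r3]
    }
}
```

Note the trade-off the thread raises: isolation comes at the cost of every group fetching the same bytes again, so broker-side network load grows with the number of groups.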
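The advice about keying by a String rather than an Object can also be illustrated outside Flink. The sketch below (the `Event` record and the `"userId|region"` key format are assumptions made up for this example) flattens a composite key into a single String; in an actual Flink job you would return such a String from the `KeySelector` passed to `keyBy`, instead of keying by a POJO.

```java
import java.util.*;
import java.util.stream.*;

public class KeyedByString {
    // Hypothetical event type for illustration only.
    record Event(String userId, String region, long amount) {}

    // Flatten the composite (userId, region) key into one String,
    // rather than keying by an Object -- the shape of key Flink's
    // state backends handle most cheaply, per the advice above.
    static String key(Event e) {
        return e.userId() + "|" + e.region();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("u1", "eu", 10),
                new Event("u1", "eu", 5),
                new Event("u2", "us", 7));
        // Group-and-sum by the String key (a stand-in for keyed state).
        Map<String, Long> totals = events.stream()
                .collect(Collectors.groupingBy(KeyedByString::key,
                        Collectors.summingLong(Event::amount)));
        System.out.println(totals.get("u1|eu")); // 15
        System.out.println(totals.get("u2|us")); // 7
    }
}
```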