Thanks for your explanation. I'll give it a try. :)

Sachin Mittal <sjmit...@gmail.com> wrote on Wed, May 15, 2024 at 10:39:
> Each separate job would have its own consumer group, hence they will read
> independently from the same topic, and when checkpointing they will commit
> their own offsets. So if any job fails, it will not affect the progress of
> the other jobs when reading from Kafka.
>
> I am not sure of the impact on network load when multiple consumer groups
> are requesting data from the same topic.
>
> Multiple small jobs ensure that each job is scaled and monitored in an
> isolated way.
>
> Having an efficient serde can help a lot with the data we store in state,
> the data forwarded to the next steps, and overall state management.
>
> Another thing you can look into: if your job step is keyed by some key,
> make sure it is keyed by a String or another Java primitive type, since
> Object keys are much slower when reading from and writing to a state
> store.
>
> Thanks
> Sachin
>
>
> On Wed, May 15, 2024 at 7:58 AM longfeng Xu <xulongfeng2...@gmail.com>
> wrote:
>
>> Thank you. We will try.
>>
>> I'm still confused about multiple jobs on a cluster (Flink session on
>> YARN) reading the same topic from a Kafka cluster. My understanding is
>> that in this mode the number of reads of the topic does not decrease;
>> the jobs just share the TCP channels of the task managers, reducing the
>> network load. Is my understanding correct?
>>
>> Or are there any other advantages to it? Please advise. Thank you.
>>
>> Sachin Mittal <sjmit...@gmail.com> wrote on Wed, May 15, 2024 at 09:24:
>>
>>> We have the same scenario.
>>> We thought of having one big job with multiple branches, but this
>>> leads to a single point of failure: any issue with any branch would
>>> cause the whole job to fail, and all the sub-branches would stop
>>> processing.
>>>
>>> Hence running multiple jobs on a cluster, say on YARN, is better.
>>>
>>> Now, to overcome the serde issue, try to use some of the more
>>> efficient serialization schemes recommended by Flink. We are using
>>> POJOs and that has yielded good results for us.
>>>
>>>
>>> On Wed, 15 May 2024 at 5:59 AM, longfeng Xu <xulongfeng2...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> In this scenario many Flink jobs read the same Kafka topic, so CPU
>>>> is wasted on serialization/deserialization and the network load is
>>>> too heavy. Can you recommend a solution to avoid this situation?
>>>> For example, would it be more efficient to use one large streaming
>>>> job with multiple branches?
>>>>
>>>> Best regards,
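Sachin's point about each job having its own consumer group can be sketched with a toy model. This is plain Java, not the real Kafka client API: the shared "topic" is a list of records, and each group tracks its own committed offset (the checkpoint analogue), so one job's progress or failure does not affect another group's read position. The group IDs and record names are made up for illustration.

```java
import java.util.*;

public class IndependentGroups {
    // Toy model of Kafka offset tracking: one shared "topic" (a list of
    // records) and per-consumer-group committed offsets. Not the real
    // Kafka client API -- just the isolation idea from the thread.
    static final List<String> topic = List.of("r0", "r1", "r2", "r3");
    static final Map<String, Integer> committed = new HashMap<>();

    // Each group reads from its own committed offset, independently
    // of every other group reading the same topic.
    static List<String> poll(String groupId, int maxRecords) {
        int from = committed.getOrDefault(groupId, 0);
        int to = Math.min(from + maxRecords, topic.size());
        committed.put(groupId, to); // "commit" after reading
        return topic.subList(from, to);
    }

    public static void main(String[] args) {
        System.out.println(poll("job-a", 2)); // [r0, r1]
        System.out.println(poll("job-b", 3)); // [r0, r1, r2] -- unaffected by job-a
        System.out.println(poll("job-a", 2)); // [r2, r3]
    }
}
```

Note the trade-off the thread raises: isolation comes at the cost of every group fetching the same bytes again, so broker-side network load grows with the number of groups.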
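The advice about keying by a String rather than an Object can also be illustrated outside Flink. The sketch below (the `Event` record and the `"userId|region"` key format are assumptions made up for this example) flattens a composite key into a single String; in an actual Flink job you would return such a String from the `KeySelector` passed to `keyBy`, instead of keying by a POJO.

```java
import java.util.*;
import java.util.stream.*;

public class KeyedByString {
    // Hypothetical event type for illustration only.
    record Event(String userId, String region, long amount) {}

    // Flatten the composite (userId, region) key into one String,
    // rather than keying by an Object -- the shape of key Flink's
    // state backends handle most cheaply, per the advice above.
    static String key(Event e) {
        return e.userId() + "|" + e.region();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("u1", "eu", 10),
                new Event("u1", "eu", 5),
                new Event("u2", "us", 7));
        // Group-and-sum by the String key (a stand-in for keyed state).
        Map<String, Long> totals = events.stream()
                .collect(Collectors.groupingBy(KeyedByString::key,
                        Collectors.summingLong(Event::amount)));
        System.out.println(totals.get("u1|eu")); // 15
        System.out.println(totals.get("u2|us")); // 7
    }
}
```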