Hi Md Aamir,

Apologies for the delayed response due to the festive season, and thank you for providing the additional details.
The information you shared is helpful, but still too sparse to pinpoint the root cause of the backpressure. To proceed, could you share the logs from the job (with any sensitive information removed)? Logs often reveal bottlenecks or errors occurring during execution.

Additionally, it would help to know more about the type of state maintained in the ProcessFunction. Is it keyed state, or are you using other forms of state management? Also, is the ProcessFunction parallelizable, or does it involve operations that inherently limit parallelism?

Here are some general guidelines for debugging a ProcessFunction in such scenarios:

1. Use the Flink Web UI to inspect the ProcessFunction operator's subtask performance. Check for uneven load distribution across subtasks, which can indicate data skew.
2. If your function uses state, verify that the state usage is optimized. Large or inefficient state can increase checkpointing time and memory pressure.
3. Track the latency and throughput of the process operator. Sudden spikes or sustained high latency often correlate with bottlenecks.
4. Review the logic inside the ProcessFunction. Look for computationally expensive operations, excessive object creation, or blocking calls that could limit processing speed.
5. Consider increasing the consumer's fetch settings (e.g., fetch.min.bytes, max.partition.fetch.bytes), and tune the producer's batching configuration (batch.size, linger.ms, compression.type). These can reduce the frequency of network I/O and improve throughput.
6. Re-evaluate the parallelism of the process operator. It may need to be adjusted independently of the source and sink.

If possible, also include metrics or screenshots from the Flink Web UI showing operator backpressure or task-level statistics.
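To make the Kafka tuning point concrete, here is a minimal sketch of the relevant client properties. The config keys are standard Kafka client settings, but the values below are illustrative starting points only; you would need to benchmark them against your own ~102 KB messages and 100 TPS load, and pass them into your KafkaSource/KafkaSink builders.

```java
import java.util.Properties;

// Sketch of Kafka client tuning for the batching/fetching guideline above.
// The keys are standard Kafka client configs; the values are examples, not
// recommendations for your specific workload.
public class KafkaTuningSketch {

    // Producer side: larger batches plus a small linger window trade a
    // little latency for fewer, larger network requests.
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("batch.size", "131072");        // 128 KiB, up from the 16 KiB default
        p.setProperty("linger.ms", "20");             // wait up to 20 ms to fill a batch
        p.setProperty("compression.type", "snappy");  // you mentioned snappy is already in use
        return p;
    }

    // Consumer side: let each fetch return more data per network round trip.
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("fetch.min.bytes", "65536");             // 64 KiB before the broker responds
        p.setProperty("max.partition.fetch.bytes", "2097152"); // 2 MiB per partition, since messages are ~102 KB
        return p;
    }

    public static void main(String[] args) {
        System.out.println("producer: " + producerProps());
        System.out.println("consumer: " + consumerProps());
    }
}
```

These Properties objects would typically be handed to the connector builders (for example via the setProperties-style methods on the Kafka source/sink builders), but check the connector docs for your 1.17 setup.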
Bests,
Samrat

On Tue, Dec 24, 2024 at 11:13 AM Mohammad Aamir Iqubal <aamir.y...@gmail.com> wrote:

> Hi Samrat,
>
> PFB details:
>
> 1) 1.17 version
> 2) We are just modifying the json field, so the modification in existing json field we are managing if we talk about state
> 3) Compression type snappy used, default fetch and batch size are used
> 4) process operator is the bottleneck hence source is getting backpressured
> 5) yes
> 6) NO
> 7) Yes
>
> Please let me know if you need any more details.
>
> Regards,
> Md Aamir Iqubal
> +918409561989
>
> On Thu, 19 Dec 2024 at 11:56 AM, Samrat Deb <decordea...@gmail.com> wrote:
>
> > Hi Md Aamir,
> >
> > Thank you for providing the details of your streaming application setup.
> >
> > To assist you better in identifying and resolving the backpressure issue, I have a few follow-up questions:
> >
> > 1. Which version of Apache Flink are you using?
> > 2. Is the ProcessFunction stateful? If yes, what kind of state is managed?
> > 3. What are the configurations for the Kafka producer and consumer (e.g., fetch.size, batch.size, linger.ms, acks, compression.type etc)?
> > 4. Have you examined the backpressure monitoring graphs in the Flink Web UI? Which operator (source, process, or sink) is experiencing the bottleneck?
> > 5. Have you tried explicitly increasing the parallelism of the sink operator to match or exceed the source parallelism?
> > 6. Although I’m not deeply familiar with Kerberos integration, are there any errors or warnings in the logs related to Kerberos authentication?
> > 7. Have you verified if Kerberos ticket renewal contributes to any performance overhead for the sink operator?
> >
> > Bests,
> > Samrat
> >
> > On Wed, Dec 18, 2024 at 3:25 PM Mohammad Aamir Iqubal <aamir.y...@gmail.com> wrote:
> >
> > > Hi Team,
> > >
> > > I am running a streaming application in performance environment. The source is Kafka and sink is also Kafka but the sink Kafka is secured by kerberos. Message size is 102kb and source parallelism is 16, process parallelism is 80 and sink parallelism is 12. I am using process function to remove few fields from json and mask some fields. When we send the data with around 100 TPS the application goes into backpressure. Please let me know how to fix the issue.
> > >
> > > Regards,
> > > Md Aamir Iqubal
> > > +918409561989