Hi Mohammad, please share the logs in text format.
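
In the meantime, regarding point 5 of my earlier mail: below is a rough, untested sketch of how the batch/fetch knobs can be passed to the Flink 1.17 KafkaSource/KafkaSink builders. The broker addresses, topic names, and the concrete byte/millisecond values are placeholders for illustration only (on the consumer side the actual Kafka property names are fetch.min.bytes / max.partition.fetch.bytes). Please treat it as a starting point to tune against your 102 KB messages, not a recommendation.

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;

// Consumer side: let each fetch carry more of the 102 KB records.
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")                  // placeholder
        .setTopics("input-topic")                            // placeholder
        .setGroupId("perf-test")                             // placeholder
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setProperty("fetch.min.bytes", "1048576")           // wait for ~1 MB per fetch (example value)
        .setProperty("max.partition.fetch.bytes", "5242880") // example value
        .build();

// Producer side: batch several records per request and compress them.
KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("secured-broker:9093")          // placeholder
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")                    // placeholder
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setProperty("batch.size", "524288")                 // roughly 5 records of 102 KB per batch (example)
        .setProperty("linger.ms", "20")                      // small delay to let batches fill (example)
        .setProperty("compression.type", "snappy")
        .build();
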
On Mon, Dec 30, 2024 at 1:06 PM Mohammad Aamir Iqubal <aamir.y...@gmail.com> wrote:

> Hi Samrat,
>
> I think the issue is the amount of data collected while checkpointing.
> With 29 KB messages the application is tuned properly, but with 102 KB
> we are not able to tune it. PFA screenshot from logs.
>
> Regards,
> Md Aamir Iqubal
> +918409561989
>
> On Fri, 27 Dec 2024 at 9:01 AM, Samrat Deb <decordea...@gmail.com> wrote:
>
>> Hi Md Aamir,
>>
>> Apologies for the delayed response due to the festive season. Thank you
>> for providing the additional details.
>>
>> The information shared is helpful but still too sparse to pinpoint the
>> root cause of the backpressure issue. To proceed further, it would be
>> great if you could share the logs from the job (ensuring that no
>> sensitive information is included). Logs can often reveal valuable
>> insights into bottlenecks or errors occurring during execution.
>> Additionally, it would be helpful to know more about the type of state
>> maintained in the ProcessFunction. Is it keyed state, or are you using
>> other forms of state management? Also, is the ProcessFunction
>> parallelizable, or does it involve operations that could inherently
>> limit parallelism?
>>
>> Here are some general guidelines for debugging a ProcessFunction in
>> such scenarios:
>>
>> 1. Use the Flink Web UI to inspect the ProcessFunction operator's
>> subtask performance. Check for uneven load distribution across
>> subtasks, which can indicate issues like data skew.
>> 2. If your function uses state, verify that the state usage is
>> optimized. Large or inefficient state can increase checkpointing time
>> and memory pressure.
>> 3. Track the latency and throughput of the process operator. Sudden
>> spikes or sustained high latency often correlate with bottlenecks.
>> 4. Review the logic inside the ProcessFunction. Look for computationally
>> expensive operations, excessive object creation, or blocking calls that
>> could limit processing speed.
>> 5. Consider increasing fetch.size and batch.size in your Kafka consumer,
>> and tweak the producer's batch configurations like linger.ms or
>> compression.type. These can reduce the frequency of network I/O and
>> improve throughput.
>> 6. Re-evaluate the parallelism of the process operator. It may need to
>> be adjusted independently of the source and sink.
>>
>> If possible, also include metrics or screenshots from the Flink Web UI
>> showing operator backpressure or task-level statistics.
>>
>> Bests,
>> Samrat
>>
>> On Tue, Dec 24, 2024 at 11:13 AM Mohammad Aamir Iqubal <aamir.y...@gmail.com> wrote:
>>
>> > Hi Samrat,
>> >
>> > PFB details:
>> >
>> > 1) Flink version 1.17.
>> > 2) We are just modifying existing JSON fields, so that modification is
>> > the only state we manage.
>> > 3) Compression type snappy is used; the default fetch and batch sizes
>> > are used.
>> > 4) The process operator is the bottleneck, hence the source is getting
>> > backpressured.
>> > 5) Yes.
>> > 6) No.
>> > 7) Yes.
>> >
>> > Please let me know if you need any more details.
>> >
>> > Regards,
>> > Md Aamir Iqubal
>> > +918409561989
>> >
>> > On Thu, 19 Dec 2024 at 11:56 AM, Samrat Deb <decordea...@gmail.com> wrote:
>> >
>> > > Hi Md Aamir,
>> > >
>> > > Thank you for providing the details of your streaming application
>> > > setup.
>> > >
>> > > To assist you better in identifying and resolving the backpressure
>> > > issue, I have a few follow-up questions:
>> > >
>> > > 1. Which version of Apache Flink are you using?
>> > > 2. Is the ProcessFunction stateful? If yes, what kind of state is
>> > > managed?
>> > > 3. What are the configurations for the Kafka producer and consumer
>> > > (e.g., fetch.size, batch.size, linger.ms, acks, compression.type,
>> > > etc.)?
>> > > 4. Have you examined the backpressure monitoring graphs in the Flink
>> > > Web UI? Which operator (source, process, or sink) is experiencing
>> > > the bottleneck?
>> > > 5. Have you tried explicitly increasing the parallelism of the sink
>> > > operator to match or exceed the source parallelism?
>> > > 6. Although I'm not deeply familiar with Kerberos integration, are
>> > > there any errors or warnings in the logs related to Kerberos
>> > > authentication?
>> > > 7. Have you verified if Kerberos ticket renewal contributes to any
>> > > performance overhead for the sink operator?
>> > >
>> > > Bests,
>> > > Samrat
>> > >
>> > > On Wed, Dec 18, 2024 at 3:25 PM Mohammad Aamir Iqubal <aamir.y...@gmail.com> wrote:
>> > >
>> > > > Hi Team,
>> > > >
>> > > > I am running a streaming application in a performance environment.
>> > > > The source is Kafka and the sink is also Kafka, but the sink Kafka
>> > > > is secured by Kerberos. Message size is 102 KB; source parallelism
>> > > > is 16, process parallelism is 80, and sink parallelism is 12.
>> > > > I am using a ProcessFunction to remove a few fields from the JSON
>> > > > and mask some fields. When we send the data at around 100 TPS the
>> > > > application goes into backpressure. Please let me know how to fix
>> > > > the issue.
>> > > >
>> > > > Regards,
>> > > > Md Aamir Iqubal
>> > > > +918409561989
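
P.S. For reference, here is a minimal, untested sketch of the kind of field-removal/masking ProcessFunction described above, assuming the records are JSON strings and Jackson is on the classpath; the field names are made-up placeholders. Reusing one ObjectMapper per subtask and avoiding per-record object churn is the kind of cheap win point 4 of the earlier guidelines hints at.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class MaskAndDropFieldsFunction extends ProcessFunction<String, String> {

    // One mapper per parallel subtask, created in open() to avoid per-record allocation.
    private transient ObjectMapper mapper;

    @Override
    public void open(Configuration parameters) {
        mapper = new ObjectMapper();
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        ObjectNode json = (ObjectNode) mapper.readTree(value);
        json.remove("internalField");         // hypothetical field to drop
        if (json.has("accountNumber")) {      // hypothetical field to mask
            json.put("accountNumber", "****");
        }
        out.collect(mapper.writeValueAsString(json));
    }
}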