Re: Latency between Batch Completion and triggering of onBatchCompleted() event

2020-03-19 Thread rahul patwari
Update: As the pipeline progresses, the delay between the time a batch completes and the time the onBatchCompleted() event is triggered for that batch keeps increasing, i.e. we are *not* observing a constant delay of 3 to 5 seconds for every batch. Trimmed logs which show the issue
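
A minimal sketch of how the observation above could be checked from logged timestamps: compute the per-batch delay between batch completion and the listener callback, then test whether it grows from batch to batch. The function names and the sample timestamps are hypothetical and are not taken from the logs in this thread.

```python
from typing import List, Tuple


def trigger_delays(events: List[Tuple[int, int]]) -> List[int]:
    """Return the delay in ms for each batch.

    Each event is (batch_completed_ms, listener_called_ms), in batch order.
    """
    return [called - completed for completed, called in events]


def is_growing(delays: List[int]) -> bool:
    """True if every delay is strictly larger than the one before it."""
    return len(delays) > 1 and all(b > a for a, b in zip(delays, delays[1:]))


# Hypothetical timestamps in milliseconds (illustration only).
sample = [(10_000, 13_000), (20_000, 25_000), (30_000, 38_000)]
delays = trigger_delays(sample)
print(delays, is_growing(delays))
```

A constant 3 to 5 second delay would make is_growing() return False; a steadily widening gap, as described in the message, would make it return True.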

Re: structured streaming Kafka consumer group.id override

2020-03-19 Thread lec ssmi
The last offset is stored in the file system you specified, so how does it expire? I don't understand. I haven't encountered that condition. On Thu, Mar 19, 2020 at 10:18 PM, Srinivas V wrote: > 1. How would a prod admin user/other engineers understand which process > is this random groupid which is consuming a specific

Re: Spark 3 Build Problem!!!

2020-03-19 Thread Zahid Rahman
https://youtu.be/iarn1KHeouc You'll find a step-by-step setup guide for Linux and Windows here. You need to do extra steps for Windows. Spark is more user-friendly towards *nix. You are better off downloading Ubuntu 19. Hopefully Ubuntu has the drivers for your PC. If Ubuntu does not have the driv

Re: structured streaming Kafka consumer group.id override

2020-03-19 Thread Srinivas V
1. How would a prod admin user or other engineers understand which process this random groupid, which is consuming a specific topic, belongs to? Why is it designed this way? 2. I don't see the groupid changing all the time. It repeats across restarts. I am not able to understand when and how it changes. I know it

Latency between Batch Completion and triggering of onBatchCompleted() event

2020-03-19 Thread rahul patwari
Hi, *Context*: We are using the Spark Streaming library. We have created a StreamingListener to implement some logic when the onBatchCompleted() event is triggered. This StreamingListener is registered with the StreamingContext. We are using Spark on Kubernetes. The Spark version is 2.4.2. batchDura