Update:
As the pipeline progresses, the delay between the time a batch completes
and the time the onBatchCompleted() event is triggered for that batch keeps
increasing, i.e. we are *not* observing a constant delay of 3 to 5 seconds
for every batch.
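A minimal sketch for checking whether the lag is constant or growing. It assumes you log two timestamps per batch from inside onBatchCompleted(): the batch's processing end time (for example from the batch info's processingEndTime) and the wall-clock time at which the callback actually runs. The BatchTiming class and the helper functions are hypothetical names for illustration only, not part of Spark's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchTiming:
    # Hypothetical record logged once per batch from onBatchCompleted().
    processing_end_ms: int   # when the batch finished processing
    listener_called_ms: int  # wall-clock time when onBatchCompleted() ran


def listener_lags(timings: List[BatchTiming]) -> List[int]:
    """Delay in ms between batch completion and the listener callback."""
    return [t.listener_called_ms - t.processing_end_ms for t in timings]


def lag_slope(lags: List[int]) -> float:
    """Least-squares slope of lag vs. batch index, in ms per batch.

    A value near 0 means the delay is roughly constant; a clearly
    positive value means the delay grows as the pipeline runs.
    """
    n = len(lags)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(lags) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(lags))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

For example, lags of 3000, 3500 and 4000 ms give a slope of 500.0 ms per batch, while a constant lag gives 0.0.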
Trimmed logs that show the issue:
The last offset is stored in the file system you specified, so how does it
expire? I don't understand; I haven't run into that condition.
On Thursday, March 19, 2020 at 10:18 PM, Srinivas V wrote:
> 1. How would a prod admin user or other engineers know which process is
> behind this random groupid that is consuming a specific
https://youtu.be/iarn1KHeouc
You'll find a step-by-step setup guide for Linux and Windows here. Windows
needs some extra steps; Spark is more user-friendly on *nix.
You are better off downloading Ubuntu 19. Hopefully Ubuntu has the
drivers for your PC. If Ubuntu does not have the driv
1. How would a prod admin user or other engineers know which process is
behind this random groupid that is consuming a specific topic? Why is it
designed this way?
2. I don't see the groupid changing all the time; the same one repeats
across restarts. I'm not able to understand when and how it changes. I know it
Hi,
*Context*:
We are using the Spark Streaming library.
We have created a StreamingListener that implements some logic when the
onBatchCompleted() event is triggered. This StreamingListener is registered
with the StreamingContext.
We are using Spark on Kubernetes. The Spark version is 2.4.2. batchDura