This has been brought up a few times. I will focus on Spark Structured Streaming.

Autoscaling does not support Spark Structured Streaming (SSS). This is because streaming jobs are typically long-running jobs that need to maintain state across micro-batches, whereas autoscaling is designed to scale the cluster up and down as load changes, which can disturb that per-executor state.
Hello Experts,

Is there any true autoscaling option for Spark streaming? Dynamic allocation works only for batch jobs. Any guidelines on Spark streaming autoscaling and how it would tie into cluster-level autoscaling solutions?

Thanks
Hi,
Since you mentioned that there could be duplicate records with the same unique key in the Delta table, you will need a way to handle these duplicates. One approach I can suggest is to use a timestamp to determine the latest or most relevant record among the duplicates, the so-called op_tim
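A minimal plain-Python sketch of that idea: among rows sharing a unique key, keep only the row with the greatest operation timestamp. The field names (`id`, `op_ts`) are placeholders, not the actual columns of your table.

```python
# Hypothetical rows with duplicate keys; "op_ts" stands in for the
# operation-timestamp column used to rank duplicates.
rows = [
    {"id": 1, "op_ts": "2024-01-01T10:00:00", "value": "old"},
    {"id": 1, "op_ts": "2024-01-02T09:30:00", "value": "new"},
    {"id": 2, "op_ts": "2024-01-01T12:00:00", "value": "only"},
]

latest = {}
for row in rows:
    kept = latest.get(row["id"])
    # ISO-8601 timestamp strings compare correctly lexicographically.
    if kept is None or row["op_ts"] > kept["op_ts"]:
        latest[row["id"]] = row

deduped = sorted(latest.values(), key=lambda r: r["id"])
```

In Spark you would typically express the same ranking with a window function, e.g. `row_number()` partitioned by the key and ordered by the timestamp descending, keeping only row number 1 before merging into the Delta table.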