Hi Mich,
I was just reading random questions on the user list when I noticed that you
said:
On 25 Apr 2024, at 2:12 AM, Mich Talebzadeh wrote:
1) You are using monotonically_increasing_id(), which is not
collision-resistant in distributed environments like Spark. Multiple hosts
can generate the same ID.
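As a side note for anyone following along: when IDs must be unique across
hosts and runs, Spark's built-in uuid() SQL function is the usual
collision-resistant alternative. A minimal PySpark sketch (not from the
original thread):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("id-example").getOrCreate()
    df = spark.range(5)

    # monotonically_increasing_id() packs the partition ID into the upper
    # bits, so values are unique within one DataFrame but carry no meaning
    # across jobs or writes.
    df = df.withColumn("mono_id", F.monotonically_increasing_id())

    # uuid() generates a random UUID per row, a collision-resistant choice
    # when IDs must be unique across hosts and runs.
    df = df.withColumn("uuid", F.expr("uuid()"))

    df.show(truncate=False)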
spark.sql.shuffle.partitions=auto
Because vanilla Apache Spark does not manage the cluster for you, it has
no basis for picking this value automatically. This configuration option
is specific to Databricks and their managed Spark offering: it allows
Databricks to automatically determine an optimal number of shuffle
partitions for your workload.
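On vanilla Spark, the closest equivalent is Adaptive Query Execution
(AQE), which coalesces shuffle partitions at runtime. A minimal sketch
using standard Spark 3.x settings:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("aqe-example")
        # AQE is enabled by default since Spark 3.2; shown explicitly here.
        .config("spark.sql.adaptive.enabled", "true")
        # Let AQE merge small shuffle partitions after each stage.
        .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
        # On vanilla Spark this must be an integer; AQE treats it as the
        # upper bound it coalesces down from.
        .config("spark.sql.shuffle.partitions", "200")
        .getOrCreate()
    )

This is also why setting "auto" on open-source Spark fails with an error
asking for an integer.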
HTH
Mich Talebzadeh,
Hi,

In K8s the driver is responsible for creating the executors. The likely
cause of your problem is insufficient memory allocated for executors in
the K8s cluster: even with dynamic allocation, K8s will not schedule
executor pods if there is not enough free memory to satisfy their
resource requests.
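A minimal sketch of the executor sizing knobs involved (these are standard
Spark configs; the exact values here are assumptions you would tune to
your nodes):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("k8s-sizing")
        # Each executor pod requests roughly executor memory plus overhead;
        # keep instances x (memory + overhead) within free cluster capacity.
        .config("spark.executor.memory", "2g")
        .config("spark.executor.memoryOverhead", "512m")
        .config("spark.executor.instances", "2")
        .getOrCreate()
    )

If executor pods stay Pending, running kubectl describe pod on one of them
should show FailedScheduling events such as "Insufficient memory".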
May I know: is spark.sql.shuffle.partitions=auto only available on
Databricks? What about vanilla Spark? When I set it there, I get an error
saying the value must be an integer. Is there any open source library
that automatically finds the best partition count and block size for a
DataFrame?