Any advantages of using sql.adaptive.autoBroadcastJoinThreshold over sql.autoBroadcastJoinThreshold?

2023-01-22 Thread Soumyadeep Mukhopadhyay
Hello! In my use case we are using PySpark 3.1, and a few of our PySpark scripts run better with higher driver memory. As far as I know, the default value of "spark.sql.autoBroadcastJoinThreshold" is 10MB; in a few cases the default driver configuration was throwing OOM errors.
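For illustration, a minimal sketch of how this threshold can be adjusted in a PySpark 3.1 session; the 50 MB value and the app name are assumptions, not recommendations:

from pyspark.sql import SparkSession

# Illustrative values only; tune to the workload.
spark = (
    SparkSession.builder
    .appName("broadcast-threshold-demo")  # hypothetical app name
    # Raise the broadcast threshold from the 10 MB default to 50 MB ...
    .config("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024)
    .getOrCreate()
)

# ... or disable automatic broadcast joins entirely (one way to sidestep
# driver OOM caused by large broadcasts):
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)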

Re: Any advantages of using sql.adaptive.autoBroadcastJoinThreshold over sql.autoBroadcastJoinThreshold?

2023-01-22 Thread Balakrishnan Ayyappan
Hi Soumyadeep, Both configs are more or less the same. However, the sql.adaptive.auto* config is applicable (starting from version 3.2.0) only within the adaptive query execution (AQE) framework. As per the docs, the default value for "spark.sql.adaptive.autoBroadcastJoinThreshold" is the same as "spark.sql.autoBroadcastJoinThreshold".
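A sketch of how the two thresholds relate in Spark 3.2.0+, assuming an existing SparkSession named spark; the adaptive variant is consulted only when AQE is enabled, and the values below are illustrative assumptions:

# Spark 3.2.0+: the adaptive threshold only applies under AQE.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Plan-time threshold, used when the query is first optimized:
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10MB")

# Runtime threshold, used by AQE when it re-plans joins based on actual
# runtime statistics (illustrative value):
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "10MB")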

Duplicates in Collaborative Filtering Output

2023-01-22 Thread Kartik Ohri
Hi! We are using the Spark MLlib ALS model (on Spark 3.2.0) for an implicit-feedback collaborative filtering recommendation job. While looking at the output of recommendForUserSubset, we noticed duplicates.
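For context, a minimal, self-contained sketch of the kind of job described, using the DataFrame-based ALS API (where recommendForUserSubset lives); the column names and toy data are assumptions:

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-demo").getOrCreate()  # hypothetical

# Toy implicit-feedback data: (user, item, interaction strength).
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 5.0)],
    ["userId", "itemId", "strength"],
)

als = ALS(
    implicitPrefs=True,  # implicit feedback, as in the job above
    userCol="userId", itemCol="itemId", ratingCol="strength",
)
model = als.fit(ratings)

# Top-3 recommendations for a subset of users; duplicates here would
# match the behaviour being reported.
subset = ratings.select("userId").distinct().limit(1)
model.recommendForUserSubset(subset, 3).show(truncate=False)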