approx_count_distinct in Spark always returns 1

2022-06-02 Thread marc nicole
I have a dataset where I want to count distinct values of a column within groups defined by other columns. I do it like so: processedDataset = processedDataset.withColumn("freq", approx_count_distinct("col1").over(Window.partitionBy(groupCols.toArray(new Column[groupCols.size()])))); but even when I have duplicates it always returns 1.
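A minimal, runnable sketch of the pattern described above, using a hypothetical toy dataset ("group" as the single grouping column, "col1" as the counted column); the window construction mirrors the original snippet:

import static org.apache.spark.sql.functions.approx_count_distinct;
import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

public class ApproxCountDistinctOverWindow {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("approx-count-distinct-window")
                .master("local[*]")
                .getOrCreate();

        // Toy stand-in for processedDataset: "group" is the grouping column,
        // "col1" is the column whose distinct values we want per group.
        Dataset<Row> processedDataset = spark.range(6).toDF("id")
                .withColumn("group", col("id").mod(2))
                .withColumn("col1", col("id").mod(3));

        // Grouping columns, converted to an array as in the original snippet.
        List<Column> groupCols = Arrays.asList(col("group"));
        WindowSpec byGroup =
                Window.partitionBy(groupCols.toArray(new Column[groupCols.size()]));

        // Approximate distinct count of col1 within each window partition.
        Dataset<Row> withFreq = processedDataset.withColumn(
                "freq", approx_count_distinct("col1").over(byGroup));

        // Each group here contains 3 distinct col1 values, so freq should be 3.
        withFreq.show();
        spark.stop();
    }
}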

Does adaptive auto broadcast respect spark.sql.autoBroadcastJoinThreshold?

2022-06-02 Thread Henry Quan
I noticed that my Spark application is broadcasting even though I set spark.sql.autoBroadcastJoinThreshold = -1. When I checked the query plan, the physical plan was an AdaptiveSparkPlan. When I checked the adaptive settings, I noticed there is a separate setting, spark.sql.adaptive.autoBroadcastJoinThreshold.
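For reference, a hedged sketch of disabling broadcast joins under both the static planner and adaptive query execution; spark.sql.adaptive.autoBroadcastJoinThreshold is available from Spark 3.2 onward and is used only by the adaptive framework, so setting both is the safe way to rule broadcasting out:

import org.apache.spark.sql.SparkSession;

public class DisableBroadcastJoins {
    public static void main(String[] args) {
        // Disable broadcast joins for both the static planner and the
        // adaptive (AQE) planner. The adaptive config only exists on
        // Spark 3.2+; on earlier versions only the first setting applies.
        SparkSession spark = SparkSession.builder()
                .appName("disable-broadcast-joins")
                .master("local[*]")
                .config("spark.sql.autoBroadcastJoinThreshold", "-1")
                .config("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1")
                .getOrCreate();

        // Both settings can also be changed at runtime on an existing session.
        spark.conf().set("spark.sql.autoBroadcastJoinThreshold", "-1");
        spark.conf().set("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1");

        spark.stop();
    }
}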