I have a dataset where I want to count the distinct values of a column based on a
group of other columns. I do it like so:
processedDataset = processedDataset.withColumn("freq",
    approx_count_distinct("col1").over(Window.partitionBy(
        groupCols.toArray(new Column[groupCols.size()]))));
but even when I have duplic

My Spark application is still broadcasting even though I
set spark.sql.autoBroadcastJoinThreshold = -1. When I checked the query
plan, the physical plan was an AdaptiveSparkPlan. Looking at the
adaptive settings, I found that there was a separate setting,
spark.sql.ada