Hi everyone,
I was playing with the integration of Hive UDAFs in Spark-SQL and noticed that
the terminatePartial and merge methods of custom UDAFs were never called. This
made me curious, since those two methods are the ones responsible for
distributing UDAF execution in Hive.
Looking at the code …
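For readers who haven't written one, this is roughly what the classic
(reflection-based) Hive UDAF interface looks like. The SumEvaluator below is an
illustrative sketch, not Spark's integration code; in Java the evaluator is
normally declared as a static inner class of the UDAF subclass, and Hive
locates iterate/terminatePartial/merge/terminate by reflection:

import org.apache.hadoop.hive.ql.exec.{UDAF, UDAFEvaluator}

// marker class; with the classic API the evaluator carries all the logic
class SumUDAF extends UDAF

class SumEvaluator extends UDAFEvaluator {
  private var sum = 0.0
  private var empty = true

  // reset the aggregation state before (re)use
  override def init(): Unit = { sum = 0.0; empty = true }

  // map side: fold one input row into the running partial aggregate
  def iterate(value: java.lang.Double): Boolean = {
    if (value != null) { sum += value; empty = false }
    true
  }

  // map side: emit the partial aggregate to be shuffled to the reducers;
  // one of the two methods the post observes is never called under Spark-SQL
  def terminatePartial(): java.lang.Double = if (empty) null else sum

  // reduce side: combine a partial aggregate from another task into this one
  def merge(partial: java.lang.Double): Boolean = {
    if (partial != null) { sum += partial; empty = false }
    true
  }

  // reduce side: produce the final result
  def terminate(): java.lang.Double = if (empty) null else sum
}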
Dear List,
We have run into serious problems trying to run a larger-than-average number of
aggregations in a GROUP BY query. Symptoms of this problem are OutOfMemoryError
failures and unreasonably long processing times caused by GC pressure. The problem occurs
when the following two conditions are met:
- The …
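To give a feel for the query shape, here is a sketch of what we mean by a large
number of aggregations (the table name, column names, and the count of 200 are
made up for illustration, not taken from our actual job):

// build a GROUP BY with an unusually large number of aggregate expressions
val aggs = (1 to 200).map(i => s"SUM(col$i) AS sum$i").mkString(", ")
val wide = sqlContext.sql(s"SELECT key, $aggs FROM events GROUP BY key")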
Hey everyone,
Consider the following use of spark.sql.shuffle.partitions:
import sqlContext.implicits._  // needed for .toDF on a local Seq

// two random, zero-padded nine-digit string columns per row
// (%09d, not %09.0f: the f interpolator rejects %f on a Long)
case class Data(
  A: String = f"${(math.random*1e8).toLong}%09d",
  B: String = f"${(math.random*1e8).toLong}%09d")
val dataFrame = (1 to 1000).map(_ => Data()).toDF
dataFrame.registerTempTable("data")
sqlContext.setConf( "