Hi everyone,
Consider the following code:
import org.apache.spark.sql.functions.min
val result = df.groupBy("col1").agg(min("col2"))
I know that rdd.reduceByKey(func) produces the same RDD as
rdd.groupByKey().mapValues(value => value.reduce(func)). However, reduceByKey
is more efficient as it avoids shipping each value to the reducer doi
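For illustration, here is a minimal sketch of the equivalence mentioned above. It assumes a local SparkContext; the object name, app name, sample data and the choice of min as func are all made up for the example and are not from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object: compares the two formulations on sample data.
object ReduceVsGroup {
  // Runs both formulations on the same input and returns their results as maps.
  def run(): (Map[String, Int], Map[String, Int]) = {
    // Assumption: a local master with two threads, just for this sketch.
    val conf = new SparkConf().setAppName("reduce-vs-group").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up (key, value) pairs.
      val rdd = sc.parallelize(Seq(("a", 3), ("b", 7), ("a", 1), ("b", 9)))
      val func = (x: Int, y: Int) => math.min(x, y)

      // reduceByKey applies func within each partition before the shuffle.
      val reduced = rdd.reduceByKey(func).collect().toMap

      // groupByKey shuffles every value, then func is applied per key.
      val grouped = rdd.groupByKey()
        .mapValues(values => values.reduce(func))
        .collect()
        .toMap

      (reduced, grouped)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (reduced, grouped) = run()
    println(s"reduceByKey: $reduced")
    println(s"groupByKey + reduce: $grouped")
  }
}
```

Both maps should hold the same per-key minimum; the difference between the two calls is in how much data crosses the shuffle, not in the result.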
Hi all,
Consider the following case:
import java.sql.Timestamp
case class MyProduct(t: Timestamp, a: Float)
val rdd = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
rdd.printSchema()
The output is:
root
|-- t: timestamp (nullable = true)
|-- a: float (nullable = false)
How can I