Is there a reduceByKey functionality in DataFrame API?

2016-08-10 Thread luismattor
Hi everyone,
Consider the following code:
val result = df.groupBy("col1").agg(min("col2"))
I know that rdd.reduceByKey(func) produces the same RDD as rdd.groupByKey().mapValues(value => value.reduce(func)). However, reduceByKey is more efficient as it avoids shipping each value to the reducer doi…
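A minimal sketch comparing the three APIs on the same data, assuming Spark 2.0 or later with spark-sql on the classpath. The object name ReduceByKeyExample, its methods, and the sample pairs are hypothetical. The DataFrame API has no reduceByKey, but groupBy with a built-in aggregate such as min is planned by Spark SQL as a partial aggregate per partition followed by a final aggregate after the shuffle, so values are combined before shipping, much as with reduceByKey. For an arbitrary reduce function, the typed Dataset API offers groupByKey(...).reduceGroups(...).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.min

object ReduceByKeyExample {
  // Illustrative (key, value) pairs; not taken from the original thread.
  val pairs: Seq[(String, Int)] = Seq(("a", 3), ("a", 1), ("b", 7), ("b", 5), ("b", 9))

  // RDD API: reduceByKey merges values within each partition before the shuffle.
  def viaRdd(spark: SparkSession): Map[String, Int] =
    spark.sparkContext.parallelize(pairs, 2)
      .reduceByKey((x, y) => math.min(x, y))
      .collect().toMap

  // DataFrame API: groupBy + built-in aggregate (planned as partial + final aggregation).
  def viaDataFrame(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._
    pairs.toDF("col1", "col2")
      .groupBy("col1").agg(min("col2"))
      .as[(String, Int)] // tuple fields are bound to columns by position
      .collect().toMap
  }

  // Typed Dataset API (Spark 2.0+): reduceGroups accepts any (V, V) => V function.
  def viaDataset(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._
    pairs.toDS()
      .groupByKey(_._1)
      .reduceGroups((x, y) => if (x._2 <= y._2) x else y)
      .map { case (k, (_, v)) => (k, v) }
      .collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("reduceByKey-sketch").getOrCreate()
    // All three approaches should yield the same per-key minimum.
    println(Seq(viaRdd(spark), viaDataFrame(spark), viaDataset(spark)).distinct.size == 1)
    spark.stop()
  }
}
```

Calling .explain() on the grouped DataFrame is a quick way to check the plan for the partial and final aggregation steps in a given Spark version.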

How to set nullable field when create DataFrame using case class

2016-08-04 Thread luismattor
Hi all,
Consider the following case:
import java.sql.Timestamp
case class MyProduct(t: Timestamp, a: Float)
val rdd = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
rdd.printSchema()
The output is:
root
 |-- t: timestamp (nullable = true)
 |-- a: float (nullable = false)
How can I…
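A minimal sketch of two common ways to control nullability, assuming Spark 2.0 or later. The helper withNullability, the object NullableExample, and the case class MyProductOpt are hypothetical names for illustration. A primitive field such as Float is encoded as a non-nullable column, while wrapping it in Option makes it nullable; alternatively, the schema can be rewritten and the DataFrame rebuilt with createDataFrame.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// As in the original post: Float is a JVM primitive, so column "a" is non-nullable.
case class MyProduct(t: Timestamp, a: Float)

// Hypothetical variant: Option[Float] is encoded as a nullable float column.
case class MyProductOpt(t: Timestamp, a: Option[Float])

object NullableExample {
  // Returns df with the nullability of the named columns overridden.
  // Marking a column non-nullable when it actually holds nulls can cause errors or wrong results.
  def withNullability(df: DataFrame, cols: Set[String], nullable: Boolean): DataFrame = {
    val schema = StructType(df.schema.map { f =>
      if (cols.contains(f.name)) f.copy(nullable = nullable) else f
    })
    df.sparkSession.createDataFrame(df.rdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nullable-sketch").getOrCreate()
    import spark.implicits._

    // Approach 1: change the field type to Option[Float].
    Seq(MyProductOpt(new Timestamp(0), Some(10f))).toDF().printSchema()

    // Approach 2: keep the case class and rewrite the schema afterwards.
    val base = Seq(MyProduct(new Timestamp(0), 10f)).toDF()
    withNullability(base, Set("a"), nullable = true).printSchema()

    spark.stop()
  }
}
```

The Option route keeps the nullability in the type itself, so it survives later transformations; the schema rewrite is useful when the case class cannot be changed.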