For example, this works on an RDD:
scala> val li = List(3,2,1,4,0)
li: List[Int] = List(3, 2, 1, 4, 0)
scala> val rdd = sc.parallelize(li)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at
parallelize at <console>:24
scala> rdd.filter(_ > 2).collect()
res0: Array[Int] = Array(3, 4)
After I convert the RDD to a DataFrame, the same filter no longer compiles:
scala> val df = rdd.toDF
df: org.apache.spark.sql.DataFrame = [value: int]
scala> df.filter(_ > 2).show()
<console>:24: error: value > is not a member of org.apache.spark.sql.Row
df.filter(_ > 2).show()
But this can work:
scala> df.filter($"value" > 2).show()
+-----+
|value|
+-----+
| 3|
| 4|
+-----+
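My guess is that df.filter(_ > 2) fails because an untyped DataFrame is a Dataset[Row], and Row has no > method, while the $"value" > 2 form builds a Column expression instead. A sketch of what I think should also work, converting back to a typed Dataset[Int] so the lambda sees Ints again (untested here; assumes a local SparkSession named spark):

```scala
import org.apache.spark.sql.SparkSession

// Minimal local session for the sketch; in the shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("typed-filter-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(3, 2, 1, 4, 0).toDF("value")

// DataFrame.filter wants a Column or a SQL string, not Int => Boolean.
// as[Int] turns it into a Dataset[Int], so a plain lambda applies again:
df.as[Int].filter(_ > 2).show()
```

If that is right, the lambda style on a DataFrame would need either the typed Dataset route above or the Column syntax from the previous example.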
Where can I find the full list of methods supported by DataFrame?
Thank you.
Frakass