For example, this works on an RDD:

scala> val li = List(3,2,1,4,0)
li: List[Int] = List(3, 2, 1, 4, 0)

scala> val rdd = sc.parallelize(li)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> rdd.filter(_ > 2).collect()
res0: Array[Int] = Array(3, 4)


After I convert the RDD to a DataFrame, the same filter no longer compiles:

scala> val df = rdd.toDF
df: org.apache.spark.sql.DataFrame = [value: int]

scala> df.filter(_ > 2).show()
<console>:24: error: value > is not a member of org.apache.spark.sql.Row
       df.filter(_ > 2).show()


But this can work:

scala> df.filter($"value" > 2).show()
+-----+
|value|
+-----+
|    3|
|    4|
+-----+
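That form works because DataFrame's `filter` is overloaded to take a `Column` or an SQL expression string rather than a lambda over the element type. A few equivalent spellings of the same filter (a sketch, assuming the spark-shell session above where `df` has a single int column named "value"):

```scala
// Assumes spark-shell with `df: DataFrame = [value: int]` in scope.
df.filter($"value" > 2).show()   // Column expression via the $ implicit
df.filter("value > 2").show()    // SQL expression string
df.where(org.apache.spark.sql.functions.col("value") > 2).show() // where is an alias for filter
```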


Where can I find documentation for all the methods supported by DataFrame?


Thank you.
Frakass

