Please take a look at:

./sql/core/src/main/scala/org/apache/spark/sql/DataFrameHolder.scala
./sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala
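In 1.3 the implicits no longer make an RDD usable as a table directly; for RDDs of Products (case classes or tuples) they add a toDF() method (that is what DataFrameHolder provides), and registerTempTable now lives on the resulting DataFrame. A minimal sketch of the fix against your spark-shell session (same Person class and input path assumed):

  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext.implicits._   // brings toDF() into scope for RDD[Product]

  case class Person(name: String, age: Int)

  val t2 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
    .flatMap(_.split("\n"))
    .map(_.split(" "))
    .map(p => Person(p(0), 1))

  val df = t2.toDF()              // implicit conversion: RDD[Person] -> DataFrame
  df.registerTempTable("people")  // registerTempTable is a DataFrame method in 1.3

If you'd rather avoid the implicits entirely, see the createDataFrame sketch at the bottom.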
Cheers

On Tue, Mar 24, 2015 at 8:46 PM, Zhiwei Chan <z.w.chan.ja...@gmail.com> wrote:
> Hi all,
>
> I just upgraded Spark from 1.2.1 to 1.3.0, and changed "import
> sqlContext.createSchemaRDD" to "import sqlContext.implicits._" in my code.
> (I scanned the programming guide, and it seems this is the only change I
> need to make.) But the build now fails to compile with the following error:
>
> >>>
> [ERROR] ...\magic.scala:527: error: value registerTempTable is not a member
> of org.apache.spark.rdd.RDD[com.yhd.ycache.magic.Table]
> [INFO] tableRdd.registerTempTable(tableName)
> <<<
>
> Then I tried the exact example from the 1.3 programming guide in
> spark-shell, and it hits the same error:
>
> >>>
> scala> sys.env.get("CLASSPATH")
> res7: Option[String] =
> Some(:/root/scala/spark-1.3.0-bin-hadoop2.4/conf:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar)
>
> scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4b05b3ff
>
> scala> import sqlContext.implicits._
> import sqlContext.implicits._
>
> scala> case class Person(name: String, age: Int)
> defined class Person
>
> scala> val t1 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
> 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(81443) called with curMem=186397, maxMem=278302556
> 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 79.5 KB, free 265.2 MB)
> 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(31262) called with curMem=267840, maxMem=278302556
> 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 30.5 KB, free 265.1 MB)
> 15/03/25 11:13:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on heju:48885 (size: 30.5 KB, free: 265.4 MB)
> 15/03/25 11:13:35 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
> 15/03/25 11:13:35 INFO SparkContext: Created broadcast 3 from textFile at <console>:34
> t1: org.apache.spark.rdd.RDD[String] = hdfs://heju:8020/user/root/magic/poolInfo.txt MapPartitionsRDD[9] at textFile at <console>:34
>
> scala> val t2 = t1.flatMap(_.split("\n")).map(_.split(" ")).map(p => Person(p(0), 1))
> t2: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[12] at map at <console>:38
>
> scala> t2.registerTempTable("people")
> <console>:41: error: value registerTempTable is not a member of org.apache.spark.rdd.RDD[Person]
>        t2.registerTempTable("people")
>           ^
> <<<
>
> I found the following explanation in the programming guide about implicitly
> converting case classes to DataFrames, but I don't understand what I should
> do. Could anyone tell me how to convert an RDD of a case class to a
> DataFrame?
>
> >>>
> Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)
>
> Many of the code examples prior to Spark 1.3 started with import
> sqlContext._, which brought all of the functions from sqlContext into
> scope. In Spark 1.3 we have isolated the implicit conversions for
> converting RDDs into DataFrames into an object inside of the SQLContext.
> Users should now write import sqlContext.implicits._.
>
> Additionally, the implicit conversions now only augment RDDs that are
> composed of Products (i.e., case classes or tuples) with a method toDF,
> instead of applying automatically.
> <<<
>
> Thanks,
> Jason
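P.S. If you prefer not to depend on the implicit conversion at all, SQLContext.createDataFrame does the same conversion explicitly. A sketch, assuming the same sqlContext and t2 from your session:

  val df = sqlContext.createDataFrame(t2)   // explicit RDD[Person] -> DataFrame
  df.registerTempTable("people")
  sqlContext.sql("SELECT name FROM people").show()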