In particular:

http://spark.apache.org/docs/latest/sql-programming-guide.html


"Additionally, the implicit conversions now only augment RDDs that are
composed of Products (i.e., case classes or tuples) with a method toDF,
instead of applying automatically."
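
In other words, after "import sqlContext.implicits._" an RDD of a case
class only gains a toDF() method; registerTempTable is defined on
DataFrame, so you have to convert first. A minimal sketch against the
shell session quoted below (same Person class and input path):

  import sqlContext.implicits._

  case class Person(name: String, age: Int)

  val t2 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
    .flatMap(_.split("\n"))
    .map(_.split(" "))
    .map(p => Person(p(0), 1))

  // toDF() is what the implicit conversion adds to RDD[Person];
  // registerTempTable then works on the resulting DataFrame.
  val people = t2.toDF()
  people.registerTempTable("people")

Note that in compiled code (as opposed to the shell) the case class
should be defined at top level, outside the method that uses it, so the
TypeTag-based conversion can see it.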



On Tue, Mar 24, 2015 at 9:07 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please take a look at:
> ./sql/core/src/main/scala/org/apache/spark/sql/DataFrameHolder.scala
> ./sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala
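>
> Roughly, the relevant pattern in those files (abridged here, not the
> full source) is an implicit that wraps a Product RDD in a small holder
> exposing toDF(), rather than converting the RDD in place:
>
>   // in SQLContext.implicits (abridged)
>   implicit def rddToDataFrameHolder[A <: Product : TypeTag](
>       rdd: RDD[A]): DataFrameHolder =
>     DataFrameHolder(self.createDataFrame(rdd))
>
>   // DataFrameHolder.scala (abridged)
>   case class DataFrameHolder(df: DataFrame) {
>     // the only method the conversion adds on top of the RDD
>     def toDF(): DataFrame = df
>   }
>
> So an RDD only picks up toDF(); everything else, including
> registerTempTable, lives on DataFrame.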
>
> Cheers
>
> On Tue, Mar 24, 2015 at 8:46 PM, Zhiwei Chan <z.w.chan.ja...@gmail.com>
> wrote:
>
> > Hi all,
> >
> >   I just upgraded Spark from 1.2.1 to 1.3.0, and changed "import
> > sqlContext.createSchemaRDD" to "import sqlContext.implicits._" in my
> > code. (I scanned the programming guide, and it seems this is the only
> > change I need to make.) But compilation now fails with the following
> > error:
> > >>>
> > [ERROR] ...\magic.scala:527: error: value registerTempTable is not a
> > member of org.apache.spark.rdd.RDD[com.yhd.ycache.magic.Table]
> > [INFO]     tableRdd.registerTempTable(tableName)
> > <<<
> >
> > Then I tried exactly the example from the 1.3 programming guide in
> > spark-shell, and it hits the same error:
> > >>>
> > scala> sys.env.get("CLASSPATH")
> > res7: Option[String] =
> > Some(:/root/scala/spark-1.3.0-bin-hadoop2.4/conf:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar)
> >
> > scala>  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> > sqlContext: org.apache.spark.sql.SQLContext =
> > org.apache.spark.sql.SQLContext@4b05b3ff
> >
> > scala>  import sqlContext.implicits._
> > import sqlContext.implicits._
> >
> > scala>  case class Person(name: String, age: Int)
> > defined class Person
> >
> > scala>   val t1 =
> > sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
> > 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(81443) called with
> > curMem=186397, maxMem=278302556
> > 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3 stored as values in
> > memory (estimated size 79.5 KB, free 265.2 MB)
> > 15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(31262) called with
> > curMem=267840, maxMem=278302556
> > 15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3_piece0 stored as
> > bytes in memory (estimated size 30.5 KB, free 265.1 MB)
> > 15/03/25 11:13:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in
> > memory on heju:48885 (size: 30.5 KB, free: 265.4 MB)
> > 15/03/25 11:13:35 INFO BlockManagerMaster: Updated info of block
> > broadcast_3_piece0
> > 15/03/25 11:13:35 INFO SparkContext: Created broadcast 3 from textFile at
> > <console>:34
> > t1: org.apache.spark.rdd.RDD[String] =
> > hdfs://heju:8020/user/root/magic/poolInfo.txt MapPartitionsRDD[9] at
> > textFile at <console>:34
> >
> > scala>  val t2 = t1.flatMap(_.split("\n")).map(_.split(" ")).map(p =>
> > Person(p(0),1))
> > t2: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[12] at map at
> > <console>:38
> >
> > scala>  t2.registerTempTable("people")
> > <console>:41: error: value registerTempTable is not a member of
> > org.apache.spark.rdd.RDD[Person]
> >                t2.registerTempTable("people")
> >                   ^
> > <<<
> >
> > I found the following explanation in the programming guide about the
> > implicit conversion of case classes to DataFrames, but I don't
> > understand what I should do. Could anyone tell me how to convert a
> > case class RDD to a DataFrame?
> >
> > >>>
> > Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)
> >
> > Many of the code examples prior to Spark 1.3 started with import
> > sqlContext._, which brought all of the functions from sqlContext into
> > scope. In Spark 1.3 we have isolated the implicit conversions for
> > converting RDDs into DataFrames into an object inside of the SQLContext.
> > Users should now write import sqlContext.implicits._.
> >
> > Additionally, the implicit conversions now only augment RDDs that are
> > composed of Products (i.e., case classes or tuples) with a method toDF,
> > instead of applying automatically.
> >
> > <<<
> > Thanks
> > Jason
> >
>
