You should import either spark.implicits._ or sqlContext.implicits._, not both.
Otherwise the compiler sees two competing sets of implicit conversions, cannot
choose between them, and reports that toDF is not a member of Seq.

The following code works for me on Spark 2.1.0:
import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local")
      .appName(getClass.getSimpleName)
      .getOrCreate()

    // Import the implicits from exactly one place.
    import spark.implicits._

    val df = Seq(TeamUser("t1", "u1", "r1")).toDF()
    df.printSchema()
    spark.close()
  }
}

// Defined outside the object.
case class TeamUser(teamId: String, userId: String, role: String)
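
For the original question (converting the DataFrame to a Dataset), here is a
minimal sketch along the same lines. It assumes a local master and builds the
DataFrame in memory instead of reading it from Cassandra. The names
DatasetExample and buildDataset are made up for illustration. The case class
sits at the top level and only spark.implicits._ is imported.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at the top level, outside any object or method.
case class Teamuser(teamid: String, userid: String, role: String)

object DatasetExample {
  // Hypothetical helper: registers a small in-memory table, queries it,
  // and converts the resulting DataFrame to a typed Dataset.
  def buildDataset(spark: SparkSession): Dataset[Teamuser] = {
    // Single source of implicits (encoders and toDF).
    import spark.implicits._

    Seq(Teamuser("t1", "u1", "r1")).toDF().createOrReplaceTempView("teamuser")
    val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")
    userDF.as[Teamuser]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[1]") // assumption: local run, no cluster
      .appName("DatasetExample")
      .getOrCreate()
    try buildDataset(spark).show()
    finally spark.stop()
  }
}
```

With the explicit Dataset[Teamuser] return type, the compiler rejects the code
if as[Teamuser] cannot find an encoder, so the problem shows up at compile time.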
On Fri, Mar 24, 2017 at 5:23 AM, shyla deshpande wrote:
> I made the code even simpler and am still getting the error:
>
> error: value toDF is not a member of Seq[com.whil.batch.Teamuser]
> [ERROR] val df = Seq(Teamuser("t1","u1","r1")).toDF()
>
> object Test {
>   def main(args: Array[String]) {
>     val spark = SparkSession
>       .builder
>       .appName(getClass.getSimpleName)
>       .getOrCreate()
>     import spark.implicits._
>     val sqlContext = spark.sqlContext
>     import sqlContext.implicits._
>     val df = Seq(Teamuser("t1","u1","r1")).toDF()
>     df.printSchema()
>   }
> }
> case class Teamuser(teamid:String, userid:String, role:String)
>
>
>
>
> On Thu, Mar 23, 2017 at 1:07 PM, Yong Zhang wrote:
>
>> I'm not sure I understand this problem. Why can't I reproduce it?
>>
>>
>> scala> spark.version
>> res22: String = 2.1.0
>>
>> scala> case class Teamuser(teamid: String, userid: String, role: String)
>> defined class Teamuser
>>
>> scala> val df = Seq(Teamuser("t1", "u1", "role1")).toDF
>> df: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1
>> more field]
>>
>> scala> df.show
>> +------+------+-----+
>> |teamid|userid| role|
>> +------+------+-----+
>> |    t1|    u1|role1|
>> +------+------+-----+
>>
>> scala> df.createOrReplaceTempView("teamuser")
>>
>> scala> val newDF = spark.sql("select teamid, userid, role from teamuser")
>> newDF: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ...
>> 1 more field]
>>
>> scala> val userDS: Dataset[Teamuser] = newDF.as[Teamuser]
>> userDS: org.apache.spark.sql.Dataset[Teamuser] = [teamid: string, userid:
>> string ... 1 more field]
>>
>> scala> userDS.show
>> +------+------+-----+
>> |teamid|userid| role|
>> +------+------+-----+
>> |    t1|    u1|role1|
>> +------+------+-----+
>>
>>
>> scala> userDS.printSchema
>> root
>> |-- teamid: string (nullable = true)
>> |-- userid: string (nullable = true)
>> |-- role: string (nullable = true)
>>
>>
>> Am I missing anything?
>>
>>
>> Yong
>>
>>
>> ------------------------------
>> *From:* shyla deshpande
>> *Sent:* Thursday, March 23, 2017 3:49 PM
>> *To:* user
>> *Subject:* Re: Converting dataframe to dataset question
>>
>> I realized my case class was defined inside the object. It should be
>> defined outside the scope of the object. Thanks.
>>
>> On Wed, Mar 22, 2017 at 6:07 PM, shyla deshpande wrote:
>>
>>> Why is userDS a Dataset[Any] instead of a Dataset[Teamuser]? I would
>>> appreciate your help. Thanks.
>>>
>>> val spark = SparkSession
>>>   .builder
>>>   .config("spark.cassandra.connection.host", cassandrahost)
>>>   .appName(getClass.getSimpleName)
>>>   .getOrCreate()
>>>
>>> import spark.implicits._
>>> val sqlContext = spark.sqlContext
>>> import sqlContext.implicits._
>>>
>>> case class Teamuser(teamid:String, userid:String, role:String)
>>> spark
>>>   .read
>>>   .format("org.apache.spark.sql.cassandra")
>>>   .options(Map("keyspace" -> "test", "table" -> "teamuser"))
>>>   .load
>>>   .createOrReplaceTempView("teamuser")
>>>
>>> val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")
>>>
>>> userDF.show()
>>>
>>> val userDS:Dataset[Teamuser] = userDF.as[Teamuser]
>>>
>>>
>>
>