Hi,
I'm trying to read a CSV file into a Dataset but keep running into
compilation issues. I'm using Spark 2.1, and the following small program
exhibits the issue. I've searched around but haven't found a solution that
works. I've added "import sqlContext.implicits._" as suggested, but no luck.
What am I missing? I would appreciate some advice.

import org.apache.spark.sql.functions._
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.{Encoder,Encoders}

object DatasetTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("DatasetTest")
    val sc = new SparkContext(sparkConf)
    case class Foo(text: String)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val ds: org.apache.spark.sql.Dataset[Foo] =
      sqlContext.read.csv(args(1)).as[Foo]
    ds.show
  }
}

Compiling the above program gives the error below. I'd expect it to work, as
it's a simple case class. Changing it to as[String] works, but I would like
to get the case class to work.
[error] /home/keith/dataset/DataSetTest.scala:13: Unable to find encoder
for type stored in a Dataset. Primitive types (Int, String, etc) and
Product types (case classes) are supported by importing spark.implicits._
Support for serializing other types will be added in future releases.
[error] val ds : org.apache.spark.sql.Dataset[Foo] =
sqlContext.read.csv(args(1)).as[Foo]
Regards,
Keith.
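
A possible fix, offered as a sketch rather than a confirmed diagnosis: this
error commonly appears when the case class is declared inside a method,
because Spark's implicit encoder derivation needs a TypeTag, which the
compiler may not be able to provide for a locally defined class. The version
below moves Foo to the top level and uses SparkSession, the Spark 2.x entry
point that the "spark.implicits._" in the error message refers to. The
toDF("text") rename is an assumption: it presumes a single-column CSV with no
header row, where Spark names the column "_c0" by default.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Declared at the top level (outside any method) so that an implicit
// Encoder[Foo] can be derived when spark.implicits._ is in scope.
case class Foo(text: String)

object DatasetTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DatasetTest").getOrCreate()
    import spark.implicits._

    // Assumption: the CSV has exactly one column and no header row, so the
    // column is renamed from the default "_c0" to match the field in Foo.
    // args(1) is kept as in the original program.
    val ds: Dataset[Foo] = spark.read.csv(args(1)).toDF("text").as[Foo]
    ds.show()

    spark.stop()
  }
}
```

If the file has a header row or several columns, the rename step would need
to change, for example by reading with the "header" option set to "true" and
giving Foo one field per column.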