I have the same question. I tried approach 1, but I get a compilation error:

[error] …. could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[String]
[error]     val t2 = sc.sequenceFile[String, Int]("/test/data", 20)
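
A likely cause, assuming a Spark release earlier than 1.3: the implicit WritableConverter instances are defined in the SparkContext companion object, and a compiled application (unlike spark-shell, which imports them on startup) has to bring them into scope itself. The sketch below is a minimal standalone program under that assumption; the object name, application name, and local master are placeholders that do not come from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: Spark < 1.3, where the implicit WritableConverter instances
// (String, Int, ...) are members of the SparkContext companion object.
import org.apache.spark.SparkContext._

// Hypothetical object name, used only for illustration.
object SequenceFileExample {
  def main(args: Array[String]): Unit = {
    // Placeholder app name and master; adjust for a real cluster.
    val conf = new SparkConf()
      .setAppName("SequenceFileExample")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // With the import above, the compiler can resolve the implicit
      // converters for the String key and the Int value. The file is
      // expected to hold Text keys and IntWritable values.
      val t2 = sc.sequenceFile[String, Int]("/test/data", 20)
      t2.groupByKey().take(5).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If the error persists with the import in place, the Spark version and the key/value classes stored in the file are the next things to check.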


Yishu

On Mar 9, 2014, at 12:21 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> Hi Kane,
> 
> In the sequence file, the class is org.apache.hadoop.io.Text. You need to 
> convert Text to String. There are two approaches:
> 
> 1. Use implicit conversions to convert Text to String automatically. I 
> recommend this one. E.g.,
> 
> val t2 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
> t2.groupByKey().take(5) 
> 
> 2. Use "classOf[Text]" to specify the correct class in the sequence file and 
> convert Text to String.  E.g.,
> 
> import org.apache.hadoop.io.Text
> val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[Text], classOf[Text])
> t2.map { case (k, v) => (k.toString, v.toString) }.groupByKey().take(5)
> 
> 
> Best Regards,
> 
> Shixiong Zhu
> 
> 
> 2014-03-09 13:30 GMT+08:00 Kane <kane.ist...@gmail.com>:
> When I try to open a sequence file:
> val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[String],
> classOf[String])
> t2.groupByKey().take(5)
> 
> I get:
> org.apache.spark.SparkException: Job aborted: Task 25.0:0 had a not
> serializable result: java.io.NotSerializableException:
> org.apache.hadoop.io.Text
> 
> Another thing: t2.take(5) returns 5 identical items. I guess I have to
> map/clone the items, but I get something like "org.apache.hadoop.io.Text
> cannot be cast to java.lang.String". How do I clone it?
> 
> Thanks.
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/sequenceFile-and-groupByKey-tp2428.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
