You need this import to solve the problem:

import org.apache.spark.SparkContext._
Yishu

On Mar 10, 2014, at 2:46 PM, Yishu Lin <yishutheco...@gmail.com> wrote:

> I have the same question and tried approach 1, but get a compilation error:
>
> [error] …. could not find implicit value for parameter kcf: () =>
> org.apache.spark.WritableConverter[String]
> [error] val t2 = sc.sequenceFile[String, Int]("/test/data", 20)
>
> Yishu
>
> On Mar 9, 2014, at 12:21 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:
>
>> Hi Kane,
>>
>> In the sequence file, the class is org.apache.hadoop.io.Text. You need to
>> convert Text to String. There are two approaches:
>>
>> 1. Use implicit conversions to convert Text to String automatically. I
>> recommend this one. E.g.,
>>
>> val t2 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
>> t2.groupByKey().take(5)
>>
>> 2. Use "classOf[Text]" to specify the correct class in the sequence file
>> and convert Text to String explicitly. E.g.,
>>
>> import org.apache.hadoop.io.Text
>> val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[Text], classOf[Text])
>> t2.map { case (k, v) => (k.toString, v.toString) }.groupByKey().take(5)
>>
>> Best Regards,
>>
>> Shixiong Zhu
>>
>> 2014-03-09 13:30 GMT+08:00 Kane <kane.ist...@gmail.com>:
>>
>> When I try to open a sequence file:
>>
>> val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[String],
>> classOf[String])
>> t2.groupByKey().take(5)
>>
>> I get:
>>
>> org.apache.spark.SparkException: Job aborted: Task 25.0:0 had a not
>> serializable result: java.io.NotSerializableException:
>> org.apache.hadoop.io.Text
>>
>> Another thing: t2.take(5) returns 5 identical items. I guess I have to
>> map/clone the items, but then I get something like
>> "org.apache.hadoop.io.Text cannot be cast to java.lang.String".
>> How do I clone it?
>>
>> Thanks.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/sequenceFile-and-groupByKey-tp2428.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
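Putting the thread together, here is a minimal sketch of both approaches with the missing `SparkContext._` import in place. The path `/user/hdfs/e1Mseq` comes from the thread; the `local[*]` SparkContext setup and the `SequenceFileExample` object name are illustrative only, and this assumes a Spark 1.x-era build where the `WritableConverter` implicits live in `SparkContext._`:

```scala
import org.apache.spark.{SparkConf, SparkContext}
// This import brings the implicit WritableConverters into scope,
// which is what makes approach 1 compile (Yishu's "could not find
// implicit value for parameter kcf" error is this import missing).
import org.apache.spark.SparkContext._
import org.apache.hadoop.io.Text

object SequenceFileExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("seqfile-example").setMaster("local[*]"))

    // Approach 1: let the implicit converters turn Text into String.
    val t1 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
    t1.groupByKey().take(5).foreach(println)

    // Approach 2: read the raw Text objects and convert explicitly.
    // The map(...) also copies each record into a new String, which
    // avoids the Hadoop record-reuse behavior behind Kane's symptom
    // of take(5) returning 5 identical items.
    val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[Text], classOf[Text])
    t2.map { case (k, v) => (k.toString, v.toString) }
      .groupByKey()
      .take(5)
      .foreach(println)

    sc.stop()
  }
}
```

Note the design point behind approach 2: Hadoop's RecordReader reuses the same Writable instance for every record, so an RDD of raw `Text` objects must be mapped to immutable values (e.g. `String`) before caching or collecting.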