Hi Jaonary,

You can use "sc.sequenceFile" to load your file. E.g.,

scala> import org.apache.hadoop.io._
import org.apache.hadoop.io._

scala> val rdd = sc.sequenceFile("path_to_file", classOf[Text],
classOf[BytesWritable])
rdd: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.Text,
org.apache.hadoop.io.BytesWritable)] = HadoopRDD[0] at sequenceFile at
<console>:15
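One caveat worth knowing: Hadoop's record reader reuses the same Writable instances across records, so if you plan to cache or collect the RDD you should copy each record into immutable values first. A minimal sketch (assuming the `rdd` from above):

```scala
// Hadoop reuses the Text/BytesWritable objects for every record,
// so convert each pair to immutable values before caching/collecting.
val images = rdd.map { case (name, bytes) =>
  (name.toString, bytes.copyBytes()) // copyBytes() returns a fresh Array[Byte]
}
images.cache()
```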


Best Regards,
Shixiong Zhu


2014-03-11 16:54 GMT+08:00 Jaonary Rabarisoa <[email protected]>:

> Hi all,
>
> I'm trying to read a SequenceFile that represents a set of JPEG images
> generated using this tool:
> http://stuartsierra.com/2008/04/24/a-million-little-files . According to
> the documentation: "Each key is the name of a file (a Hadoop "Text"),
> the value is the binary contents of the file (a BytesWritable)."
>
> How do I load the generated file inside Spark?
>
> Cheers,
>
> Jaonary
>
