Hey guys,

What is the best way to get an RDD[(K, V)] for a MapFile created by
MapFile.Writer? The MapFile has a Text key and MyArrayWritable as the value.

Something akin to sc.textFile($path)

So far I have tried two approaches: sc.hadoopFile and sc.sequenceFile.

#1

    val rdd = sc.hadoopFile[Text, MyArrayWritable,
      SequenceFileInputFormat[Text, MyArrayWritable]]($path)
    val count = rdd.count()

This gives me a runtime error:

14/06/04 12:05:22 WARN TaskSetManager: Loss was due to java.io.EOFException
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:190)


#2 The second snippet I have tried is:

    val rdd: RDD[(Text, MyArrayWritable)] =
      sc.sequenceFile[Text, MyArrayWritable]($path)
    val count = rdd.count()

Same exception as above.


I don't see any MapFileInputFormat, and my understanding was that
SequenceFileInputFormat should be correct.
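For context, a MapFile on disk is a directory holding two SequenceFiles, data
(the sorted key/value records) and index. One untested idea I may try is
pointing sc.sequenceFile at the data file directly instead of at the
directory; mapFileDataPath below is just a hypothetical helper:

```scala
// A MapFile directory contains two SequenceFiles:
//   $path/data  -- the actual sorted (key, value) records
//   $path/index -- a sparse index into the data file
// Hypothetical helper that resolves the data file inside a MapFile directory.
def mapFileDataPath(mapFileDir: String): String =
  mapFileDir.stripSuffix("/") + "/data"

// Untested sketch of the idea, using the names from above:
// val rdd = sc.sequenceFile[Text, MyArrayWritable](mapFileDataPath($path))
```

This only changes which file the existing sc.sequenceFile call reads; whether
it avoids the EOFException is exactly what I am unsure about.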

<https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/InputFormat.html>

What am I missing?


Many thanks again,
Amit

In case it is pertinent...
MyArrayWritable is

import org.apache.hadoop.io.ArrayWritable;

public class MyArrayWritable extends ArrayWritable {

    public MyArrayWritable() {
        super(CustomWritable.class);
    }

    public MyArrayWritable(CustomWritable[] values) {
        super(CustomWritable.class, values);
    }
}
 and CustomWritable implements WritableComparable<CustomWritable>
