Hello, everyone! I'm new to Spark. I have already written programs on Hadoop 2.5.2, where I defined my own InputFormat and OutputFormat. Now I want to port that code to Spark using Java. The first problem I ran into is how to turn a big txt file into an RDD that is compatible with the code I wrote for Hadoop. I found functions in SparkContext that look helpful, but I don't know how to use them. For example:
One candidate is SparkContext.newAPIHadoopFile. From the Spark 1.4.0 Javadoc (the return type is an RDD, see http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/rdd/RDD.html):

    public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>>
    RDD<scala.Tuple2<K,V>> newAPIHadoopFile(String path, Class<F> fClass,
        Class<K> kClass, Class<V> vClass, org.apache.hadoop.conf.Configuration conf)

"Get an RDD for a given Hadoop file with an arbitrary new API InputFormat and extra configuration options to pass to the input format. Note: Because Hadoop's RecordReader class re-uses the same Writable object for each record, directly caching the returned RDD or directly passing it to an aggregation or shuffle operation will create many references to the same object. If you plan to directly cache, sort, or aggregate Hadoop writable objects, you should first copy them using a map function."

In Java, both of my attempts below are wrong.

Option one:

    Configuration confHadoop = new Configuration();
    JavaPairRDD<LongWritable, Text> distFile = sc.newAPIHadoopFile(
        "hdfs://cMaster:9000/wcinput/data.txt",
        DataInputFormat, LongWritable, Text, confHadoop);

Option two:

    Configuration confHadoop = new Configuration();
    DataInputFormat input = new DataInputFormat();
    LongWritable longType = new LongWritable();
    Text text = new Text();
    JavaPairRDD<LongWritable, Text> distFile = sc.newAPIHadoopFile(
        "hdfs://cMaster:9000/wcinput/data.txt",
        input, longType, text, confHadoop);

Can anyone help me? Thank you so much.
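Update: from re-reading the Javadoc, my guess is that the fClass/kClass/vClass parameters expect Class objects (.class literals) rather than type names or instances, and that on the Java side the call goes through JavaSparkContext rather than SparkContext. Here is a sketch of what I think the call should look like, assuming DataInputFormat is my own class extending the new-API FileInputFormat<LongWritable, Text>; please correct me if this is still off:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf sparkConf = new SparkConf().setAppName("CustomInputFormatTest");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    Configuration confHadoop = new Configuration();

    // Pass Class objects for the InputFormat and the key/value types,
    // not instances of those classes.
    JavaPairRDD<LongWritable, Text> distFile = sc.newAPIHadoopFile(
        "hdfs://cMaster:9000/wcinput/data.txt",
        DataInputFormat.class, LongWritable.class, Text.class, confHadoop);

Given the note above about the RecordReader re-using Writable objects, I assume I would also need to copy the key/value objects with a map before caching or shuffling the result.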