Hi,

Recently I wanted to save a big RDD[(k,v)] in the form of an index plus data, so I decided to use Hadoop MapFile. I tried some examples like this one: https://gist.github.com/airawat/6538748 . The code runs fine and generates an index file and a data file. I can open the data file with:

    hadoop fs -text /spark/out2/mapFile/data

But when I run:

    hadoop fs -text /spark/out2/mapFile/index

I can't see the index content. There are only some INFO messages in the console:

    14/11/10 16:11:04 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
    14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
    14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
    14/11/10 16:11:04 INFO compress.CodecPool: Got brand-new decompressor [.deflate]

And the command "hadoop fs -ls /spark/out2/mapFile/" shows the following:

    -rw-r--r--   3 spark hdfs      24002 2014-11-10 15:19 /spark/out2/mapFile/data
    -rw-r--r--   3 spark hdfs        136 2014-11-10 15:19 /spark/out2/mapFile/index
I don't think "INFO compress.CodecPool: Got brand-new decompressor [.deflate]" should prevent the index contents from being shown. It really confuses me. My code is as follows:

    def try_Map_File(writePath: String) = {
      val uri = writePath + "/mapFile"
      val data = Array(
        "One, two, buckle my shoe", "Three, four, shut the door",
        "Five, six, pick up sticks", "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen")
      val con = new SparkConf()
      con.set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
      val sc = new SparkContext(con)
      val conf = sc.hadoopConfiguration
      val fs = FileSystem.get(URI.create(uri), conf)
      val key = new IntWritable()
      val value = new Text()
      var writer: MapFile.Writer = null
      try {
        val writer = new Writer(conf, fs, uri, key.getClass, value.getClass)
        writer.setIndexInterval(64)
        for (i <- Range(0, 512)) {
          key.set(i + 1)
          value.set(data(i % data.length))
          writer.append(key, value)
        }
      } finally {
        IOUtils.closeStream(writer)
      }
    }

With setIndexInterval(64) over 512 records, I would expect roughly 512 / 64 = 8 index entries, so a 136-byte index file seems plausible rather than empty. Can anyone give me some ideas, or suggest another method to use instead of MapFile?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/index-File-create-by-mapFile-can-t-tp18469.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org