Thanks for your reply! According to your hint, the code should be like this:
// i want to save data in rdd to mongodb and hdfs
rdd.saveAsNewAPIHadoopFile() rdd.saveAsTextFile()
but will the application read hdfs twice?
qinwei
From: Akhil DasDate: 2014-11-07 18:32To: qinweiCC: userSubject: Re: about
write mongodb in mapPartitionsWhy not saveAsNewAPIHadoopFile?
//Define your mongoDB confsval config = new Configuration()
config.set("mongo.output.uri", "mongodb://127.0.0.1:27017/sigmoid.output")
//Write everything to mongo rdd.saveAsNewAPIHadoopFile("file:///some/random",
classOf[Any], classOf[Any], classOf[com.mongodb.hadoop.MongoOutputFormat[Any,
Any]], config)
ThanksBest Regards
On Fri, Nov 7, 2014 at 2:53 PM, qinwei <wei....@dewmobile.net> wrote:
Hi, everyone
I come across with a prolem about writing data to mongodb in mapPartitions,
my code is as below: val sourceRDD =
sc.textFile("hdfs://host:port/sourcePath") // some transformations
val rdd= sourceRDD .map(mapFunc).filter(filterFunc) val newRDD =
rdd.mapPartitions(args => { val mongoClient = new
MongoClient("host", port)
val db = mongoClient.getDB("db")
val coll = db.getCollection("collectionA")
args.map(arg => {
coll.insert(new BasicDBObject("pkg", arg))
arg
})
mongoClient.close()
args
}) newRDD.saveAsTextFile("hdfs://host:port/path") The
application saved data to HDFS correctly, but not mongodb, is there someting
wrong? I know that collecting the newRDD to driver and then saving it to
mongodb will success, but will the following saveAsTextFile read the filesystem
once again?
Thanks
qinwei