Hi, I am trying to read data from MongoDB in Spark using newAPIHadoopRDD.

/**** Code *****/

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;

// Point the mongo-hadoop connector at the collection and filter by query.
Configuration config = new Configuration();
config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
config.set("mongo.input.query", "{host: 'abc.com'}");

JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");

// Read the collection as an RDD of (_id, document) pairs.
JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
        config,
        com.mongodb.hadoop.MongoInputFormat.class,
        Object.class,
        BSONObject.class);

long count = mongoRDD.count();
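
One detail that may matter: I create the context with a plain "local"
master, which as far as I understand runs everything on a single worker
thread. A variant that uses all available cores would look like this
(sketch, assuming the standard Spark master URL syntax):

// "local[*]" asks Spark for one worker thread per available core,
// whereas plain "local" uses a single thread.
JavaSparkContext sc = new JavaSparkContext("local[*]", "MongoOps");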

There are about 1.5 million records. I am getting the data, but the read
operation takes around 15 minutes for the whole collection, i.e. roughly
1,700 records per second.

Is this API really that slow, or am I missing something? Please also
suggest an alternate approach if there is a faster way to read data from
MongoDB.
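
For what it's worth, here is what I was planning to try next (untested
sketch; the split-related keys are my assumption from reading the
mongo-hadoop MongoConfigUtil source, so please correct me if they are
wrong):

// Diagnostic: if this prints 1, the whole collection is read as a
// single partition and the job is effectively single-threaded.
System.out.println("partitions: " + mongoRDD.partitions().size());

// Assumed mongo-hadoop keys: ask the connector to create input splits,
// and set the split size in MB, so Spark can read partitions in parallel.
config.set("mongo.input.split.create_input_splits", "true");
config.set("mongo.input.split_size", "8");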

Thanks,
Deepesh
