I'm trying to use MongoDB as a destination for an ETL I'm writing in
Spark.  It appears I'm incurring a lot of overhead in my system databases
(and possibly in the primary documents themselves);  I can only assume it's
because I'm stuck using PairRDD.saveAsNewAPIHadoopFile.
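
For reference, my current write path looks roughly like this (the output URI,
key/value classes, and dummy path below are illustrative rather than my exact
code):

    import org.apache.hadoop.conf.Configuration
    import com.mongodb.hadoop.MongoOutputFormat
    import org.bson.BasicBSONObject

    val outputConfig = new Configuration()
    // target collection; the URI here is a placeholder
    outputConfig.set("mongo.output.uri",
      "mongodb://localhost:27017/etl_db.etl_collection")

    // pairRdd: RDD[(Object, BasicBSONObject)] built earlier in the ETL
    pairRdd.saveAsNewAPIHadoopFile(
      "file:///tmp/ignored",  // required by the API but unused by MongoOutputFormat
      classOf[Object],
      classOf[BasicBSONObject],
      classOf[MongoOutputFormat[Object, BasicBSONObject]],
      outputConfig)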

- Is there a way to batch some of the data together and use Casbah natively
so I can use bulk inserts? (A rough sketch of what I have in mind follows
after this list.)

- Is there maybe a less "hacky" way to load data into MongoDB (instead of
using saveAsNewAPIHadoopFile)?
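
To make the first question concrete, something like the sketch below is what
I have in mind: foreachPartition plus Casbah's varargs insert, with one client
per partition.  The host, database/collection names, and batch size are just
placeholders:

    import com.mongodb.casbah.Imports._

    // documents: RDD[DBObject] produced earlier in the ETL (name is illustrative)
    documents.foreachPartition { partition =>
      // one client per partition, opened on the executor rather than the driver
      val client = MongoClient("mongo-host", 27017)    // placeholder host/port
      val coll   = client("etl_db")("etl_collection")  // placeholder db/collection
      try {
        partition.grouped(1000).foreach { batch =>     // batch size is arbitrary
          coll.insert(batch: _*)                       // one bulk insert per batch
        }
      } finally {
        client.close()
      }
    }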

Thanks in advance!
