Yup, that was the problem.
Changing the default "mongo.input.split_size" from 8 MB to 100 MB did the
trick.
Config reference:
https://github.com/mongodb/mongo-hadoop/wiki/Configuration-Reference
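For anyone hitting the same thing, the key can be set in the job's Hadoop
configuration; a minimal fragment (value is in MB, per the reference above):

```xml
<!-- mongo-hadoop input split size, in MB (default: 8) -->
<property>
  <name>mongo.input.split_size</name>
  <value>100</value>
</property>
```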
Thanks!
On Sat, Sep 12, 2015 at 3:15 PM, Richard Eggert
wrote:
Hmm... The count() method invokes this:
def runJob[T, U: ClassTag](rdd: RDD[T], func: Iterator[T] => U): Array[U] = {
  runJob(rdd, func, 0 until rdd.partitions.length)
}
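To make the shape of that concrete: count() is essentially a runJob whose
per-partition function just walks the partition's iterator and returns a
Long, which the driver then sums. A Spark-free sketch of that pattern
(names here are illustrative, not Spark's actual internals):

```scala
// Each task counts its own partition's iterator and returns one Long;
// the driver only sums one Long per partition, so the heavy lifting
// should stay on the executors.
def getIteratorSize[T](iter: Iterator[T]): Long = {
  var count = 0L
  while (iter.hasNext) {
    iter.next()
    count += 1
  }
  count
}

// Three stand-in "partitions":
val partitions = Seq(Iterator(1, 2), Iterator(3), Iterator(4, 5, 6))
val total = partitions.map(it => getIteratorSize(it)).sum
println(total) // 6
```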
It appears that you're running out of memory while trying to compute
(within the driver) the number of partitions that will be used.
I am trying to run this: a basic mapToPair followed by count() to trigger
an action.
4 executors are launched, but I don't see any relevant logs on those
executors.
It looks like the driver is pulling all the data and running out of
memory; the dataset is large, so it won't fit on one machine.
So wha