Re: RDD transformation and action running out of memory

2015-09-13 Thread Utkarsh Sengar
Yup, that was the problem. Changing mongo.input.split_size from its default of 8 MB to 100 MB did the trick. Config reference: https://github.com/mongodb/mongo-hadoop/wiki/Configuration-Reference Thanks! On Sat, Sep 12, 2015 at 3:15 PM, Richard Eggert wrote: > Hmm... The count() method invokes t
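Some rough arithmetic shows why a larger split size helps. The sketch below assumes that each split covers roughly `mongo.input.split_size` megabytes of the collection, and that each split becomes one Spark partition, as it does for an RDD built from a Hadoop input format. The 500 GB collection size and the class name `SplitCountEstimate` are made up for illustration. Real split counts also depend on document sizes and on which mongo-hadoop splitter is used.

```java
// Hypothetical estimate of how many input splits (= Spark partitions) a collection yields.
class SplitCountEstimate {

    // Rough model: about one split per `splitSizeMb` megabytes of data,
    // i.e. ceil(collectionSizeMb / splitSizeMb).
    static long estimateSplits(long collectionSizeMb, long splitSizeMb) {
        if (splitSizeMb <= 0) {
            throw new IllegalArgumentException("split size must be positive");
        }
        return (collectionSizeMb + splitSizeMb - 1) / splitSizeMb; // integer ceiling division
    }

    public static void main(String[] args) {
        long collectionMb = 500L * 1024; // hypothetical 500 GB collection, in MB
        System.out.println("8 MB splits:   " + estimateSplits(collectionMb, 8));   // 64000
        System.out.println("100 MB splits: " + estimateSplits(collectionMb, 100)); // 5120
    }
}
```

With these illustrative numbers, going from 8 MB to 100 MB cuts the partition count, and with it the per-split metadata the driver has to build and hold, by a factor of 12.5.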

Re: RDD transformation and action running out of memory

2015-09-12 Thread Richard Eggert
Hmm... The count() method invokes this: def runJob[T, U: ClassTag](rdd: RDD[T], func: Iterator[T] => U): Array[U] = { runJob(rdd, func, 0 until rdd.partitions.length) } It appears that you're running out of memory while trying to compute (within the driver) the number of partitions that will b
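The flow Richard describes can be modeled in plain Java without a Spark dependency. In Spark, `count()` amounts to running a record-counting function over every partition via `runJob` and summing the per-partition results. The sketch below is a hypothetical simplification: the class `CountModel` represents an "RDD" as an in-memory list of partitions. In a real cluster, the records stay on the executors and only the per-partition counts return to the driver. The partition list itself, however, is enumerated on the driver. For a Hadoop input format, that enumeration means calling `getSplits()`, so a very large split count can exhaust driver memory before any task runs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical plain-Java model of RDD.count(); an "RDD" here is just a list of partitions.
class CountModel {

    // Mirrors the shape of runJob(rdd, func, 0 until rdd.partitions.length):
    // the driver enumerates every partition and gets back one result per partition.
    static <T, U> List<U> runJob(List<List<T>> partitions, Function<Iterator<T>, U> func) {
        int numPartitions = partitions.size(); // in Spark, rdd.partitions is computed on the driver
        List<U> results = new ArrayList<>(numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            // In Spark this call would run as a task on an executor, not on the driver.
            results.add(func.apply(partitions.get(i).iterator()));
        }
        return results;
    }

    // Counts the records in one partition by draining its iterator.
    static long iteratorSize(Iterator<?> it) {
        long n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    // count() = sum of the per-partition record counts.
    static <T> long count(List<List<T>> partitions) {
        List<Long> perPartition = runJob(partitions, CountModel::iteratorSize);
        long total = 0;
        for (long c : perPartition) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = List.of(List.of(1, 2, 3), List.of(4, 5), List.of());
        System.out.println(count(parts)); // prints 5
    }
}
```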

RDD transformation and action running out of memory

2015-09-12 Thread Utkarsh Sengar
I am trying to run this: a basic mapToPair and then count() to trigger an action. Four executors are launched, but I don't see any relevant logs on those executors. It looks like the driver is pulling all the data and running out of memory; the dataset is big, so it won't fit on one machine. So wha