I tried to use org.apache.spark.util.collection.BitSet instead of
RoaringBitmap; it saves about 20% of the memory but runs much slower.
For the 200K tasks job,
RoaringBitmap uses 3 long[1024] and 1 short[3392]:
3*64*1024 + 16*3392 = 250880 bits.
BitSet uses 1 long[3125]: 3125*64 = 200000 bits.
Memory saved is (250880 - 200000) / 250880, roughly 20%.
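For reference, here is a small back-of-the-envelope sketch of the two estimates above (plain Java; the container array sizes are the ones quoted in this thread, not measured from Spark itself):

```java
public class BitmapSizeEstimate {
    public static void main(String[] args) {
        // RoaringBitmap, as observed for this job:
        // 3 bitmap containers (long[1024]) + 1 array container (short[3392])
        long roaringBits = 3L * 1024 * 64 + 3392L * 16;  // 196608 + 54272 = 250880
        // Dense BitSet covering 200k reduce partitions:
        // ceil(200000 / 64) = 3125 longs
        long bitsetBits = 3125L * 64;                    // 200000
        System.out.println("Roaring: " + roaringBits + " bits");  // 250880
        System.out.println("BitSet : " + bitsetBits + " bits");   // 200000
        System.out.printf("Saving : %.1f%%%n",
                100.0 * (roaringBits - bitsetBits) / roaringBits); // 20.3%
    }
}
```

The saving comes entirely from RoaringBitmap's per-container overhead at this density; at lower fill ratios the comparison would flip in RoaringBitmap's favor.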
In our case, we are dealing with 20 TB of text data, split into about
200k map tasks and 200k reduce tasks, and our driver's memory is 15 GB.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/MapStatus-too-large-for-drvier-tp14704p14707.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
How big is your driver heap? And is there any reason why you need 200k map
and 200k reduce tasks?
On Mon, Oct 19, 2015 at 11:59 PM, yaoqin wrote:
> Hi everyone,
>
> When I run a Spark job that contains quite a lot of tasks (in my case
> 200,000 x 200,000), the driver hits an OOM, mainly caused by the MapStatus