MapStatus too large for driver

2015-10-20 Thread yaoqin
Hi everyone, when I run a Spark job containing a very large number of tasks (in my case 200,000 map tasks * 200,000 reduce tasks), the driver hits an OOM caused mainly by the MapStatus objects. As shown in the picture below, the RoaringBitmap used to mark which blocks are empty seems to use too much memory. Are there any...
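For illustration, here is a minimal Scala sketch of the bookkeeping involved: each map task's status records which of its reduce-side blocks are empty in one bitmap, and the driver must hold one such bitmap per map task. This uses the standalone org.roaringbitmap.RoaringBitmap library directly with an assumed alternating empty/non-empty pattern; it is not Spark's actual MapStatus implementation or real block-size data.

    import org.roaringbitmap.RoaringBitmap

    object MapStatusFootprint {
      def main(args: Array[String]): Unit = {
        val numMapTasks    = 200000
        val numReduceTasks = 200000

        // One bitmap per map task marks which of its reduce-side blocks are empty.
        // Marking every other block empty is just an illustrative pattern.
        val sample = new RoaringBitmap()
        (0 until numReduceTasks by 2).foreach(i => sample.add(i))
        sample.runOptimize()

        val bytesPerStatus = sample.getSizeInBytes.toLong
        val totalMB        = bytesPerStatus * numMapTasks / (1L << 20)
        println(s"~$bytesPerStatus bytes per status, ~$totalMB MB across $numMapTasks map tasks")
      }
    }

Even at a few tens of kilobytes per status, 200,000 of them add up to several gigabytes resident on the driver.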

Re: MapStatus too large for driver

2015-10-20 Thread yaoqin
In our case, we are dealing with 20 TB of text data, split into about 200k map tasks and 200k reduce tasks, and our driver's memory is 15 GB.
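As a rough back-of-the-envelope check (assuming one empty/non-empty bit per (map task, reduce partition) pair and no compression, which is not how Spark actually stores it), the raw size of that status matrix alone is already a sizable fraction of a 15 GB heap:

    object ShuffleStatusEstimate {
      def main(args: Array[String]): Unit = {
        val mapTasks    = 200000L
        val reduceTasks = 200000L
        val rawBits     = mapTasks * reduceTasks          // one flag per shuffle block
        val rawGB       = rawBits / 8.0 / (1L << 30)
        println(f"raw empty-block matrix: $rawGB%.1f GB") // ~4.7 GB before any JVM overhead
      }
    }

On top of that come per-object headers, references, and serialization buffers, so a 15 GB driver heap does not leave much headroom.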

Re: MapStatus too large for driver

2015-10-20 Thread yaoqin
I tried using org.apache.spark.util.collection.BitSet instead of RoaringBitmap; it saves about 20% of the memory but runs much slower. For the 200K-task job, RoaringBitmap uses 3 Long[1024] and 1 Short[3392] = 3*64*1024 + 16*3392 = 250,880 bits, while BitSet uses 1 Long[3125] = 3125*64 = 200,000 bits. Memory saving is about 20%.
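Plugging the array shapes reported in the message above into the arithmetic (the 3 x Long[1024], 1 x Short[3392], and Long[3125] figures come from that measurement; nothing here is re-measured):

    object BitmapSizeComparison {
      def main(args: Array[String]): Unit = {
        val roaringBits = 3 * 64 * 1024 + 16 * 3392  // three bitmap containers + one array container
        val bitSetBits  = 3125 * 64                  // fixed-size BitSet: 200000 / 64 = 3125 longs
        val savingPct   = 100.0 * (roaringBits - bitSetBits) / roaringBits
        println(s"RoaringBitmap: $roaringBits bits (${roaringBits / 8} bytes)")
        println(s"BitSet:        $bitSetBits bits (${bitSetBits / 8} bytes)")
        println(f"saving: $savingPct%.1f%%")         // roughly 20%, matching the observation above
      }
    }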