Re: MapStatus too large for drvier

2015-10-20 Thread yaoqin
I tried using org.apache.spark.util.collection.BitSet instead of RoaringBitMap. It saves about 20% of the memory but runs much slower. For the 200K-task job, RoaringBitMap uses 3 Long[1024] and 1 Short[3392] = 3*64*1024 + 16*3392 = 250880 bits. BitSet uses 1 Long[3125] = 3125*64 = 200000 bits. Memory s
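The arithmetic in the message above can be checked with a short sketch. The array counts and lengths are taken from the message as quoted; the function names and the assumption that a Long is 64 bits and a Short is 16 bits with no per-object overhead are illustrative only, not Spark's actual accounting.

```python
# Sizes quoted in the message, for one bitmap covering 200K reduce partitions.
# Assumption: raw array payload only; JVM object headers are ignored.
LONG_BITS = 64
SHORT_BITS = 16


def roaring_bits(long_arrays=3, long_len=1024, short_len=3392):
    """Bits used by the RoaringBitMap layout described in the message."""
    return long_arrays * LONG_BITS * long_len + SHORT_BITS * short_len


def bitset_bits(num_words=3125):
    """Bits used by a plain BitSet backed by a single long array."""
    return num_words * LONG_BITS


roaring = roaring_bits()
bitset = bitset_bits()
saving = (roaring - bitset) / roaring  # fraction of memory saved by BitSet

print(roaring, bitset, f"{saving:.1%}")
```

Under these assumptions the saving comes out at roughly one fifth, which matches the "about 20%" figure in the message.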

Re: MapStatus too large for drvier

2015-10-20 Thread yaoqin
In our case, we are dealing with 20 TB of text data, which is split into about 200k map tasks and 200k reduce tasks, and our driver's memory is 15 GB.
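For a sense of scale, here is a rough back-of-the-envelope sketch. It assumes (this is not stated in the message) that the driver holds one bitmap per map task, each of the per-bitmap size quoted elsewhere in this thread (250880 bits for RoaringBitMap, 200000 bits for BitSet), and it ignores object headers and every other kind of driver state. The names below are hypothetical.

```python
# Assumption: one bitmap per map task, sized as quoted elsewhere in the thread.
NUM_MAP_TASKS = 200_000
BITS_PER_STATUS = {"RoaringBitMap": 250_880, "BitSet": 200_000}
GIB = 1024 ** 3


def total_gib(bits_per_status, num_statuses=NUM_MAP_TASKS):
    """Total size in GiB of num_statuses bitmaps of the given bit length."""
    return bits_per_status * num_statuses / 8 / GIB


for name, bits in BITS_PER_STATUS.items():
    print(f"{name}: {total_gib(bits):.2f} GiB")
```

Either way, under these assumptions the bitmaps alone would take up several GiB of a 15 GB driver heap.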

Re: MapStatus too large for drvier

2015-10-20 Thread Reynold Xin
How big is your driver heap? And is there any reason why you'd need 200k map and 200k reduce tasks?

On Mon, Oct 19, 2015 at 11:59 PM, yaoqin wrote:
> Hi everyone,
>
> When I run a Spark job that contains quite a lot of tasks (in my case
> 200,000*200,000), the driver hit an OOM, mainly caused by t