FYI dev, I submitted a PR making Snappy the default compression codec: https://github.com/apache/spark/pull/1415
Also submitted a separate PR to add lz4 support: https://github.com/apache/spark/pull/1416

On Mon, Jul 14, 2014 at 5:06 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> One of the core problems here is the number of open streams we have, which
> is (# cores * # reduce partitions) and can easily climb into the tens of
> thousands for large jobs. This is a more general problem that we are
> planning on fixing for our largest shuffles, as even moderate buffer sizes
> can explode to use huge amounts of memory at that scale.
>
> On Mon, Jul 14, 2014 at 4:53 PM, Jon Hartlaub <jhartl...@gmail.com> wrote:
>
> > Is the held memory due to just instantiating the LZFOutputStream? If so,
> > I'm surprised and I consider that a bug.
> >
> > I suspect the held memory may be due to a SoftReference - memory will be
> > released with enough memory pressure.
> >
> > Finally, is it necessary to keep 1000 (or more) decoders active? Would it
> > be possible to keep an object pool of encoders and check them in and out
> > as needed? I admit I have not done much homework to determine if this is
> > viable.
> >
> > -Jon
> >
> > On Mon, Jul 14, 2014 at 4:08 PM, Reynold Xin <r...@databricks.com> wrote:
> >
> > > Copying Jon here since he worked on the lzf library at Ning.
> > >
> > > Jon - any comments on this topic?
> > >
> > > On Mon, Jul 14, 2014 at 3:54 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> > >
> > > > You can actually turn off shuffle compression by setting
> > > > spark.shuffle.compress to false. Try that out; there will still be
> > > > some buffers for the various OutputStreams, but they should be
> > > > smaller.
> > > >
> > > > Matei
> > > >
> > > > On Jul 14, 2014, at 3:30 PM, Stephen Haberman <stephen.haber...@gmail.com> wrote:
> > > >
> > > > > Just a comment from the peanut gallery, but these buffers are a real
> > > > > PITA for us as well. Probably 75% of our non-user-error job failures
> > > > > are related to them.
> > > > >
> > > > > Just naively, what about not doing compression on the fly? E.g. during
> > > > > the shuffle just write straight to disk, uncompressed?
> > > > >
> > > > > For us, we always have plenty of disk space, and if you're concerned
> > > > > about network transmission, you could add a separate compress step
> > > > > after the blocks have been written to disk, but before being sent over
> > > > > the wire.
> > > > >
> > > > > Granted, IANAE, so perhaps this is a bad idea; either way, awesome to
> > > > > see work in this area!
> > > > >
> > > > > - Stephen
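
For anyone who wants to experiment with the settings discussed in this thread, here is a minimal sketch of how they could be set via SparkConf. The config keys (spark.io.compression.codec, spark.shuffle.compress) are the ones named above; the short codec names assume a Spark version where the two PRs have landed (older versions take the fully qualified codec class name instead):

    import org.apache.spark.SparkConf

    // Sketch: picking the compression codec and, alternatively, turning
    // shuffle compression off entirely, as Matei suggests above.
    val conf = new SparkConf()
      .setAppName("compression-experiment")
      // "snappy" is what PR 1415 proposes as the default; "lz4" assumes
      // PR 1416 is merged; "lzf" is the previous default.
      .set("spark.io.compression.codec", "snappy")
      // Disabling shuffle compression shrinks the per-stream buffers at
      // the cost of more data written to disk and sent over the network.
      .set("spark.shuffle.compress", "false")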
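
To make Aaron's scale concern concrete: with, say, 16 cores and 2,000 reduce partitions (numbers invented for illustration), that is 32,000 concurrent streams, and at even 64 KB of buffer apiece, roughly 2 GB of buffer memory per node. Jon's check-in/check-out idea could look roughly like the sketch below. This is hypothetical and not part of Spark or the Ning lzf library; newEncoder stands in for whatever constructs the buffer-heavy encoder objects:

    import java.util.concurrent.ConcurrentLinkedQueue

    // Minimal check-in/check-out pool along the lines Jon describes. A real
    // implementation would cap the pool size and handle shutdown/cleanup.
    class EncoderPool[E](newEncoder: () => E) {
      private val idle = new ConcurrentLinkedQueue[E]()

      // Reuse an idle encoder if one exists, otherwise create a fresh one.
      def checkOut(): E = Option(idle.poll()).getOrElse(newEncoder())

      // Return an encoder so later streams can reuse its buffers.
      def checkIn(encoder: E): Unit = idle.offer(encoder)

      // Convenience wrapper that guarantees check-in even on failure.
      def withEncoder[A](f: E => A): A = {
        val e = checkOut()
        try f(e) finally checkIn(e)
      }
    }

With such a pool, only the streams actively encoding at any moment hold the large buffers, rather than every one of the tens of thousands of open streams.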