> On 13 Nov 2015, at 15:49, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>
> Hi Robert,
>
> Thanks, it works with 50% -- at least it gets way past the previous crash point.
>
> In my opinion (I lack real metrics), the part that uses the most memory is
> the M2 mapper, instantiated once per slot.
> The most complex part is the Sink (it uses a lot of HDFS files, flushing
> threads, etc.), but I expect the “RichSinkFunction” to be instantiated only
> once per slot? I’m really surprised by that memory usage; I will try using a
> monitoring app on the YARN JVM to understand it.

In general, it’s instantiated once per subtask. For your current deployment, that means one per slot.

– Ufuk