Hi Arnaud! Java direct memory is tricky to debug. You can turn on the memory logging or check the TaskManager tab in the web dashboard - both report direct memory consumption.
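If you want to cross-check those numbers from inside the JVM, a minimal sketch along these lines may help. It reads the standard "direct" buffer pool through the JDK's BufferPoolMXBean and is independent of Flink's own logging. The class name DirectMemoryProbe and the 16 MB test allocation are made up for illustration, and the pool only covers NIO direct buffers, not every native allocation a library might make.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper class, not part of Flink.
public class DirectMemoryProbe {

    /** Returns the bytes currently held by the JVM's "direct" buffer pool, or -1 if the pool is not found. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();

        // Illustrative allocation only: 16 MB of off-heap memory.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        long after = directMemoryUsed();

        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");
        System.out.println("buffer capacity:    " + buffer.capacity() + " bytes");
    }
}
```

Calling directMemoryUsed() periodically (for example from a logging thread) should show whether direct memory keeps growing between garbage collections.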
One thing to look for is streams that are never closed. An unclosed stream keeps holding its native resources until the Java object is garbage collected, which may be quite a bit later.

Greetings,
Stephan

On Fri, Nov 13, 2015 at 3:59 PM, Ufuk Celebi <u...@apache.org> wrote:

> > On 13 Nov 2015, at 15:49, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> >
> > Hi Robert,
> >
> > Thanks, it works with 50% -- at least way past the previous crash point.
> >
> > In my opinion (I lack real metrics), the part that uses the most memory is the M2 mapper, instantiated once per slot.
> > The most complex part is the Sink (it does use a lot of hdfs files, flushing threads etc.) ; but I expect the “RichSinkFunction” to be instantiated only once per slot ? I’m really surprised by that memory usage, I will try using a monitoring app on the yarn jvm to understand.
>
> In general it’s instantiated once per subtask. For your current deployment, it is one per slot.
>
> – Ufuk