Here are some pointers:

- You would rather need MORE managed memory, not less, because the sorter uses that.
- We added the "large record handler" to the sorter for exactly these use cases. Can you check in the code whether it is enabled? You'll have to go through a bit of the code to see that. It is an older Flink version, I am not quite sure any more how exactly it was there.

Stephan

On Wed, Jun 14, 2017 at 8:59 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #2, XmlInputFormat was involved.
>
> Is it possible to prune (unneeded) field(s) so that heap requirement is
> lower ?
>
> On Wed, Jun 14, 2017 at 8:47 AM, Sebastian Neef <
> gehax...@mailbox.tu-berlin.de> wrote:
>
>> Hi Ted,
>>
>> sure.
>>
>> Here's the stack strace with .distinct() with the Exception in the
>> 'SortMerger Reading Thread': [1]
>>
>> Here's the stack strace without .distinct() and the 'Requested array
>> size exceeds VM limit' error: [2]
>>
>> If you need anything else, I can more or less reliably reproduce the
>> issue.
>>
>> The best,
>> Sebastian
>>
>> [1]
>> http://paste.gehaxelt.in/?2757c33ed3a3733b#jHQPPQNKKrE2wq4o9KCR48m+/V91S55kWH3dwEuyAkc=
>> [2]
>> http://paste.gehaxelt.in/?b106990deccecf1a#y22HgySqCYEOaP2wN6xxApGk/r4YICRkLCH2HBNN9yQ=