Hi, can you please take a look at your TM logs? I would expect that you can see a java.lang.OutOfMemoryError there.
If this assumption is correct, you can try to:

1. Further decrease taskmanager.memory.fraction: this will cause the TaskManager to allocate less managed memory and leave more free heap memory available.
2. Decrease the number of slots on the TaskManager: this will decrease the number of concurrently running user functions and thus the number of objects that have to be kept on the heap.
3. Increase the number of ALS blocks via `als.setBlocks(numberBlocks)`: this will increase the number of blocks into which the factor matrices are split up. A larger number means that each individual block is smaller and thus needs less memory to be kept on the heap.

Best,
Stefan

> On 12.06.2017, at 15:55, Sebastian Neef <gehax...@mailbox.tu-berlin.de> wrote:
>
> Hi,
>
> when I'm running my Flink job on a small dataset, it successfully
> finishes. However, when a bigger dataset is used, I get multiple exceptions:
>
> - Caused by: java.io.IOException: Cannot write record to fresh sort
> buffer. Record too large.
> - Thread 'SortMerger Reading Thread' terminated due to an exception: null
>
> A full stack trace can be found here [0].
>
> I tried to reduce the taskmanager.memory.fraction (or so) and also the
> amount of parallelism, but that did not help much.
>
> Flink 1.0.3-Hadoop2.7 was used.
>
> Any tips are appreciated.
>
> Kind regards,
> Sebastian
>
> [0]:
> http://paste.gehaxelt.in/?1f24d0da3856480d#/dR8yriXd/VQn5zTfZACS52eWiH703bJbSTZSifegwI=
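For what it's worth, the first two suggestions would be set in flink-conf.yaml. A minimal sketch (the concrete values below are illustrative assumptions, not recommendations; tune them for your cluster):

```yaml
# flink-conf.yaml -- illustrative values only

# 1. Smaller fraction => less managed memory, more free heap for user code
taskmanager.memory.fraction: 0.5

# 2. Fewer slots => fewer concurrently running user functions per TaskManager
taskmanager.numberOfTaskSlots: 2
```

The third suggestion, `als.setBlocks(numberBlocks)`, is applied in the job code itself when configuring the ALS instance, not in the config file.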