Hi,

can you please take a look at your TaskManager (TM) logs? I would expect that 
you will find a java.lang.OutOfMemoryError there.

If this assumption is correct, you can try to:

1. Further decrease taskmanager.memory.fraction: This will cause the 
TaskManager to allocate less memory for managed memory and leave more heap 
memory free for user code (see the config sketch after this list).
2. Decrease the number of slots on the TaskManager: This will decrease the 
number of concurrently running user functions and thus the number of objects 
that have to be kept on the heap.
3. Increase the number of ALS blocks via `als.setBlocks(numberBlocks)`: This 
will increase the number of blocks into which the factor matrices are split. 
A larger number means that each individual block is smaller and thus needs 
less memory on the heap (see the ALS sketch after this list).
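
For illustration, here is roughly what that could look like. The first two 
knobs are set in flink-conf.yaml (the values below are just placeholders; 
please tune them for your setup):

    taskmanager.memory.fraction: 0.5
    taskmanager.numberOfTaskSlots: 2

The block count is set on the ALS estimator before fitting. A minimal Scala 
sketch, assuming your ratings are a DataSet[(Int, Int, Double)] called 
ratingsDS (the factor, iteration and lambda values are only examples):

    import org.apache.flink.ml.recommendation.ALS

    val als = ALS()
      .setNumFactors(10)
      .setIterations(10)
      .setLambda(0.01)
      .setBlocks(150)  // more blocks -> smaller factor matrix blocks on the heap

    als.fit(ratingsDS)

As a rule of thumb, increase the block count until an individual block fits 
comfortably into the heap that is left after the managed memory allocation.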

Best,
Stefan

> On 12.06.2017 at 15:55, Sebastian Neef <gehax...@mailbox.tu-berlin.de> wrote:
> 
> Hi,
> 
> when I'm running my Flink job on a small dataset, it successfully
> finishes. However, when a bigger dataset is used, I get multiple exceptions:
> 
> -  Caused by: java.io.IOException: Cannot write record to fresh sort
> buffer. Record too large.
> - Thread 'SortMerger Reading Thread' terminated due to an exception: null
> 
> A full stack trace can be found here [0].
> 
> I tried to reduce the taskmanager.memory.fraction (or so) and also the
> amount of parallelism, but that did not help much.
> 
> Flink 1.0.3-Hadoop2.7 was used.
> 
> Any tips are appreciated.
> 
> Kind regards,
> Sebastian
> 
> [0]:
> http://paste.gehaxelt.in/?1f24d0da3856480d#/dR8yriXd/VQn5zTfZACS52eWiH703bJbSTZSifegwI=
