Can you give us a bit more background?  What exactly is your program doing?

  - Are you running a DataSet program, or a DataStream program?
  - Is it one simple source that reads from S3, or are there multiple
    sources?
  - What operations do you apply on the CSV file?
  - Are you using Flink's S3 connector, or the Hadoop S3 file system?

Greetings,
Stephan


On Thu, Oct 8, 2015 at 5:58 PM, KOSTIANTYN Kudriavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> Hi guys,
>
> I'm running Flink on EMR with 2 m3.xlarge instances (16 GB RAM each) and
> trying to process 3.8 GB of CSV data from S3. I'm surprised that Flink
> failed with OutOfMemoryError: Java heap space
>
> I tried to find the reason:
> 1) I identified the TaskManager process with the command ps aux | grep TaskManager
> 2) then I built a heap histogram:
> $ jmap -histo:live 19648 | head -n23
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:        131018     3763501304  [B
>    2:         61022        7820352  <methodKlass>
>    3:         61022        7688456  <constMethodKlass>
>    4:          4971        5454408  <constantPoolKlass>
>    5:          4966        4582232  <instanceKlassKlass>
>    6:          4169        3003104  <constantPoolCacheKlass>
>    7:         15696        1447168  [C
>    8:          1291         638824  [Ljava.lang.Object;
>    9:          5318         506000  java.lang.Class
>
>
> Do you have any idea what the reason could be and how it can be fixed?
> Does Flink use off-heap memory?
>
>
> Thank you,
> Konstantin Kudryavtsev
>
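
A side note on the histogram quoted above: the numbers are easier to read once converted to GiB and shares. The following is only a minimal sketch in Python, not part of Flink or of jmap. The function names parse_histo and summarize are made up for illustration, the rows are copied from the quoted jmap output, and the shares are relative to the nine listed rows only, not to the whole heap.

```python
# Rows copied verbatim from the jmap -histo:live output quoted above.
HISTO = """\
   1:        131018     3763501304  [B
   2:         61022        7820352  <methodKlass>
   3:         61022        7688456  <constMethodKlass>
   4:          4971        5454408  <constantPoolKlass>
   5:          4966        4582232  <instanceKlassKlass>
   6:          4169        3003104  <constantPoolCacheKlass>
   7:         15696        1447168  [C
   8:          1291         638824  [Ljava.lang.Object;
   9:          5318         506000  java.lang.Class
"""


def parse_histo(text):
    """Return (class_name, instances, bytes) for every data row."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Data rows look like "<rank>: <instances> <bytes> <class name>";
        # header and separator lines do not match and are skipped.
        if len(parts) == 4 and parts[0].endswith(":"):
            rows.append((parts[3].strip(), int(parts[1]), int(parts[2])))
    return rows


def summarize(rows):
    """Add each row's share of the bytes summed over the listed rows."""
    total = sum(nbytes for _, _, nbytes in rows)
    return [(name, count, nbytes, nbytes / total) for name, count, nbytes in rows]


if __name__ == "__main__":
    for name, count, nbytes, share in summarize(parse_histo(HISTO)):
        print(f"{name:28s} {nbytes / 2**30:7.3f} GiB {share:7.1%} "
              f"avg {nbytes / count:9.0f} B/instance")
```

In JVM notation, [B is byte[] and [C is char[], so the first row says that byte arrays hold about 3.5 GiB, which is more than 99% of the bytes in the rows shown.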
