Can you give us a bit more background? What exactly is your program doing?
- Are you running a DataSet program, or a DataStream program?
- Is it one simple source that reads from S3, or are there multiple sources?
- What operations do you apply on the CSV file?
- Are you using Flink's S3 connector, or the Hadoop S3 file system?

Greetings,
Stephan


On Thu, Oct 8, 2015 at 5:58 PM, KOSTIANTYN Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:

> Hi guys,
>
> I'm running FLink on EMR with 2 m3.xlarge (each 16 GB RAM) and trying to
> process 3.8 GB CSV data from S3. I'm surprised the fact that Flink failed
> with OutOfMemory: Java Heap space
>
> I tried to find the reason:
> 1) to identify TaskManager with a command ps aux | grep TaskManager
> 2) then build Heap histo:
> $ jmap -histo:live 19648 | head -n23
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:        131018     3763501304  [B
>    2:         61022        7820352  <methodKlass>
>    3:         61022        7688456  <constMethodKlass>
>    4:          4971        5454408  <constantPoolKlass>
>    5:          4966        4582232  <instanceKlassKlass>
>    6:          4169        3003104  <constantPoolCacheKlass>
>    7:         15696        1447168  [C
>    8:          1291         638824  [Ljava.lang.Object;
>    9:          5318         506000  java.lang.Class
>
>
> Do you have any ideas what can be the reason and how it can be fixed?
> Is Flink uses out-of-heap memory?
>
>
> Thank you,
> Konstantin Kudryavtsev
>
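
To put the quoted histogram in perspective, here is a minimal Python sketch that converts the #bytes column to GiB and shows each class's share. The helper name `parse_histo` is made up for illustration, and only the nine rows quoted above are counted, so the percentages are relative to those rows and not to the whole heap.

```python
# Histogram rows copied from the jmap output quoted above.
HISTO = """\
1: 131018 3763501304 [B
2: 61022 7820352 <methodKlass>
3: 61022 7688456 <constMethodKlass>
4: 4971 5454408 <constantPoolKlass>
5: 4966 4582232 <instanceKlassKlass>
6: 4169 3003104 <constantPoolCacheKlass>
7: 15696 1447168 [C
8: 1291 638824 [Ljava.lang.Object;
9: 5318 506000 java.lang.Class
"""


def parse_histo(text):
    """Hypothetical helper: return (class_name, instances, bytes) per row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row looks like "<rank>: <instances> <bytes> <class name>".
        if len(parts) != 4 or not parts[0].endswith(":"):
            continue
        rows.append((parts[3], int(parts[1]), int(parts[2])))
    return rows


rows = parse_histo(HISTO)
total = sum(nbytes for _, _, nbytes in rows)  # total over the listed rows only

for name, instances, nbytes in rows:
    gib = nbytes / 2**30
    share = 100 * nbytes / total
    print(f"{name:28s} {instances:8d} {gib:8.3f} GiB {share:6.1f}%")
```

In this listing, `[B` (the JVM name for `byte[]`) accounts for about 3.5 GiB across 131018 arrays, which is nearly everything among the rows shown.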