It's a DataSet program that performs simple filtering, a cross join, and an aggregation.
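For reference, a pipeline of that shape (filter, then cross join, then a simple count aggregation) can be sketched with plain Java collections; the names, predicate, and sample data below are hypothetical stand-ins, since the original program and its CSV schema are not shown:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {
    // Stand-in for the filtering step: keep even values only.
    static List<Integer> filterEven(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int x : in) {
            if (x % 2 == 0) out.add(x);
        }
        return out;
    }

    // Stand-in for the cross join: pair every left element with every right element.
    static List<String> crossJoin(List<Integer> left, List<String> right) {
        List<String> out = new ArrayList<>();
        for (int l : left) {
            for (String r : right) {
                out.add(l + r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> filtered = filterEven(Arrays.asList(1, 2, 3, 4));
        List<String> crossed = crossJoin(filtered, Arrays.asList("a", "b"));
        // Aggregation step: here just a count of the joined pairs.
        System.out.println(crossed + " count=" + crossed.size());
    }
}
```

Note that a cross join materializes |left| x |right| pairs, so on a 3.8 GB input its intermediate result can easily be much larger than the input itself.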
I'm using the Hadoop S3 FileSystem (not the EMR one), since Flink's S3 connector doesn't work at all. Currently I have 3 TaskManagers with 5,000 MB each, but I tried different configurations and all of them lead to the same exception.

Sent from my ZenFone

On Oct 8, 2015 12:05 PM, "Stephan Ewen" <se...@apache.org> wrote:

> Can you give us a bit more background? What exactly is your program doing?
>
> - Are you running a DataSet program, or a DataStream program?
> - Is it one simple source that reads from S3, or are there multiple sources?
> - What operations do you apply on the CSV file?
> - Are you using Flink's S3 connector, or the Hadoop S3 file system?
>
> Greetings,
> Stephan
>
>
> On Thu, Oct 8, 2015 at 5:58 PM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I'm running Flink on EMR with 2 m3.xlarge instances (16 GB RAM each) and
>> am trying to process 3.8 GB of CSV data from S3. I'm surprised that Flink
>> failed with OutOfMemoryError: Java heap space.
>>
>> I tried to find the reason:
>> 1) identified the TaskManager with: ps aux | grep TaskManager
>> 2) then built a heap histogram:
>>
>> $ jmap -histo:live 19648 | head -n23
>>  num     #instances         #bytes  class name
>> ----------------------------------------------
>>    1:        131018     3763501304  [B
>>    2:         61022        7820352  <methodKlass>
>>    3:         61022        7688456  <constMethodKlass>
>>    4:          4971        5454408  <constantPoolKlass>
>>    5:          4966        4582232  <instanceKlassKlass>
>>    6:          4169        3003104  <constantPoolCacheKlass>
>>    7:         15696        1447168  [C
>>    8:          1291         638824  [Ljava.lang.Object;
>>    9:          5318         506000  java.lang.Class
>>
>> Do you have any ideas what the reason could be and how it can be fixed?
>> Does Flink use off-heap memory?
>>
>> Thank you,
>> Konstantin Kudryavtsev
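For context, the per-TaskManager heap described above ("5k MB") is set in flink-conf.yaml. The following is an illustrative fragment only, with assumed values; `taskmanager.heap.mb` and `taskmanager.memory.fraction` are real keys in Flink configs of that era, but the right numbers depend on the machines in the cluster:

```yaml
# flink-conf.yaml (illustrative values, matching the setup described above)
taskmanager.heap.mb: 5000          # JVM heap per TaskManager, in MB
taskmanager.memory.fraction: 0.7   # share of free heap used for Flink's managed memory
```

Note that the heap histogram above shows ~3.7 GB held in byte arrays (`[B`), which on a 5,000 MB heap leaves little headroom for the rest of the job.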