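Hi Flavio, how many files are in the directory? You can count them with "find /tmp/myDir -type f | wc -l" (the -type f restricts the count to regular files, so the directories themselves are not included).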
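If you would rather check from the JVM side, here is a minimal Java sketch that does the same count with java.nio.file.Files.walk. It counts regular files only and skips directories, so its result can be a bit smaller than a bare `find /tmp/myDir | wc -l`, which also lists the directories. The class name CountFiles is just an illustrative choice, and /tmp/myDir is the directory from your mail.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CountFiles {

    // Counts regular files below root, recursing into all subdirectories.
    // Directories themselves are not counted.
    static long countRegularFiles(Path root) throws IOException {
        // Files.walk returns a lazily populated stream that must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory from the question; pass another path as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/myDir");
        System.out.println(countRegularFiles(root) + " files under " + root);
    }
}
```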
Flink running out of memory while creating the input splits suggests to me that there are a lot of files in there.

On Tue, May 26, 2015 at 2:10 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:

> Hi to all,
>
> I'm trying to recursively read a directory, but it seems that the
> totalLength value in FileInputFormat.createInputSplits() is not
> computed correctly.
>
> I have files organized as:
>
> /tmp/myDir/A/B/cunk-1.txt
> /tmp/myDir/A/B/cunk-2.txt
> ..
>
> If I try to do the following:
>
> Configuration parameters = new Configuration();
> parameters.setBoolean("recursive.file.enumeration", true);
> env.readTextFile("file:////tmp/myDir").withParameters(parameters).print();
>
> I get:
>
> Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: Java heap space
>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:162)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:471)
>     at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:515)
>     ... 19 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2219)
>     at java.util.ArrayList.grow(ArrayList.java:242)
>     at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
>     at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
>     at java.util.ArrayList.add(ArrayList.java:440)
>     at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:503)
>     at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:51)
>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:146)
>
> Am I doing something wrong, or is it a bug?
>
> Best,
> Flavio
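To illustrate why a large file count alone can exhaust the heap at this point: as far as I understand FileInputFormat, every file contributes at least one input split, and all splits are collected in a single list while the job is being submitted (note the ArrayList.add frames under createInputSplits in the trace). The sketch below is a simplified model of that bookkeeping, not Flink's actual code. SplitCountModel, its Split class, the 64 MiB split size and the generated file names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitCountModel {

    // Hypothetical, simplified stand-in for an input split: a file plus a byte range.
    // This is NOT Flink's FileInputSplit class; it only illustrates the bookkeeping.
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Simplified model: each file is cut into chunks of at most splitSize bytes,
    // and every file, even an empty one, contributes at least one split.
    // All splits end up in one in-memory list, so its size grows with the file count.
    static List<Split> createSplits(List<String> paths, List<Long> lengths, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < paths.size(); i++) {
            long len = lengths.get(i);
            long start = 0;
            do {
                long chunk = Math.min(splitSize, len - start);
                splits.add(new Split(paths.get(i), start, chunk));
                start += chunk;
            } while (start < len);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical workload: many small files, each far below the split size.
        int fileCount = 100_000;
        List<String> paths = new ArrayList<>(fileCount);
        for (int i = 0; i < fileCount; i++) {
            paths.add("/tmp/myDir/A/B/file-" + i + ".txt");
        }
        List<Long> lengths = Collections.nCopies(fileCount, 1_024L);
        long splitSize = 64L * 1024 * 1024; // hypothetical 64 MiB split size
        List<Split> splits = createSplits(paths, lengths, splitSize);
        System.out.println(fileCount + " files -> " + splits.size() + " splits");
    }
}
```

In this model, with many small files the split count simply equals the file count, so the memory needed during split creation grows with the number of files rather than with the total amount of data.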