I have 10 files. I debugged the code, and it seems that there's a loop in FileInputFormat when files are nested far away from the root directory of the scan.
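
To double-check the file count and the nesting depth independently of Flink, something like the following could be used. It is only a sketch with plain java.nio; the class name DirScan and its two methods are made up for illustration and are not part of Flink. Unlike "find /tmp/myDir | wc -l", it counts regular files only, not directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: inspects a directory tree
// before handing it to a recursive file input.
public class DirScan {

    /** Counts regular files under root, descending into all subdirectories. */
    static long countFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    /** Returns how many path components the deepest regular file lies below root (0 if there is none). */
    static int maxDepth(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToInt(p -> root.relativize(p).getNameCount())
                        .max()
                        .orElse(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory from this thread if no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/myDir");
        System.out.println("files: " + countFiles(root));
        System.out.println("max depth: " + maxDepth(root));
    }
}
```

If this reports only a handful of files, the heap exhaustion is unlikely to come from the sheer number of files.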

On Tue, May 26, 2015 at 2:14 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi Flavio,
>
> how many files are in the directory?
> You can count with "find /tmp/myDir | wc -l"
>
> Flink running out of memory while creating input splits indicates to me
> that there are a lot of files in there.
>
> On Tue, May 26, 2015 at 2:10 PM, Flavio Pompermaier <pomperma...@okkam.it>
> wrote:
>
>> Hi to all,
>>
>> I'm trying to recursively read a directory, but it seems that the
>> totalLength value in FileInputFormat.createInputSplits() is not
>> computed correctly.
>>
>> I have files organized as:
>>
>> /tmp/myDir/A/B/cunk-1.txt
>> /tmp/myDir/A/B/cunk-2.txt
>> ..
>>
>> If I try to do the following:
>>
>> Configuration parameters = new Configuration();
>> parameters.setBoolean("recursive.file.enumeration", true);
>> env.readTextFile("file:////tmp/myDir").withParameters(parameters).print();
>>
>> I get:
>>
>> Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: Java heap space
>>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:162)
>>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:471)
>>     at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:515)
>>     ... 19 more
>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>     at java.util.Arrays.copyOf(Arrays.java:2219)
>>     at java.util.ArrayList.grow(ArrayList.java:242)
>>     at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
>>     at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
>>     at java.util.ArrayList.add(ArrayList.java:440)
>>     at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:503)
>>     at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:51)
>>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:146)
>>
>> Am I doing something wrong or is it a bug?
>>
>> Best,
>> Flavio