Hi to all,

I'm trying to read a directory recursively, but it seems that the
totalLength value in FileInputFormat.createInputSplits() is not
computed correctly.

I have files organized as:

/tmp/myDir/A/B/cunk-1.txt
/tmp/myDir/A/B/cunk-2.txt
 ..
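
As a point of comparison, the expected total length of such a nested layout can be computed outside Flink with plain java.nio. This is only a minimal sketch: the class name DirSize and the method totalLength are hypothetical helpers, not part of Flink, and it does not reproduce what FileInputFormat does internally.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper (not Flink code): sums the sizes of all regular
// files found under a root directory, descending into subdirectories.
class DirSize {

    static long totalLength(Path root) throws IOException {
        // Files.walk visits the root and everything below it, depth-first.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)   // skip directories such as A and A/B
                .mapToLong(p -> {
                    try {
                        return Files.size(p);   // size in bytes of one file
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory used in this message; pass another path as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/myDir");
        System.out.println("total bytes: " + totalLength(root));
    }
}
```

Comparing this number with the totalLength seen in a debugger would show whether the nested files are being counted as expected.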

If I try to do the following:

Configuration parameters = new Configuration();
parameters.setBoolean("recursive.file.enumeration", true);
env.readTextFile("file:////tmp/myDir").withParameters(parameters).print();

I get:

Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: Java heap space
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:162)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:471)
    at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:515)
    ... 19 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2219)
    at java.util.ArrayList.grow(ArrayList.java:242)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
    at java.util.ArrayList.add(ArrayList.java:440)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:503)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:51)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:146)

Am I doing something wrong or is it a bug?

Best,
Flavio
