Hi Robert! This clearly sounds like unintended behavior. Thanks for reporting this.
Apparently, the 0 split length was supposed to have a double meaning, but it goes haywire in this case. Let me try to come up with a fix for this...

Greetings,
Stephan

On Fri, Jul 10, 2015 at 6:05 PM, Robert Schmidtke <ro.schmid...@gmail.com> wrote:
> Hey everyone,
>
> I just noticed that when processing input splits from a
> DelimitedInputFormat (specifically, I have a text file with words in it),
> if the splitLength is 0, the entire read buffer is filled (see
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/DelimitedInputFormat.java#L577).
> I'm using XtreemFS as the underlying file system, which stripes files in
> blocks of 128kb across storage servers. I have 8 physically separate
> nodes, and my input file is 1MB, such that each node stores 128kb of
> data. This is reported accurately to Flink (e.g. split sizes and
> hostnames). Now when the splitLength is 0 at some point during
> processing (which it will become eventually), the entire file is read in
> again, which kind of defeats the point of processing a split of length
> 0. Is this intended behavior? I've tried multiple hot-fixes, but they
> ended up with the file not being read in its entirety. I would like to
> know the rationale behind this implementation, and maybe figure out a
> way around it.
>
> Thanks in advance,
>
> Robert
>
> --
> My GPG Key ID: 336E2680
>
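To illustrate the "double meaning" problem, here is a minimal sketch (with hypothetical names, not the actual Flink code) of buffer-fill logic where a remaining split length of 0 is also used as a "read until end of file" sentinel, assuming that is the behavior at the line Robert linked:

```java
// Sketch of the suspected buffer-fill logic. The names
// (bytesToRequest, BUFFER_SIZE) are hypothetical.
public class FillBufferSketch {
    static final int BUFFER_SIZE = 1024;

    // How many bytes the format would try to read next,
    // given the bytes remaining in the current split.
    static int bytesToRequest(long splitLength) {
        // Intended: cap the read at the remaining split length.
        int toRead = splitLength > BUFFER_SIZE
                ? BUFFER_SIZE
                : (int) splitLength;
        // The sentinel: 0 remaining bytes is treated as "fill the
        // whole buffer" (used to read past a split boundary to
        // finish a record), so a genuinely empty split keeps
        // reading instead of stopping.
        if (toRead == 0) {
            toRead = BUFFER_SIZE;
        }
        return toRead;
    }

    public static void main(String[] args) {
        System.out.println(bytesToRequest(100)); // capped at remaining split
        System.out.println(bytesToRequest(0));   // sentinel: full buffer
    }
}
```

Under this reading, a split whose length is exactly 0 is indistinguishable from the "continue past the split end" case, which would explain the whole file being read again.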