Re: DelimitedInputFormat reads entire buffer when splitLength is 0

2015-07-13 Thread Robert Schmidtke
Hi Stephan, I figured as much, since 128k is a split size that is not commonly used in large-scale data processing engines. I will go for increasing the split size to reduce coordination overhead for Flink. It just so happened that my small toy example brought up the issue. Thanks for clearing this

Re: DelimitedInputFormat reads entire buffer when splitLength is 0

2015-07-12 Thread Stephan Ewen
Hi Robert! I did some debugging and added some tests. Turns out, this is actually expected behavior. It has to do with the splitting of the records. Because creating the splits happens without knowing the contents, the split can be either in the middle of a record, or (by chance) exactly at the b
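
The message above is cut off, so the following is only a rough sketch of one convention that split-based readers commonly use when split boundaries are chosen without looking at the data. It is not confirmed to be what DelimitedInputFormat does. The class SplitReaderSketch and the method readSplit are hypothetical names made up for this illustration. The assumed rule: a reader whose split does not start at offset 0 discards everything up to the first delimiter, and every reader keeps going as long as a record begins at or before the end of its split, so each record is read exactly once whether the boundary falls mid-record or exactly on a record boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Flink's DelimitedInputFormat.
class SplitReaderSketch {

    // Returns the records owned by the split [start, start + length).
    static List<String> readSplit(byte[] data, int start, int length, byte delimiter) {
        List<String> records = new ArrayList<>();
        int end = start + length;
        int pos = start;

        if (start > 0) {
            // Assumed convention: the first (possibly partial) record is
            // owned by the previous split, so skip past the next delimiter.
            while (pos < data.length && data[pos] != delimiter) {
                pos++;
            }
            pos++; // step over the delimiter itself
        }

        // Read every record that begins at or before the split end,
        // even if that record extends past the end of the split.
        while (pos <= end && pos < data.length) {
            int recordStart = pos;
            while (pos < data.length && data[pos] != delimiter) {
                pos++;
            }
            records.add(new String(data, recordStart, pos - recordStart, StandardCharsets.UTF_8));
            pos++; // step over the delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbb\ncc\ndd\n".getBytes(StandardCharsets.UTF_8);
        // Three splits of 4 bytes each; the boundaries fall mid-record.
        for (int start = 0; start < data.length; start += 4) {
            System.out.println(readSplit(data, start, 4, (byte) '\n'));
        }
        // prints:
        // [aa, bb]
        // [cc]
        // [dd]
    }
}
```

Under this assumed rule a reader may consume bytes beyond the nominal end of its split, which is why the amount of data read is not bounded by the split length alone.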

Re: DelimitedInputFormat reads entire buffer when splitLength is 0

2015-07-10 Thread Stephan Ewen
Hi Robert! This clearly sounds like unintended behavior. Thanks for reporting this. Apparently, the 0 line length was supposed to have a double meaning, but it goes haywire in this case. Let me try to come up with a fix for this... Greetings, Stephan On Fri, Jul 10, 2015 at 6:05 PM, Robert Schmi

DelimitedInputFormat reads entire buffer when splitLength is 0

2015-07-10 Thread Robert Schmidtke
Hey everyone, I just noticed that when processing input splits from a DelimitedInputFormat (specifically, I have a text file with words in it), the entire read buffer is filled if the splitLength is 0 (see https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/a