Hi Stephan,
I figured as much, since 128k is a split size that is not commonly used in
large-scale data processing engines. I will go for increasing the split
size to reduce coordination overhead for Flink. It just so happened that my
small toy example brought up the issue. Thanks for clearing this up!
Hi Robert!
I did some debugging and added some tests. Turns out, this is actually
expected behavior. It has to do with the splitting of the records.
Because creating the splits happens without knowing the contents, the split
can be either in the middle of a record, or (by chance) exactly at the
boundary between two records.
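The convention Stephan describes can be sketched like this. This is a minimal, hypothetical in-memory sketch of the general delimiter-aligned split-reading technique (the class and method names are mine, not Flink's actual DelimitedInputFormat): a record belongs to the split that contains its first byte, a split that starts mid-record skips ahead to the next delimiter, and each split reads past its own end to finish the last record it started.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of delimiter-aligned split reading, assuming
// '\n'-delimited records held fully in memory.
public class SplitReadSketch {

    public static List<String> readSplit(byte[] data, int splitStart, int splitLength) {
        List<String> records = new ArrayList<>();
        int pos = splitStart;
        // If this split starts in the middle of a record (the previous
        // byte is not a delimiter), skip to the next delimiter: the
        // previous split reads that record to completion. If the split
        // start happens to fall exactly on a record boundary, nothing
        // is skipped.
        if (splitStart != 0 && data[splitStart - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // move past the delimiter
        }
        int splitEnd = splitStart + splitLength;
        // Emit every record whose first byte lies inside this split,
        // reading past splitEnd if needed to finish the last one.
        while (pos < data.length && pos < splitEnd) {
            int recordStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, recordStart, pos - recordStart, StandardCharsets.UTF_8));
            pos++; // skip the delimiter
        }
        return records;
    }
}
```

With input "aa\nbb\ncc\n" and a split boundary at byte 5 (inside the '\n' after "bb"), the first split yields "aa" and "bb" and the second yields only "cc", so no record is lost or duplicated either way the boundary falls.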
Hi Robert!
This clearly sounds like unintended behavior. Thanks for reporting this.
Apparently, the 0 line length was supposed to have a double meaning, but it
goes haywire in this case.
Let me try to come up with a fix for this...
Greetings,
Stephan
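For context, the kind of double meaning at play can be sketched as follows. This is a hedged, hypothetical fragment (the names are mine, not Flink's actual buffer-filling code): if 0 is read as "unbounded" rather than "nothing left", the reader requests the entire buffer from the stream even though the split is empty.

```java
// Hypothetical sketch of how a split length of 0 can fill the whole
// read buffer: 0 falls through to the "no bound" branch.
public class ZeroSplitLengthSketch {

    // How many bytes to request from the stream for the next read.
    static int bytesToRequest(long splitLength, int bufferSize) {
        if (splitLength > 0) {
            // Bounded read: never request more than the split holds.
            return (int) Math.min(splitLength, bufferSize);
        }
        // splitLength <= 0 is treated as "read until end of stream",
        // so splitLength == 0 requests the entire buffer instead of
        // reading nothing.
        return bufferSize;
    }
}
```

A fix would presumably separate the two meanings, e.g. by treating 0 as an empty split and reserving a distinct sentinel (such as -1) for "read to the end".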
On Fri, Jul 10, 2015 at 6:05 PM, Robert Schmi
Hey everyone,
I just noticed, when processing input splits from a
DelimitedInputFormat (specifically, a text file with words in it),
that if the splitLength is 0, the entire read buffer is filled (see
https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/a