Hi Hynek,

At execution time, matrices.first(60000).print() is different from matrices.print(). It adds a reduce operator to the job that collects only the first 60000 records from the source. So if your InputFormat can produce more than 60000 records (which would be unexpected), and the trailing data is somehow corrupted, this can happen.
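One way to rule that out, as a rough sketch only: compare the record count declared in the file header with the actual file size. This assumes you are reading the MNIST image file in its usual IDX layout (a 16-byte big-endian header holding magic number 2051, record count, rows, and columns, followed by one unsigned byte per pixel). IdxHeaderCheck and Summary are hypothetical helper names of mine, not part of Flink.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of Flink): sanity-checks an IDX image file header. */
final class IdxHeaderCheck {
    static final int IMAGE_MAGIC = 0x00000803; // 2051, assumed magic number for IDX image files
    static final int HEADER_BYTES = 16;        // magic, count, rows, cols (4 bytes each)

    /** What the header declares, and how far the file size deviates from it. */
    static final class Summary {
        final int declaredRecords; // record count stated in the header
        final int recordBytes;     // rows * cols, one byte per pixel
        final long trailingBytes;  // > 0: extra data after the last record, < 0: file is truncated

        Summary(int declaredRecords, int recordBytes, long trailingBytes) {
            this.declaredRecords = declaredRecords;
            this.recordBytes = recordBytes;
            this.trailingBytes = trailingBytes;
        }
    }

    /**
     * @param header     at least the first 16 bytes of the file
     * @param fileLength total size of the file in bytes
     */
    static Summary inspect(byte[] header, long fileLength) {
        if (header.length < HEADER_BYTES) {
            throw new IllegalArgumentException("Need at least " + HEADER_BYTES + " header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // ByteBuffer is big-endian by default
        int magic = buf.getInt();
        if (magic != IMAGE_MAGIC) {
            throw new IllegalArgumentException("Unexpected magic number: " + magic);
        }
        int count = buf.getInt();
        int rows = buf.getInt();
        int cols = buf.getInt();
        int recordBytes = rows * cols;
        long expectedLength = HEADER_BYTES + (long) count * recordBytes;
        return new Summary(count, recordBytes, fileLength - expectedLength);
    }
}
```

If trailingBytes is not 0, or declaredRecords is not what you expect, the InputFormat may be emitting more (or different) records than the 60000 you count on. You could feed it the first 16 bytes of the file together with the file size.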
Do you have detailed error information on why your program exits? That would help identify the root cause.

Thanks,
Zhu Zhu

Hynek Noll <nollh...@fel.cvut.cz> wrote on Fri, Aug 9, 2019 at 8:59 PM:

> Hi,
> I'm trying to implement a custom FileInputFormat (to read the MNIST
> Dataset).
> The creation of Flink DataSet (DataSet<byte[]> matrices) seems to be OK,
> but when I try to print it using either
> matrices.print();
> or
> matrices.collect();
>
> It finishes with exit code -17.
> (Before, I compiled using Java 11 and aside from a reflection warning,
> this approach caused the program to run indefinitely. Now I use JDK 8)
> The total number of elements is 60 000. Now the strange thing is that when
> I run
>
> matrices.first(60000).print();
>
> it does print the elements just fine. But my understanding is that these
> two approaches should work the same way, if there are exactly 60 000
> records.
>
> Is this a bug? Or something that can be explained by my extension of
> FileInputFormat (I might very well not use it correctly)?
>
> Best regards,
> Hynek
>