Hi,

I am trying to run a Flink job that simply writes some data to IGFS (URI: igfs://igfs@/tmp/output/mydata.bin), relying on Flink's support for custom Hadoop filesystems [1].
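For reference, the job boils down to something like the following (a simplified sketch using the DataSet API; the real job computes binary output, but the IGFS write path is the same):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class IgfsWriteJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Toy data; the real job computes its output instead.
        DataSet<String> data = env.fromElements("record-1", "record-2", "record-3");

        // The igfs:// scheme is resolved through the Hadoop configuration,
        // i.e. fs.igfs.impl pointing at Ignite's IgniteHadoopFileSystem.
        data.writeAsText("igfs://igfs@/tmp/output/mydata.bin");

        env.execute("IGFS write test");
    }
}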
However, the job times out, and digging into the logs I found the following exception from Hadoop's DataNode:

java.io.EOFException: End of File Exception between local host is: "cloud-7.mynetwork/130.149.21.11"; destination host is: "cloud-7":45000; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy14.sendHeartbeat(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:553)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1079)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:974)
17/07/21 04:33:56 INFO ipc.Client: Retrying connect to server: cloud-7/130.149.21.11:45000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Ignite is running in PRIMARY mode (I want to make sure everything stays in memory); the setup corresponds roughly to the sketch below. I can query the filesystem with the Hadoop CLI (bin/hdfs dfs -ls igfs://igfs@/) and write to it (bin/hdfs dfs -copyFromLocal), so IGFS itself appears to be working.
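This is a minimal sketch of the equivalent programmatic configuration (my actual setup may differ in details such as discovery and ports):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.FileSystemConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.igfs.IgfsMode;

public class IgfsPrimaryNode {

    public static void main(String[] args) {
        FileSystemConfiguration fsCfg = new FileSystemConfiguration();
        fsCfg.setName("igfs");

        // PRIMARY mode: IGFS is the authoritative in-memory store and
        // nothing is passed through to a secondary filesystem such as HDFS.
        fsCfg.setDefaultMode(IgfsMode.PRIMARY);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setFileSystemConfiguration(fsCfg);

        Ignition.start(cfg);
    }
}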
I'd appreciate any ideas as to what could cause the write to fail when done programmatically but not through the CLI. Thanks in advance.

Best,
Rodrigo

[1] https://ci.apache.org/projects/flink/flink-docs-release-0.8/example_connectors.html