Hi, I'm trying to figure out how data is transferred between the client and a DataNode in Hadoop v1.8.
This is my understanding so far: the client first fires an OP_READ_BLOCK request. The DataNode responds with a status code, a checksum header, the chunk offset, the packet length, a sequence number, a last-packet boolean, the length, and then the data (in that order).

However, I'm running into an issue. First of all, which of these lengths describes the length of the data? I tried both PacketLength and Length, and both seem to leave data on the stream (I tested by "cat"-ing a file containing the numbers 1-1000).

Also, how does the DataNode signal the start of the next packet? After reading "Length" bytes, I assumed the header would be repeated, but that doesn't seem to be the case (none of the header fields come back with sane values).

I've looked through the DataXceiver, BlockSender, and DFSClient (RemoteBlockReader) classes, but I still can't quite grasp how this data transfer is conducted.
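To make the question concrete, here is a minimal sketch of the read loop I'm attempting, following the field order I described above. The class and variable names are my own for illustration, not Hadoop's, and the field widths are guesses on my part:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of how I'm currently parsing the OP_READ_BLOCK response.
public class BlockReadSketch {

    static void readBlockResponse(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);

        short status       = in.readShort();   // status code
        byte checksumType  = in.readByte();    // checksum header: type...
        int bytesPerChunk  = in.readInt();     // ...and bytes per checksum(?)
        long chunkOffset   = in.readLong();    // chunk offset

        // Per-packet header, as I understand it:
        int packetLength   = in.readInt();     // packet length
        long seqno         = in.readLong();    // sequence number
        boolean lastPacket = in.readBoolean(); // last packet in block?
        int dataLength     = in.readInt();     // length

        // This is where I'm stuck: neither dataLength nor packetLength
        // seems to consume exactly one packet's worth of bytes, so the
        // next read never lands on a sane-looking header.
        byte[] data = new byte[dataLength];
        in.readFully(data);
    }
}

If someone can point out which read is wrong, or which bytes I'm failing to skip between packets, that would clear this up.

Any help would be appreciated,

Dhaivat Pandya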