There's been no Apache Hadoop release versioned v1.8 historically, nor is one upcoming. Do you mean 0.18?
Either way, can you point to the specific code lines in BlockSender which
have you confused? The sendBlock and sendPacket methods would interest you,
I assume, but they appear to be well constructed/named internally and
commented in a few important spots. For concreteness, I've also pasted a
rough, untested sketch of reading the per-packet fields you listed at the
bottom of this mail, below the quote.

On Mon, Apr 7, 2014 at 6:39 AM, Dhaivat Pandya <dhaivatpan...@gmail.com> wrote:
> Hi,
>
> I'm trying to figure out how data is transferred between the client and
> the DataNode in Hadoop v1.8.
>
> This is my understanding so far:
>
> The client first fires an OP_READ_BLOCK request. The DataNode responds
> with a status code, checksum header, chunk offset, packet length, sequence
> number, the last packet boolean, the length, and the data (in that order).
>
> However, I'm running into an issue. First of all, which of these lengths
> describes the length of the data? I tried both PacketLength and Length;
> both seem to leave data on the stream (I tried to "cat" a file with the
> numbers 1-1000 in it).
>
> Also, how does the DataNode signal the start of another packet? After
> "Length" number of bytes have been read, I assumed that the header would
> be repeated, but this is not the case (I'm not getting sane values for any
> of the fields of the header).
>
> I've looked through the DataXceiver, BlockSender, and DFSClient
> (RemoteBlockReader) classes, but I still can't quite grasp how this data
> transfer is conducted.
>
> Any help would be appreciated,
>
> Dhaivat Pandya

--
Harsh J
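Here's the sketch. It assumes the field order and rough widths from the
description above; the class and field names are illustrative, not the ones
actually used in BlockSender or RemoteBlockReader, so treat it as a starting
point rather than the real wire format of any given release.

import java.io.DataInputStream;
import java.io.IOException;

public class PacketHeaderSketch {

    // One per-packet header, with fields named after the description above.
    // Types/widths are guesses; check the source of your exact version.
    static final class PacketHeader {
        int packetLen;       // "packet length": size of the whole packet
        long offsetInBlock;  // "chunk offset" of this packet within the block
        long seqno;          // "sequence number"
        boolean lastPacket;  // "the last packet boolean"
        int dataLen;         // "the length": bytes of file data in this packet
    }

    // Reads one header in the order listed above.
    static PacketHeader readHeader(DataInputStream in) throws IOException {
        PacketHeader h = new PacketHeader();
        h.packetLen = in.readInt();
        h.offsetInBlock = in.readLong();
        h.seqno = in.readLong();
        h.lastPacket = in.readBoolean();
        h.dataLen = in.readInt();
        return h;
    }

    // Skeleton read loop: keep pulling packets until the last-packet flag is set.
    static void drainBlock(DataInputStream in) throws IOException {
        PacketHeader h;
        do {
            h = readHeader(in);
            // If your version puts checksum bytes between the header and the
            // data (the checksum header you read at the start of the response
            // would tell you how many per chunk), they must be consumed here
            // first; skipping them would explain the garbled values you see
            // where you expect the next header to start.
            byte[] data = new byte[h.dataLen];
            in.readFully(data);
            // ... hand `data` off to whatever reassembles the block ...
        } while (!h.lastPacket);
    }
}

In other words, I'd expect "the length" to describe the actual data bytes and
"packet length" to cover the rest of the packet around it, but the
RemoteBlockReader in your exact version is the authority on both, so please
compare against that source rather than this sketch.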