Hi Harsh, I did mean 0.18 - sorry about the typo.
I read through the BlockSender.sendChunks method once again and noticed that I wasn't reading the checksum byte array correctly in my code.

Thanks for the help,
Dhaivat Pandya

On Sun, Apr 6, 2014 at 8:59 PM, Harsh J <ha...@cloudera.com> wrote:
> There's been no Apache Hadoop release versioned v1.8 historically, nor
> is one upcoming. Do you mean 0.18?
>
> Either way, can you point to the specific code lines in BlockSender
> which have you confused? The sendBlock and sendPacket methods would
> interest you, I assume, but they appear to be well constructed/named
> internally and commented in a few important spots.
>
> On Mon, Apr 7, 2014 at 6:39 AM, Dhaivat Pandya <dhaivatpan...@gmail.com> wrote:
> > Hi,
> >
> > I'm trying to figure out how data is transferred between the client and
> > the DataNode in Hadoop v1.8.
> >
> > This is my understanding so far:
> >
> > The client first fires an OP_READ_BLOCK request. The DataNode responds
> > with a status code, a checksum header, the chunk offset, the packet
> > length, the sequence number, the last-packet boolean, the length, and
> > the data (in that order).
> >
> > However, I'm running into an issue. First of all, which of these
> > lengths describes the length of the data? I tried both PacketLength and
> > Length, but both seem to leave data on the stream (I tried to "cat" a
> > file with the numbers 1-1000 in it).
> >
> > Also, how does the DataNode signal the start of another packet? After
> > "Length" bytes have been read, I assumed that the header would be
> > repeated, but this is not the case (I'm not getting sane values for any
> > of the fields of the header).
> >
> > I've looked through the DataXceiver, BlockSender, and DFSClient
> > (RemoteBlockReader) classes, but I still can't quite grasp how this
> > data transfer is conducted.
> >
> > Any help would be appreciated,
> >
> > Dhaivat Pandya
>
> --
> Harsh J
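For anyone else who trips over the same thing, here is a minimal sketch of the per-packet layout as it was described in this thread. The class name `PacketSketch`, the field order, and the constants `BYTES_PER_CHECKSUM`/`CHECKSUM_SIZE` are my assumptions for illustration, not code copied from the Hadoop source; the point it demonstrates is that the checksum bytes sit between the packet header and the payload, so advancing only "Length" bytes leaves the checksum bytes of the next packet on the stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of one read-response packet, using the field order
// described in this thread (NOT taken verbatim from the Hadoop 0.18 source):
// packetLen (int), offsetInBlock (long), seqno (long),
// lastPacketInBlock (boolean), dataLen (int), then one checksum per
// 512-byte chunk, then dataLen bytes of payload.
public class PacketSketch {
    static final int BYTES_PER_CHECKSUM = 512; // assumed chunk size
    static final int CHECKSUM_SIZE = 4;        // assumed CRC32 width per chunk

    static byte[] writePacket(long offset, long seqno, boolean last,
                              byte[] payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        int chunks = (payload.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        // "PacketLength" here covers checksums + data; the real protocol's
        // exact accounting may differ, so treat this as illustrative only.
        out.writeInt(chunks * CHECKSUM_SIZE + payload.length);
        out.writeLong(offset);
        out.writeLong(seqno);
        out.writeBoolean(last);
        out.writeInt(payload.length);              // "Length" = payload bytes only
        out.write(new byte[chunks * CHECKSUM_SIZE]); // dummy checksum bytes
        out.write(payload);
        return buf.toByteArray();
    }

    static byte[] readPacket(DataInputStream in) throws IOException {
        in.readInt();                              // packetLen (unused here)
        in.readLong();                             // offsetInBlock
        in.readLong();                             // seqno
        in.readBoolean();                          // lastPacketInBlock
        int dataLen = in.readInt();
        int chunks = (dataLen + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        in.skipBytes(chunks * CHECKSUM_SIZE);      // step over the checksums...
        byte[] data = new byte[dataLen];
        in.readFully(data);                        // ...then the next header follows
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "some file contents".getBytes("UTF-8");
        byte[] wire = writePacket(0L, 0L, true, payload);
        byte[] back = readPacket(new DataInputStream(new ByteArrayInputStream(wire)));
        if (!new String(back, "UTF-8").equals("some file contents"))
            throw new AssertionError("round-trip failed");
        System.out.println("round-trip ok: " + back.length + " payload bytes");
    }
}
```

Under these assumptions, a reader that consumes only `dataLen` bytes after the header would stop `chunks * CHECKSUM_SIZE` bytes short of the next packet header, which matches the "no sane header values" symptom above.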