[jira] [Reopened] (HDFS-10543) hdfsRead read stops at block boundary

James Clampffer (JIRA) Tue, 26 Jul 2016 11:35:37 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


James Clampffer reopened HDFS-10543:
------------------------------------
      Assignee: James Clampffer

Reopening this.

I recently found some data being mangled, always right on block boundaries.  
I ran the HDFS-8790 test (modified to print out the file offset on failure) 
with an 88MB file, 1MB blocks, a 170-byte pattern, and 24 threads, and was 
able to reproduce it very quickly.  Using the pattern is important because the 
buffers come back the right size, just with parts missing.  After backing this 
patch out there are some short reads, but otherwise the data looks correct.  
Xiaowei, could you take another look?
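
A minimal sketch of the kind of pattern check described above, not the actual 
HDFS-8790 test code.  It assumes the file is the pattern repeated from offset 
0, and the helper name first_mismatch is hypothetical.  Because the expected 
byte depends only on the absolute file offset, a buffer that has the right 
size but is missing bytes in the middle is still caught, and the failing 
offset can be printed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the HDFS-8790 test).
// Assumes the file content is `pattern` repeated starting at file offset 0.
// `file_offset` is the offset in the file at which `buf` was read.
// Returns the absolute file offset of the first mismatching byte,
// or -1 if every byte matches the pattern.
std::int64_t first_mismatch(const std::vector<char>& buf,
                            std::uint64_t file_offset,
                            const std::string& pattern) {
  assert(!pattern.empty());
  for (std::size_t i = 0; i < buf.size(); ++i) {
    const std::uint64_t pos = file_offset + i;
    if (buf[i] != pattern[pos % pattern.size()]) {
      return static_cast<std::int64_t>(pos);
    }
  }
  return -1;
}
```

A caller would run this on every buffer a reader thread gets back and log the 
returned offset; offsets that cluster at multiples of the block size point at 
the boundary handling.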

Thanks for the input [~cmccabe].  We could add a readFully function to 
hdfs_ext.h or provide a flag to force/prevent reading across blocks.  I think 
adding some logic to span blocks inside of the library by default would be 
handy in the short term just to make this library a bit easier to use (so that 
hopefully more people try it out).
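
As a rough illustration of the readFully idea, here is a sketch of a loop that 
keeps reading until the requested length is satisfied, end of file is reached, 
or an error occurs.  This is not the proposed hdfs_ext.h API; read_fully and 
read_some are hypothetical names.  read_some stands in for a call such as 
hdfsRead and is assumed to return the number of bytes read (possibly fewer 
than requested), 0 at end of file, or a negative value on error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of libhdfs++.
// `read_some(dst, n)` is assumed to behave like a single hdfsRead call:
// it may return fewer than n bytes (e.g. when it stops at a block boundary),
// 0 at end of file, or a negative value on error.
template <typename ReadSome>
std::int64_t read_fully(ReadSome&& read_some, char* buf, std::size_t length) {
  assert(buf != nullptr || length == 0);
  std::size_t total = 0;
  while (total < length) {
    const std::int64_t n = read_some(buf + total, length - total);
    if (n < 0) return n;  // propagate the error to the caller
    if (n == 0) break;    // end of file: return what was read so far
    total += static_cast<std::size_t>(n);
  }
  return static_cast<std::int64_t>(total);
}
```

With the real C API, read_some would be a small lambda wrapping hdfsRead(fs, 
file, dst, n).  Since hdfsRead takes and returns a 32-bit tSize, that lambda 
would also need to clamp n to the tSize range for large requests.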

> hdfsRead read stops at block boundary
> -------------------------------------
>
>                 Key: HDFS-10543
>                 URL: https://issues.apache.org/jira/browse/HDFS-10543
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Xiaowei Zhu
>            Assignee: James Clampffer
>             Fix For: HDFS-8707
>
>         Attachments: HDFS-10543.HDFS-8707.000.patch, 
> HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, 
> HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
>       char *buf2 = new char[file_info->mSize];
>       memset(buf2, 0, (size_t)file_info->mSize);
>       int ret = hdfsRead(fs, file, buf2, file_info->mSize);
>       delete [] buf2;
>       if(ret != file_info->mSize) {
>         std::stringstream ss;
>         ss << "tried to read " << file_info->mSize << " bytes. but read "
>            << ret << " bytes";
>         ReportError(ss.str());
>         hdfsCloseFile(fs, file);
>         continue;
>       }
> When run against a file ~1.4GB in size, it returns an error like "tried to 
> read 1468888890 bytes. but read 134217728 bytes". The HDFS cluster it runs 
> against has a block size of 134217728 bytes, so it seems hdfsRead stops at 
> a block boundary. This looks like a regression. We should add a retry so 
> that reads continue across blocks for files with multiple blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
