Hi Aastha,

A read-ahead buffer is a common technique that trades higher bandwidth use for lower latency across a number of common read patterns. Your OS does something similar (with a much more advanced technique, though). By reading ahead, HDFS is betting that your reads follow a pattern. I think the 10MB default is a touch excessive (it made more sense in previous releases). I use 32KB.
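
To make the idea concrete, here is a minimal sketch in C of how a read-ahead buffer of this kind can work. It is not the fuse_dfs code: `backend_t`, `backend_pread`, `readahead_t`, `ra_read`, and the toy `RA_CAP` of 32 bytes are all hypothetical names made up for illustration, and an in-memory string stands in for the file that fuse_dfs would read through hdfsPread(). Requests larger than the buffer skip it and go straight to the backend, which mirrors the rdbuffer_size check described in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a remote file. In fuse_dfs the bytes
 * would come from hdfsPread(); here they come from memory. */
typedef struct {
    const char *data;
    size_t size;
    int fetches;            /* counts backend round trips */
} backend_t;

/* Positional read: copy up to len bytes starting at off. */
size_t backend_pread(backend_t *b, size_t off, char *dst, size_t len) {
    b->fetches++;
    if (off >= b->size) return 0;
    size_t n = b->size - off;
    if (n > len) n = len;
    memcpy(dst, b->data + off, n);
    return n;
}

#define RA_CAP 32           /* toy buffer size, an assumption for the demo */

typedef struct {
    char buf[RA_CAP];
    size_t start;           /* file offset of buf[0] */
    size_t len;             /* number of valid bytes in buf */
} readahead_t;

size_t ra_read(backend_t *b, readahead_t *ra,
               size_t off, char *dst, size_t len) {
    assert(dst != NULL);

    /* Large read: the buffer would not help, so bypass it. */
    if (len > RA_CAP)
        return backend_pread(b, off, dst, len);

    /* Miss: the request is not fully inside the buffer, so refill it
     * starting at off, fetching more than was asked for (the "bet"). */
    if (off < ra->start || off + len > ra->start + ra->len) {
        ra->len = backend_pread(b, off, ra->buf, RA_CAP);
        ra->start = off;
    }

    /* Serve the request out of the buffer. */
    size_t skip = off - ra->start;
    if (skip >= ra->len) return 0;
    size_t n = ra->len - skip;
    if (n > len) n = len;
    memcpy(dst, ra->buf + skip, n);
    return n;
}
```

With sequential small reads, only the first call pays for a backend round trip; the following ones are copied out of the buffer until the request runs past its end. The cost is that each refill fetches `RA_CAP` bytes even if the caller never asks for the rest, which is why an oversized buffer wastes bandwidth on random reads.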
The buffer is not used if you have very large reads, as it doesn't provide any benefit.

Brian

On Sep 7, 2011, at 12:45 AM, Aastha Mehta wrote:

> Hello,
>
> I am using FUSE-DFS with HDFS for a project. I have to modify the read and
> write functions of fuse_dfs. I have few questions regarding the
> fuse_dfs_read code. There is an rdbuffer_size variable associated with the
> dfs_context, which is by default initialized to 10M. What is this
> rdbuffer_size and what is it used for?
>
> Secondly, in the fuse_dfs_read function, there are two places where
> hdfsPread() is called in a loop. First, there is a check for whether the
> requested read size is greater than the value of rdbuffer_size. Only if it
> is, is the hdfsPread executed. In this case, the data is read into the
> buffer passed from the caller.
>
> In the second case, hdfsPread is executed for if a valid buffer is
> associated with the dfs file handle fh and the size and offset of read
> request lie within the range of the fh->buf. In this case, the data is read
> into fh->buf.
>
> Could someone explain what is happening here?
>
> Thanks,
> Aastha.
>
> --
> Aastha Mehta
> B.E. (Hons.) Computer Science
> BITS Pilani
> E-mail: aasth...@gmail.com