We have also issued explicit readahead via fadvise since 2011 or so, so the typical I/O sizes hitting the device are large enough to max out the throughput, at least for typical spinning disks.
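
To illustrate the readahead hint described above, here is a minimal sketch in Python (chosen for brevity; it is not the DataNode's code). It reads a file sequentially in 64 KB chunks through the page cache and uses `posix_fadvise` to tell the kernel which range will be needed next. The function name `read_with_readahead` and the 4 MB window are assumptions made for this example, not values taken from the thread.

```python
import os

CHUNK_SIZE = 64 * 1024             # 64 KB per read, the size mentioned in the thread
READAHEAD_BYTES = 4 * 1024 * 1024  # hypothetical readahead window, chosen for illustration


def read_with_readahead(path, chunk_size=CHUNK_SIZE, readahead=READAHEAD_BYTES):
    """Yield the file in chunk_size pieces while hinting the kernel ahead of the reader."""
    has_fadvise = hasattr(os, "posix_fadvise")  # not available on every platform
    fd = os.open(path, os.O_RDONLY)
    try:
        if has_fadvise:
            # Declare the access pattern: the whole file will be read sequentially.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        hinted_up_to = 0
        while True:
            # Once the reader is halfway through the hinted range,
            # ask the kernel to start fetching the next window.
            if has_fadvise and offset + readahead // 2 >= hinted_up_to:
                os.posix_fadvise(fd, hinted_up_to, readahead, os.POSIX_FADV_WILLNEED)
                hinted_up_to += readahead
            data = os.pread(fd, chunk_size, offset)  # buffered read, no O_DIRECT
            if not data:
                break
            offset += len(data)
            yield data
    finally:
        os.close(fd)
```

The application still issues small synchronous reads, but because they go through the page cache and the kernel has been told what comes next, the requests that reach the block device can be much larger than 64 KB.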

Todd

On Wed, Mar 11, 2020, 9:46 AM Kihwal Lee <kih...@verizonmedia.com.invalid> wrote:

> When Datanode was initially designed, Linux AIO was still early in its
> adoption. Kernel support was there and the libraries were almost there. No
> Java support, of course. We would have to write a lot of native code for
> it and use JNI. Also, AIO means bypassing kernel page cache since you are
> doing it with O_DIRECT. We would have to implement some sort of block data
> caching on our own.
>
> Another option was to build an async framework in datanode. Instead, the
> community chose to use a pool of data transceiver threads to move forward
> fast. There are some discussions and efforts to improve this, as the
> workload has changed since the early days. However, the current way still
> utilizes I/O schedulers on block devices, so you will see a lot of I/O
> combining happening for typical loads. These are not direct I/O, so
> read-ahead does happen and page cache is utilized.
>
> Kihwal
>
> On Wed, Mar 11, 2020 at 11:18 AM Wei-Chiu Chuang <weic...@apache.org>
> wrote:
>
> > Hi David,
> > We talked a bit about a similar topic on DataNode sockets a while back.
> > Any feedback on the DataNode disk access?
> >
> > On Thu, Mar 5, 2020 at 4:16 PM Mania Abdi <abdi...@husky.neu.edu> wrote:
> >
> > > Hello everyone
> > >
> > > I have a question regarding HDFS, data node code version 2.7.2. I have
> > > posted my question as Jira issue
> > > <https://issues.apache.org/jira/projects/HDFS/issues/HDFS-15206>.
> > >
> > > I have observed that the datanode issues sequential synchronous 64KB
> > > reads to local disk, then sends the data to the user and waits for the
> > > acknowledgement from the user. I was wondering why the HDFS community
> > > did not use file mapping or asynchronous reads from disk? This could
> > > allow the disk scheduler to perform sequential reads from disk or
> > > perform read-ahead and prefetching. Is it something that could lead to
> > > performance improvement or not?
> > >
> > > I would appreciate it if you could help me find the answer to this
> > > issue from the Hadoop community perspective.
> > >
> > > I asked Apache members and they told me that the version I am pointing
> > > to is old and that this part of the code was written from scratch for
> > > modern SSDs. Could you please help me find in which version this
> > > modification happened, and where I can find it?
> > >
> > > Many thanks
> > > Mania