Uma Maheswara Rao G created HDFS-4190:
-----------------------------------------

             Summary: Read complete block into memory once in BlockScanning and reduce concurrent disk access
                 Key: HDFS-4190
                 URL: https://issues.apache.org/jira/browse/HDFS-4190
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 3.0.0
            Reporter: Uma Maheswara Rao G


While performing bulk write operations to DFS, we observed that block scanning is one bottleneck, since it competes with the writers for disk access.

To see the real load on the disks, run a single DataNode with a local client flushing data to DFS. When we switch off block scanning, we see a >10% improvement; I will post the actual figures in a comment.
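
For reference, this is roughly how scanning was switched off for the comparison run (as far as I can see, a negative dfs.datanode.scan.period.hours disables the DataBlockScanner, while 0/unset falls back to the 504-hour default):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

// Disable the DataBlockScanner for the comparison run: a negative scan
// period turns scanning off, while 0/unset uses the default period
// (504 hours = 21 days).
Configuration conf = new Configuration();
conf.setInt(DFSConfigKeys.DFS_DATANODE_SCAN_PERIOD_HOURS_KEY, -1);
{code}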

Even though the workload is write-only, block scanning implicitly triggers a read of each block: the next scan happens only after 21 days (the default scan period), but one scan happens right after the block is added. That initial scan is the source of the concurrent disk access.
Another point to note is that block scanning also reads the block packet by packet. Since we know the complete block has to be read and scanned anyway, it may be better to load the complete block once and verify the checksums against that data.

I tried this with memory-mapped buffers: map the complete block once in block scanning and do the checksum verification against the mapped data. This showed a good improvement in the bulk write scenario.
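
Roughly, the experiment does something like the sketch below. The file names and the CRC32C/512-byte-chunk parameters are illustrative only; real code would take the files from the scanner and parse the checksum parameters out of the meta file header instead of hardcoding them:

{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import org.apache.hadoop.util.DataChecksum;

// Simplified sketch of the experiment, not the actual patch.
static void scanBlock(File blockFile, File metaFile) throws IOException {
  try (RandomAccessFile blockIn = new RandomAccessFile(blockFile, "r");
       RandomAccessFile metaIn = new RandomAccessFile(metaFile, "r")) {
    FileChannel blockCh = blockIn.getChannel();
    FileChannel metaCh = metaIn.getChannel();

    // Map the complete block and its checksum file once, instead of
    // streaming the block packet by packet as the scanner does today.
    MappedByteBuffer data = blockCh.map(FileChannel.MapMode.READ_ONLY, 0, blockCh.size());
    // Skip the 7-byte meta header (2-byte version + 1-byte checksum type
    // + 4-byte bytesPerChecksum); real code should parse it instead.
    MappedByteBuffer sums = metaCh.map(FileChannel.MapMode.READ_ONLY, 7, metaCh.size() - 7);

    // Assumed checksum parameters for the sketch; in practice they come
    // from the meta header written when the block was finalized.
    DataChecksum checksum = DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, 512);
    // Throws ChecksumException (an IOException) on any corrupt chunk.
    checksum.verifyChunkedSums(data, sums, blockFile.getName(), 0);
  }
}
{code}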

The catch is that Java has no public API to unmap the buffer immediately; the mapping is released only when the MappedByteBuffer is garbage collected. In my experiment I simply used the Cleaner class from the sun.* packages, which would not be correct to use in production. So we would have to add a JNI call (e.g. munmap) to release the mapped buffer explicitly.
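
For completeness, the non-production workaround I used looks roughly like this (it relies on the internal sun.nio.ch.DirectBuffer and sun.misc.Cleaner classes, so it is not portable):

{code:java}
import java.nio.MappedByteBuffer;

// Force the unmap through JDK internals instead of waiting for GC.
// Not production-quality: sun.* classes are internal and unsupported.
static void unmap(MappedByteBuffer buffer) {
  sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buffer).cleaner();
  if (cleaner != null) {
    cleaner.clean(); // releases the underlying mapping immediately
  }
}
{code}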
I am not sure whether I have missed something here; please correct me if I have.

Thoughts?

