Colin,

I will continue my investigation into the matter. Thanks.

I will just point out that org.apache.hadoop.hdfs.server.datanode.BlockSender
overwrites this value with a 64KB value when necessary (line 116).
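For reference, this is roughly the pattern I am referring to. It is a
paraphrase from memory, not a verbatim copy of BlockSender, and the
names are illustrative rather than the actual identifiers in the source:

    // Paraphrased sketch of the clamp described above; constant,
    // method, and parameter names are mine, not the actual source.
    static final int MIN_BUFFER_WITH_TRANSFERTO = 64 * 1024;

    static int effectiveSendBufferSize(int configuredBufferSize,
                                       boolean usingTransferTo) {
      // With transferTo(), the send buffer is raised to at least 64KB,
      // so a smaller io.file.buffer.size is effectively ignored here.
      return usingTransferTo
          ? Math.max(configuredBufferSize, MIN_BUFFER_WITH_TRANSFERTO)
          : configuredBufferSize;
    }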
---------------------

On a side note, can you explain the purpose of
org.apache.hadoop.hdfs.DFSUtilClient.getSmallBufferSize(Configuration)?

This method seems to be an undocumented "feature" that overrides the
user's configuration without explaining why. In most cases it appears
to be used when creating a buffer for sending small messages between
datanodes. If that is the case, I would think the message size should
be the primary consideration in choosing the buffer size, not the value
of the user's configuration variable. For maintainability and
predictability, a hard-coded 512 would seem most appropriate, or simply
the default buffer size of BufferedOutputStream/BufferedInputStream.
(See the P.S. below for the method body as I understand it.)

The one notable exception I see is in
org.apache.hadoop.hdfs.server.datanode.DataNode.DataTransfer.run(),
line 2261. There, the OutputStream used for sending blocks uses this
smaller buffer size to send entire data blocks, but no comment explains
why this smaller buffer is used instead of the size configured by the
user.

Thanks!

On Fri, Dec 18, 2015 at 9:59 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
> Reading files from HDFS has different performance characteristics than
> reading local files. For one thing, HDFS does a few megabytes of
> readahead internally by default. If you are going to make a
> performance improvement suggestion, I would strongly encourage you to
> test it first.
>
> cheers,
> Colin
>
>
> On Tue, Dec 15, 2015 at 2:22 PM, dam6923 . <dam6...@gmail.com> wrote:
>> Here was the justification from 2004:
>>
>> https://bugs.openjdk.java.net/browse/JDK-4953311
>>
>>
>> Also, some research into the matter (not my own):
>>
>> http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
>>
>> One of the conclusions:
>>
>> "Minimize I/O operations by reading an array at a time, not a byte at
>> a time. An 8Kbyte array is a good size."
>>
>>
>> On Tue, Dec 15, 2015 at 3:41 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
>>> Hi David,
>>>
>>> Do you have benchmarks to justify changing this configuration?
>>>
>>> best,
>>> Colin
>>>
>>> On Wed, Dec 9, 2015 at 8:05 AM, dam6923 . <dam6...@gmail.com> wrote:
>>>> Hello!
>>>>
>>>> A while back, in Java 1.6, the size of the internal file-reading
>>>> buffer in BufferedInputStream was bumped up to 8192 bytes:
>>>>
>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedInputStream.java
>>>>
>>>> Perhaps it's time to update Hadoop to at least this default level too. :)
>>>>
>>>> https://issues.apache.org/jira/browse/HADOOP-2705
>>>>
>>>> Thanks,
>>>> David
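P.S. For reference, here is what I believe getSmallBufferSize boils down
to, paraphrased from my reading of DFSUtilClient (treat this as my
understanding, not an authoritative copy):

    // My reading of DFSUtilClient.getSmallBufferSize (paraphrased):
    // the "small" buffer is half the user's io.file.buffer.size,
    // capped at 512 bytes, so any configured value above 1KB has no
    // effect on these small inter-datanode messages.
    public static int getSmallBufferSize(Configuration conf) {
      return Math.min(getIoFileBufferSize(conf) / 2, 512);
    }

If that reading is right, a documented, hard-coded 512 would behave
identically for any configuration above 1KB, while being much easier to
reason about.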