Reading files from HDFS has different performance characteristics than reading local files. For one thing, HDFS does a few megabytes of readahead internally by default. If you are going to make a performance improvement suggestion, I would strongly encourage you to test it first.
cheers,
Colin

On Tue, Dec 15, 2015 at 2:22 PM, dam6923 . <dam6...@gmail.com> wrote:
> Here was the justification from 2004:
>
> https://bugs.openjdk.java.net/browse/JDK-4953311
>
> Also, some research into the matter (not my own):
>
> http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
>
> One of the conclusions:
>
> "Minimize I/O operations by reading an array at a time, not a byte at
> a time. An 8Kbyte array is a good size."
>
> On Tue, Dec 15, 2015 at 3:41 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
>> Hi David,
>>
>> Do you have benchmarks to justify changing this configuration?
>>
>> best,
>> Colin
>>
>> On Wed, Dec 9, 2015 at 8:05 AM, dam6923 . <dam6...@gmail.com> wrote:
>>> Hello!
>>>
>>> A while back, in Java 1.6, the size of the internal file-reading
>>> buffer was bumped up to 8192 bytes.
>>>
>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedInputStream.java
>>>
>>> Perhaps it's time to update Hadoop to at least this default level too. :)
>>>
>>> https://issues.apache.org/jira/browse/HADOOP-2705
>>>
>>> Thanks,
>>> David
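
A minimal sketch of the kind of micro-benchmark Colin is asking for: it times sequential reads of a local file at several BufferedInputStream buffer sizes. The file path and size list are placeholders, and a real test would also need to cover HDFS reads and repeated, cache-controlled runs (e.g. under JMH) before drawing any conclusions:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class BufferSizeBench {
        public static void main(String[] args) throws IOException {
            // Placeholder input file; pass a real path as the first argument.
            String path = args.length > 0 ? args[0] : "testfile.bin";
            int[] bufferSizes = {512, 4096, 8192, 65536};

            for (int size : bufferSizes) {
                long start = System.nanoTime();
                long total = 0;
                try (InputStream in =
                         new BufferedInputStream(new FileInputStream(path), size)) {
                    byte[] chunk = new byte[8192];  // application-level read size
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        total += n;
                    }
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("buffer=%6d bytes: read %d bytes in %d ms%n",
                                  size, total, elapsedMs);
            }
        }
    }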
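For context, the Hadoop-side knob the thread is discussing is io.file.buffer.size. A minimal sketch, with an assumed HDFS path, of raising it per-application through Configuration rather than changing the shipped default:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BufferedHdfsRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // 8192 mirrors the java.io.BufferedInputStream default cited above.
            conf.setInt("io.file.buffer.size", 8192);

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/tmp/sample.bin");  // placeholder path
            byte[] chunk = new byte[8192];
            long total = 0;
            try (FSDataInputStream in = fs.open(file)) {
                int n;
                while ((n = in.read(chunk)) != -1) {
                    total += n;
                }
            }
            System.out.println("read " + total + " bytes");
        }
    }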