Reading files from HDFS has different performance characteristics than
reading local files.  For one thing, HDFS does a few megabytes of
readahead internally by default.  If you are going to make a
performance improvement suggestion, I would strongly encourage you to
test it first.
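
For example, a rough micro-benchmark along these lines would be a
reasonable starting point.  This is only a sketch -- the file path is
a placeholder, and you would want to repeat the runs (and discard the
first) so caching doesn't skew the numbers.  It relies on the fact
that FileSystem.open(Path, int) lets you pass a buffer size directly:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadBufferBench {
  public static void main(String[] args) throws IOException {
    // args[0] is an HDFS path to a reasonably large existing file,
    // e.g. hdfs://namenode:8020/bench/data.bin (placeholder).
    Path path = new Path(args[0]);
    FileSystem fs = FileSystem.get(path.toUri(), new Configuration());

    for (int bufSize : new int[] { 4096, 8192, 65536 }) {
      byte[] buf = new byte[bufSize];
      long total = 0;
      long start = System.nanoTime();
      // Open with an explicit buffer size and drain the whole file.
      try (FSDataInputStream in = fs.open(path, bufSize)) {
        int n;
        while ((n = in.read(buf)) > 0) {
          total += n;
        }
      }
      double secs = (System.nanoTime() - start) / 1e9;
      System.out.printf("bufSize=%-6d  %d bytes in %.2fs (%.1f MB/s)%n",
          bufSize, total, secs, total / secs / (1024.0 * 1024.0));
    }
  }
}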

cheers,
Colin


On Tue, Dec 15, 2015 at 2:22 PM, dam6923 . <dam6...@gmail.com> wrote:
> Here was the justification from 2004:
>
> https://bugs.openjdk.java.net/browse/JDK-4953311
>
>
> Also, some research into the matter (not my own):
>
> http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
>
> One of the conclusions:
>
> "Minimize I/O operations by reading an array at a time, not a byte at
> a time. An 8Kbyte array is a good size."
>
>
> On Tue, Dec 15, 2015 at 3:41 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:
>> Hi David,
>>
>> Do you have benchmarks to justify changing this configuration?
>>
>> best,
>> Colin
>>
>> On Wed, Dec 9, 2015 at 8:05 AM, dam6923 . <dam6...@gmail.com> wrote:
>>> Hello!
>>>
>>> A while back, in Java 1.6, the size of the internal file-reading
>>> buffer was bumped up to 8192 bytes.
>>>
>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedInputStream.java
>>>
>>> Perhaps it's time to update Hadoop to at least this default level too. :)
>>>
>>> https://issues.apache.org/jira/browse/HADOOP-2705
>>>
>>> Thanks,
>>> David
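
On the array-at-a-time point quoted above: the effect is easy to
reproduce on a local file.  A minimal sketch (absolute numbers depend
on the OS page cache, the disk, and the JVM, so treat it as
illustrative only -- and, as above, local results won't necessarily
carry over to HDFS given its internal readahead):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ArrayVsByte {
  public static void main(String[] args) throws IOException {
    // Byte at a time: unbuffered, this is one read() call -- and one
    // system call -- per byte.
    long start = System.nanoTime();
    try (InputStream in = new FileInputStream(args[0])) {
      while (in.read() != -1) {
        // discard
      }
    }
    System.out.printf("byte-at-a-time:  %.2fs%n",
        (System.nanoTime() - start) / 1e9);

    // Array at a time: one read() call per 8 KB, matching the
    // BufferedInputStream default since Java 1.6.
    byte[] buf = new byte[8192];
    start = System.nanoTime();
    try (InputStream in = new FileInputStream(args[0])) {
      while (in.read(buf) > 0) {
        // discard
      }
    }
    System.out.printf("array-at-a-time: %.2fs%n",
        (System.nanoTime() - start) / 1e9);
  }
}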
