Chris Nauroth created HADOOP-19389:
--------------------------------------

             Summary: Optimize shell -text command I/O with multi-byte read.
                 Key: HADOOP-19389
                 URL: https://issues.apache.org/jira/browse/HADOOP-19389
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
            Reporter: Chris Nauroth
            Assignee: Chris Nauroth


{{hadoop fs -text}} reads Avro files and sequence files by internally wrapping the stream in [{{AvroFileInputStream}}|https://github.com/apache/hadoop/blob/rel/release-3.4.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java#L270] or [{{TextRecordInputStream}}|https://github.com/apache/hadoop/blob/rel/release-3.4.1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java#L217]. These classes implement the required single-byte [{{read()}}|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html#read()], but they do not override the optional multi-byte [{{read(byte[], int, int)}}|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html#read(byte%5B%5D,int,int)]. The default implementation in the JDK is a [loop over the single-byte read|https://github.com/openjdk/jdk11u-dev/blob/a47c72fad455bfdf9053cb8e94c99e73965ab50d/src/java.base/share/classes/java/io/InputStream.java#L280], so every byte delivered to the caller costs a separate method call, which causes sub-optimal I/O and method call overhead. We can optimize this by overriding the multi-byte read method in both classes.
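
To illustrate the idea, here is a minimal sketch of such an override. It is not the actual patch: {{RecordInputStream}}, its constructor, and the {{fill()}} helper are hypothetical stand-ins for a stream that renders one record at a time into an internal buffer, and the real classes in Display.java may be structured differently. The sketch assumes Java 9 or later for {{Objects.checkFromIndexSize}}.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical record-oriented stream (not the real Display.java code).
 * Each record is rendered as text plus a newline into an internal buffer.
 */
class RecordInputStream extends InputStream {
  private final Iterator<String> records; // stand-in for a real record reader
  private byte[] buf = new byte[0];       // bytes of the current record
  private int pos = 0;                    // next unread index in buf

  RecordInputStream(List<String> records) {
    this.records = records.iterator();
  }

  /** Loads the next record if the buffer is drained; false at end of input. */
  private boolean fill() {
    while (pos >= buf.length) {
      if (!records.hasNext()) {
        return false;
      }
      buf = (records.next() + "\n").getBytes(StandardCharsets.UTF_8);
      pos = 0;
    }
    return true;
  }

  /** Required single-byte read. */
  @Override
  public int read() throws IOException {
    if (!fill()) {
      return -1;
    }
    return buf[pos++] & 0xff;
  }

  /** Multi-byte read: one bulk copy instead of a loop over read(). */
  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    Objects.checkFromIndexSize(off, len, b.length);
    if (len == 0) {
      return 0;
    }
    if (!fill()) {
      return -1;
    }
    // Copy whatever remains of the current record, up to len bytes.
    int n = Math.min(len, buf.length - pos);
    System.arraycopy(buf, pos, b, off, n);
    pos += n;
    return n;
  }
}
```

The sketch returns at most the remainder of the current record per call. The {{InputStream}} contract permits such short reads, and callers that need more bytes simply call again.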



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

