Bhalchandra Pandit created HADOOP-18028:
-------------------------------------------

             Summary: improve S3 read speed using prefetching & caching
                 Key: HADOOP-18028
                 URL: https://issues.apache.org/jira/browse/HADOOP-18028
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
            Reporter: Bhalchandra Pandit


I work for Pinterest. I developed a technique that substantially improves read
throughput when reading from the S3 file system. It helps not only the
sequential read case (such as reading a SequenceFile) but also the random
access case (such as reading Parquet). This technique has significantly
improved the efficiency of data processing jobs at Pinterest.
 
I would like to contribute this feature to Apache Hadoop. More details on the
technique are available in a blog post I wrote recently:
[https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
 




