Bhalchandra Pandit created HADOOP-18028:
-------------------------------------------

             Summary: improve S3 read speed using prefetching & caching
                 Key: HADOOP-18028
                 URL: https://issues.apache.org/jira/browse/HADOOP-18028
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
            Reporter: Bhalchandra Pandit


I work for Pinterest. I developed a technique that vastly improves read throughput when reading from the S3 file system. It helps not only the sequential read case (such as reading a SequenceFile) but also significantly improves read throughput in the random access case (such as reading Parquet). This technique has significantly improved the efficiency of data processing jobs at Pinterest.

I would like to contribute this feature to Apache Hadoop.

More details on the technique are available in a blog post I wrote recently:
[https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]