Abdullah Yousufi created HIVE-14165:
---------------------------------------
Summary: Enable faster S3 Split Computation by listing files in
blocks
Key: HIVE-14165
URL: https://issues.apache.org/jira/browse/HIVE-14165
Project: Hive
Issue Type: Improvement
Affects Versions: 2.1.0
Reporter: Abdullah Yousufi
Assignee: Abdullah Yousufi
During split computation, when a large number of files must be listed from
S3, instead of executing 1 API call per file, one can list up to 1000 files
in each API call, thereby reducing the time required for listing files.
Qubole has this optimization in place as detailed here:
https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0
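
To illustrate the difference in request counts, here is a minimal sketch that
uses a hypothetical in-memory stand-in for a bucket. FakeBucket, head,
list_page, list_per_file and list_in_blocks are names made up for this
example; they are not Hive, Hadoop or AWS SDK APIs. The only property borrowed
from S3 is that a single list request returns at most 1000 keys.

```python
class FakeBucket:
    """Hypothetical in-memory bucket that counts simulated API calls."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.key_set = set(keys)
        self.api_calls = 0

    def head(self, key):
        # One request per file, like a per-file status lookup.
        self.api_calls += 1
        return key in self.key_set

    def list_page(self, prefix, start_after=None, max_keys=1000):
        # One request returning up to max_keys keys under the prefix,
        # in sorted order, plus a flag saying whether more keys remain.
        self.api_calls += 1
        matching = [
            k for k in self.keys
            if k.startswith(prefix) and (start_after is None or k > start_after)
        ]
        return matching[:max_keys], len(matching) > max_keys


def list_per_file(bucket, candidate_keys):
    # Baseline: one call for every file.
    return [k for k in candidate_keys if bucket.head(k)]


def list_in_blocks(bucket, prefix, max_keys=1000):
    # Optimized: page through the prefix, up to max_keys files per call.
    found, start_after = [], None
    while True:
        page, truncated = bucket.list_page(prefix, start_after, max_keys)
        found.extend(page)
        if not truncated:
            return found
        start_after = page[-1]
```

Under these assumptions, listing 2500 files under one prefix costs 2500
simulated calls with the per-file approach and 3 calls when listing in blocks
of 1000.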
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)