[ https://issues.apache.org/jira/browse/HIVE-23158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078596#comment-17078596 ]
Panagiotis Garefalakis commented on HIVE-23158:
-----------------------------------------------

[~rbalamohan] [~prasanth_j] Thoughts?

> Optimize S3A recordReader policy for Random IO formats
> ------------------------------------------------------
>
>                 Key: HIVE-23158
>                 URL: https://issues.apache.org/jira/browse/HIVE-23158
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Trivial
>              Labels: pull-request-available
>         Attachments: HIVE-23158.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The S3A filesystem client (inherited from Hadoop) supports the notion of
> input policies.
> These policies tune the behaviour of the HTTP requests used to read
> different file types such as TEXT or ORC.
> Formats such as ORC and Parquet perform many seek operations, so there is
> an optimized RANDOM mode that reads files only partially instead of fully
> (the default).
> I suggest adding some extra logic to HiveInputFormat to make sure we
> optimize for random IO when data is stored on S3A in formats such as ORC
> or Parquet.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
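
For illustration only, the proposal above could be sketched roughly as follows. This is not the code in HIVE-23158.01.patch: the class S3aInputPolicySketch and its helper methods are hypothetical, and a plain Map stands in for the Hadoop Configuration so the snippet runs without Hadoop on the classpath. The property name fs.s3a.experimental.input.fadvise and its "random" value come from the Hadoop S3A client; which input format classes count as random IO is an assumption made here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class S3aInputPolicySketch {
    // Hadoop S3A property that selects the input policy
    // ("normal", "sequential" or "random").
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";

    // Assumption: input formats treated as seek-heavy (random IO).
    static final Set<String> RANDOM_IO_FORMATS = new HashSet<>(Arrays.asList(
        "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"));

    // Hypothetical helper: true when the path lives on S3A and the
    // format is one of the random IO formats listed above.
    static boolean shouldUseRandomPolicy(String path, String inputFormatClass) {
        return path.startsWith("s3a://") && RANDOM_IO_FORMATS.contains(inputFormatClass);
    }

    // Hypothetical helper: switches the policy to "random" only when needed
    // and leaves any existing setting untouched otherwise.
    static void applyPolicy(Map<String, String> conf, String path, String inputFormatClass) {
        if (shouldUseRandomPolicy(path, inputFormatClass)) {
            conf.put(FADVISE_KEY, "random");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        applyPolicy(conf, "s3a://bucket/warehouse/t/000000_0",
            "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat");
        System.out.println(conf);
    }
}
```

In a real change the decision would be made where HiveInputFormat creates the record reader, against the job's Configuration rather than a Map; text-based formats and non-S3A paths would keep the default policy.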