[ https://issues.apache.org/jira/browse/HIVE-28650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902076#comment-17902076 ]
Butao Zhang edited comment on HIVE-28650 at 12/1/24 3:33 AM:
-------------------------------------------------------------

[~glapark] Thanks for your testing!
{code:java}
ORC 2.0.3 actually executes more S3 operations than ORC 1.9.4, which is a bit surprising.{code}
I have not explored the Hadoop Vectored IO feature in depth, but the test result, in which ORC 2.0.3 issues more *GetObject* requests than ORC 1.9.4, surprised me too.

I found this Hadoop Vectored IO presentation, [https://docs.google.com/presentation/d/1U5QRN4etbM7gkbnGO3OW4sCfUZx9LqJN/], which was written by [~ste...@apache.org]. It would be great if [~ste...@apache.org] could give some guidance.

> Upgrade Apache ORC version to 2.0.3
> -----------------------------------
>
>                 Key: HIVE-28650
>                 URL: https://issues.apache.org/jira/browse/HIVE-28650
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public(Viewable by anyone)
>            Reporter: Butao Zhang
>            Priority: Major
>
> ORC 2.0.x added the Hadoop Vectored IO feature in ORC-1251.
> We can try upgrading ORC to the latest 2.0.x version to make this feature
> work in Hive.
> However, ORC 2.0.x is built on JDK 17+, so we need to upgrade Hive to
> JDK 17+ first. This depends on HIVE-26473, which tracks the JDK 17 upgrade.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)