[ https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740052#comment-13740052 ]
Owen O'Malley commented on HIVE-5091:
-------------------------------------
This patch:
* Adds a new table property orc.block.padding, which defaults to true.
* If a stripe smaller than a block would straddle a block boundary, writes
zeros to pad the file out to the start of the next block.
* Caps the max block size at 1.5GB, since 2GB - 1 caused problems: block
sizes need to be divisible by the checksum length (512).
* Cleans up the interface to OrcFile.createWriter so that the user can set
parameters by name.
* Cleans up the ability to write the 0.11 version of ORC files that was added
in HIVE-4123. Ensures that the direct string encoding isn't used for 0.11 ORC
files.
* Updates most of the tests to use the new createWriter API.
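
The padding rule and the block size constraint above can be sketched roughly
as follows. This is an illustration only, not the code from the patch: the
class StripePaddingSketch, the method paddingBefore, and the constant names
are hypothetical, and the sketch assumes the stripe length is known before
the stripe is written.

```java
public class StripePaddingSketch {
    // Checksum length mentioned in the comment; block sizes must be a multiple of it.
    static final long CHECKSUM_LENGTH = 512;
    // 1.5GB, the maximum block size chosen in the comment.
    static final long MAX_BLOCK_SIZE = 1536L * 1024 * 1024;

    /**
     * Hypothetical helper: number of zero bytes to write before a stripe so
     * that it does not straddle a block boundary. Returns 0 when the stripe
     * fits in the current block or is too large to fit in any single block.
     */
    static long paddingBefore(long position, long stripeLength, long blockSize) {
        // Bytes left in the block that contains the current write position.
        long remaining = blockSize - (position % blockSize);
        boolean smallerThanBlock = stripeLength < blockSize;
        boolean wouldStraddle = stripeLength > remaining;
        return (smallerThanBlock && wouldStraddle) ? remaining : 0;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // A 100MB stripe starting 200MB into a 256MB block would cross the boundary.
        System.out.println("padding bytes: " + paddingBefore(200 * mb, 100 * mb, 256 * mb));
        // 2GB - 1 is not a multiple of the checksum length; 1.5GB is.
        System.out.println("(2GB - 1) % 512 = " + (Integer.MAX_VALUE % CHECKSUM_LENGTH));
        System.out.println("1.5GB % 512 = " + (MAX_BLOCK_SIZE % CHECKSUM_LENGTH));
    }
}
```

Stripes at least as large as a block are left unpadded in this sketch, since
no amount of padding keeps them inside one block.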
> ORC files should have an option to pad stripes to the HDFS block boundaries
> ---------------------------------------------------------------------------
>
> Key: HIVE-5091
> URL: https://issues.apache.org/jira/browse/HIVE-5091
> Project: Hive
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: HIVE-5091.D12249.1.patch
>
>
> With ORC stripes being large, if a stripe straddles an HDFS block, the
> locality of read is suboptimal. It would be good to add padding to ensure
> that stripes don't straddle HDFS blocks.