[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wilfong updated HIVE-4340:
--------------------------------
    Status: Patch Available  (was: Open)

> ORC should provide raw data size
> --------------------------------
>
>                 Key: HIVE-4340
>                 URL: https://issues.apache.org/jira/browse/HIVE-4340
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.11.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-4340.1.patch.txt
>
>
> ORC's SerDe currently does nothing, and hence does not calculate a raw data
> size. WriterImpl, however, has enough information to provide one.
> WriterImpl should compute a raw data size for each row, aggregate these
> sizes per stripe, and record the total in the stripe information, as RCFile
> currently does in its key header. It should also give the FileSinkOperator
> access to the per-row size.
> FileSinkOperator should be able to get the raw data size from the
> RecordWriter when the RecordWriter can provide it, and from the SerDe
> otherwise.
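
A minimal sketch of the bookkeeping described above, not the code from the attached patch. The class names StripeInfo and StripeSizeTracker, their methods, and the per-type size estimates are all hypothetical and are not taken from Hive's WriterImpl; the only idea carried over from the description is "compute a size per row, sum the sizes per stripe, and expose the per-row value to the caller".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-stripe summary; not Hive's actual stripe metadata class. */
final class StripeInfo {
    final long rowCount;
    final long rawDataSize;

    StripeInfo(long rowCount, long rawDataSize) {
        this.rowCount = rowCount;
        this.rawDataSize = rawDataSize;
    }
}

/** Hypothetical tracker: sizes each row and aggregates the sizes per stripe. */
final class StripeSizeTracker {
    private final List<StripeInfo> stripes = new ArrayList<>();
    private long currentRows = 0;
    private long currentRawSize = 0;
    private long lastRowSize = 0;

    /** Assumed size model: fixed widths for numbers, character count for strings. */
    static long estimateRowSize(Object[] row) {
        long size = 0;
        for (Object field : row) {
            if (field instanceof Integer) {
                size += 4;
            } else if (field instanceof Long || field instanceof Double) {
                size += 8;
            } else if (field instanceof String) {
                size += ((String) field).length();
            }
            // nulls and unrecognized types contribute 0 in this sketch
        }
        return size;
    }

    /** Records one row and remembers its size so a caller can read it back. */
    void addRow(Object[] row) {
        lastRowSize = estimateRowSize(row);
        currentRawSize += lastRowSize;
        currentRows++;
    }

    /** Per-row value that an operator such as FileSinkOperator could consume. */
    long getLastRowRawDataSize() {
        return lastRowSize;
    }

    /** Closes the current stripe and stores its aggregated raw data size. */
    void flushStripe() {
        if (currentRows == 0) {
            return; // nothing buffered, so no stripe is recorded
        }
        stripes.add(new StripeInfo(currentRows, currentRawSize));
        currentRows = 0;
        currentRawSize = 0;
    }

    List<StripeInfo> getStripes() {
        return stripes;
    }
}
```

In this sketch a caller would invoke addRow for every row, read getLastRowRawDataSize right afterwards to update its own statistics, and call flushStripe whenever the writer decides a stripe is full. How the real writer sizes each column type is not stated in the issue, so the estimates here are placeholders only.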