[ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681573#comment-13681573 ]
Samuel Yuan commented on HIVE-4044:
-----------------------------------
I tried breaking the URL into parts and encoding each part as its own column.
The dictionary shrank, but the overhead of the additional ORC columns (mostly
the column of indices) outweighed the savings, so overall compression was
actually worse. I also tried storing the query string as a map and moving
common keys into separate columns. This improved compression somewhat, but
still not enough to offset the overhead of the new columns for the query string.
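For readers who want a concrete picture of the two layouts tried above, here is a minimal sketch of the row-level decomposition only. It is not the HIVE-4044 patch and does not touch ORC or measure compression. The class name `UrlColumns`, the `Row` fields, and the set of "common" query keys (`id`, `ref`) are all hypothetical; it assumes Java 16+ (records) and uses only `java.net.URI` and standard collections.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class UrlColumns {
    // Hypothetical choice of "common" query keys promoted to their own columns.
    static final Set<String> COMMON_KEYS = Set.of("id", "ref");

    // One row of a hypothetical decomposed layout: each URL component would be
    // a separate (e.g. dictionary-encoded) column; `rest` stands in for the map
    // column holding the remaining query-string keys.
    record Row(String scheme, String host, String path,
               Map<String, String> promoted, Map<String, String> rest) {}

    static Row decompose(String url) {
        URI uri = URI.create(url);
        Map<String, String> promoted = new LinkedHashMap<>();
        Map<String, String> rest = new LinkedHashMap<>();
        String query = uri.getRawQuery(); // raw form, so values stay percent-encoded
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq >= 0 ? pair.substring(0, eq) : pair;
                String value = eq >= 0 ? pair.substring(eq + 1) : "";
                // Common keys get their own column; everything else stays in the map.
                (COMMON_KEYS.contains(key) ? promoted : rest).put(key, value);
            }
        }
        return new Row(uri.getScheme(), uri.getHost(), uri.getRawPath(), promoted, rest);
    }

    public static void main(String[] args) {
        System.out.println(decompose("http://example.com/a/b?id=42&utm=x&ref=home"));
    }
}
```

Each extra field in `Row` corresponds to an extra column, and in ORC each dictionary-encoded string column carries its own stream of indices. That per-column cost is the overhead the comment describes as outweighing the smaller dictionaries.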
> Add URL type
> ------------
>
> Key: HIVE-4044
> URL: https://issues.apache.org/jira/browse/HIVE-4044
> Project: Hive
> Issue Type: Improvement
> Reporter: Samuel Yuan
> Assignee: Samuel Yuan
> Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
>
>
> Having a separate type for URLs would enable improvements in storage
> efficiency based on breaking up a URL into its components. The new type will
> be named "URL" and made a non-reserved keyword (see HIVE-701).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira