[ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681573#comment-13681573 ]
Samuel Yuan commented on HIVE-4044:
-----------------------------------

I tried breaking the URL into parts and encoding them as individual columns. The dictionary shrank, but the overhead of the additional ORC columns (mostly the column of indices) outweighed the savings, so compression was actually worse overall.

I also tried storing the query string as a map and putting common keys into separate columns. This improved compression somewhat, but still not enough to offset the overhead of the new columns for the query string.

> Add URL type
> ------------
>
>                 Key: HIVE-4044
>                 URL: https://issues.apache.org/jira/browse/HIVE-4044
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>         Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
>
>
> Having a separate type for URLs would enable improvements in storage
> efficiency based on breaking up a URL into its components. The new type will
> be named "URL" and made a non-reserved keyword (see HIVE-701).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
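
For readers unfamiliar with the decomposition discussed in the comment, here is a minimal sketch of splitting a URL into per-column values, with the query string kept as a map and a few common keys pulled out into their own columns. It is not the code from the attached patch: the function name `split_url`, the column names, and the `COMMON_KEYS` tuple are all hypothetical, and it uses Python's standard `urllib.parse` rather than anything in Hive or ORC.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical choice of "common" query keys; the comment does not say which
# keys were promoted to separate columns.
COMMON_KEYS = ("id", "page")


def split_url(url, common_keys=COMMON_KEYS):
    """Split a URL into column-like fields (illustrative only)."""
    parts = urlsplit(url)

    # Query string as a map. Note: dict() keeps only the last value when a
    # key repeats, which a real encoding would have to handle explicitly.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))

    row = {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # host, plus port/userinfo if present
        "path": parts.path,
        "fragment": parts.fragment,
    }

    # Promote each common key to its own column; None when the key is absent.
    for key in common_keys:
        row["q_" + key] = query.pop(key, None)

    # Whatever is left stays in the map column.
    row["query_rest"] = query
    return row


row = split_url("http://example.com/a/b?id=7&page=2&ref=home#top")
```

Each field in `row` would correspond to one extra column in storage, which is where the per-column overhead mentioned in the comment comes from.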