[ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585177#comment-13585177 ]
Ashutosh Chauhan commented on HIVE-4044:
----------------------------------------

URL is an unusual type to add to a query processing engine. Can you spec out what the motivation is for adding this type? (For example, you can always use the string type for URLs.) I am assuming from your description above that it might improve storage efficiency through a better encoding of URLs. But I see the following comment in LazyBinaryURL:

/**
 * The serialization of LazyBinaryURL is the same as the binary representation
 * of the underlying string
 */

and URLWritable also has:

{code}
@Override
public void write(DataOutput out) throws IOException {
  if (url != null) {
    byte[] bytes = url.toString().getBytes();
    WritableUtils.writeVInt(out, bytes.length);
    out.write(bytes);
  } else {
    WritableUtils.writeVInt(out, 0);
  }
}
{code}

So it seems you are storing URLs as strings anyway, both for the intermediate data of MR and for the output of the query. I don't see how this results in better storage efficiency.

> Add URL type
> ------------
>
>                 Key: HIVE-4044
>                 URL: https://issues.apache.org/jira/browse/HIVE-4044
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>         Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
>
>
> Having a separate type for URLs would enable improvements in storage
> efficiency based on breaking up a URL into its components. The new type will
> be named "URL" and made a non-reserved keyword (see HIVE-701).
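
To illustrate the point raised in the review comment above, here is a minimal standalone sketch, not code from the patch. It assumes a simplified stand-in for the quoted URLWritable.write(): a fixed 4-byte length prefix replaces Hadoop's WritableUtils.writeVInt so that the example has no Hadoop dependency, and UTF-8 is chosen explicitly. The class and method names (UrlEncodingSizeSketch, writeAsUrl, writeAsString, toUrl) are hypothetical. Under those assumptions, serializing a URL through toString() yields exactly the same bytes as serializing the equivalent string.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UrlEncodingSizeSketch {

    // Hypothetical helper: parse a string into a java.net.URL without checked exceptions.
    static URL toUrl(String text) {
        try {
            return URI.create(text).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Stand-in for the quoted URLWritable.write(): length prefix, then the bytes
    // of url.toString(). Assumption: a 4-byte int prefix instead of a Hadoop VInt.
    static byte[] writeAsUrl(URL url) {
        return writeAsString(url == null ? null : url.toString());
    }

    // The same layout applied to a plain string value.
    static byte[] writeAsString(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (s != null) {
                byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            } else {
                out.writeInt(0);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "https://issues.apache.org/jira/browse/HIVE-4044";
        byte[] asUrl = writeAsUrl(toUrl(text));
        byte[] asString = writeAsString(text);
        System.out.println("URL encoding:    " + asUrl.length + " bytes");
        System.out.println("String encoding: " + asString.length + " bytes");
        System.out.println("Identical bytes: " + Arrays.equals(asUrl, asString));
    }
}
```

Any storage saving would therefore have to come from a different on-disk layout, for example one that stores URL components separately as the issue description suggests, rather than from this serialization path.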