[ 
https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585177#comment-13585177
 ] 

Ashutosh Chauhan commented on HIVE-4044:
----------------------------------------

URL is an unusual type to add to a query processing engine. Can you spec out 
what the motivation is for adding this type (e.g. you can always use the string 
type for URLs)? I am assuming from your description above that it might improve 
storage efficiency through a better encoding of URLs. But I see the following 
comment in LazyBinaryURL:
/**
 * The serialization of LazyBinaryURL is the same as the binary representation
 * of the underlying string
 */
and URLWritable also has:
{code}
  @Override
  public void write(DataOutput out) throws IOException {
    if (url != null) {
      byte[] bytes = url.toString().getBytes();
      WritableUtils.writeVInt(out, bytes.length);
      out.write(bytes);
    } else {
      WritableUtils.writeVInt(out, 0);
    }
  }
{code}

So it seems you are storing URLs as strings anyway, both for the intermediate 
data of MR jobs and for the output of the query. I don't see how this results 
in better storage efficiency. 
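
To make the comparison concrete, here is a minimal sketch (not code from the 
patch) of why a write() of that shape cannot save space over a plain string. 
The class and method names are hypothetical, java.net.URI stands in for the 
URL object, and DataOutputStream.writeInt stands in for Hadoop's 
WritableUtils.writeVInt so that the sketch has no dependencies. UTF-8 is 
chosen explicitly here, whereas the quoted code calls getBytes() with the 
platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not code from the patch.
class UrlSizeSketch {

  // Length-prefixed write of a text value. writeInt is a stand-in for
  // WritableUtils.writeVInt (assumption made to avoid a Hadoop dependency).
  static byte[] writeLengthPrefixed(String text) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
      out.writeInt(bytes.length);
      out.write(bytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Same shape as the quoted write(): serialize url.toString(), or a zero
  // length when the value is null.
  static byte[] writeUrl(URI url) {
    return writeLengthPrefixed(url == null ? "" : url.toString());
  }

  public static void main(String[] args) {
    String text = "https://issues.apache.org/jira/browse/HIVE-4044";
    byte[] asString = writeLengthPrefixed(text);
    byte[] asUrl = writeUrl(URI.create(text));
    System.out.println("string bytes: " + asString.length);
    System.out.println("url bytes:    " + asUrl.length);
    System.out.println("identical:    " + Arrays.equals(asString, asUrl));
  }
}
```

Under these assumptions the two byte arrays are the same, so any saving 
would have to come from an encoding that does something other than write 
out url.toString().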
                
> Add URL type
> ------------
>
>                 Key: HIVE-4044
>                 URL: https://issues.apache.org/jira/browse/HIVE-4044
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>         Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
>
>
> Having a separate type for URLs would enable improvements in storage 
> efficiency based on breaking up a URL into its components. The new type will 
> be named "URL" and made a non-reserved keyword (see HIVE-701).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
