[ https://issues.apache.org/jira/browse/HIVE-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158830#comment-14158830 ]
Mariappan Asokan commented on HIVE-8347:
----------------------------------------

Another idea comes to mind: could {{HCatUtil.serialize()}} be modified to compress before encoding, similar to what was done for {{InputJobInfo}} instances? In other words, the fix from HCATALOG-453 could be applied to all serialized objects. Suggestions welcome.

> Use base-64 encoding instead of custom encoding for serialized objects
> ----------------------------------------------------------------------
>
>                 Key: HIVE-8347
>                 URL: https://issues.apache.org/jira/browse/HIVE-8347
>             Project: Hive
>          Issue Type: Improvement
>          Components: HCatalog
>    Affects Versions: 0.13.1
>            Reporter: Mariappan Asokan
>         Attachments: HIVE-8347.patch
>
> Serialized objects shipped via the Hadoop {{Configuration}} are encoded with a
> custom encoding (see {{HCatUtil.encodeBytes()}} and its complement
> {{HCatUtil.decodeBytes()}}) that has 100% overhead: each byte of the serialized
> object becomes two bytes after encoding. This may be one of the causes of the
> problem reported in HCATALOG-453, whose patch compressed serialized
> {{InputJobInfo}} objects to work around it.
> By using Base64 encoding, the overhead drops to about 33%, which alleviates
> the problem for all serialized objects.
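For illustration, here is a minimal sketch of the compress-then-encode idea in the spirit of the HCATALOG-453 fix. The class and method names are hypothetical, and it uses {{java.util.Base64}} (Java 8+) with gzip; the actual HCatalog code may use commons-codec and a different compression codec:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper sketching the compress-before-encode approach;
// not the actual HCatUtil implementation.
public final class CompressedSerializer {

  // Java-serialize the object, gzip the bytes, then Base64-encode so the
  // result is a plain string safe to store as a Configuration value.
  public static String serialize(Serializable obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out =
             new ObjectOutputStream(new GZIPOutputStream(buffer))) {
      out.writeObject(obj);
    }
    return Base64.getEncoder().encodeToString(buffer.toByteArray());
  }

  // Reverse the steps: Base64-decode, gunzip, then deserialize.
  public static Object deserialize(String encoded)
      throws IOException, ClassNotFoundException {
    byte[] compressed = Base64.getDecoder().decode(encoded);
    try (ObjectInputStream in = new ObjectInputStream(
             new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
      return in.readObject();
    }
  }
}
{code}

To put numbers on the overhead claim: the current scheme turns every serialized byte into two characters, so a 1 MB object occupies 2 MB in the {{Configuration}}, while Base64 emits four characters per three input bytes, giving roughly 1.33 MB; compressing first shrinks the payload before either expansion is applied.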