Gunther Hagleitner created HIVE-6262:
----------------------------------------

             Summary: Remove unnecessary copies of schema + table desc from 
serialized plan
                 Key: HIVE-6262
                 URL: https://issues.apache.org/jira/browse/HIVE-6262
             Project: Hive
          Issue Type: Bug
            Reporter: Gunther Hagleitner
            Assignee: Gunther Hagleitner


Currently for a partitioned table the following are true:

- for each PartitionDesc we send a copy of the corresponding TableDesc
- for each PartitionDesc we send two copies of the schema (in different 
formats).

Obviously we need to send different schemas when schema evolution requires 
them, but as things stand we always end up with multiple copies.

The effect can be dramatic. Removing these copies can easily reduce plan size 
by 8-10x for partitioned tables. Plans themselves can be tens to hundreds of 
MB (even with Kryo). The size difference also plays out in every task we run 
on the cluster.
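
To illustrate the effect, here is a minimal sketch, not Hive code. The 
classes ToyTableDesc and ToyPartitionDesc are hypothetical stand-ins for 
TableDesc and PartitionDesc, and plain java.io serialization is used in place 
of Hive's actual plan serialization (an assumption made only to keep the 
example self-contained). It compares a plan in which every partition carries 
its own copy of the table descriptor with one in which all partitions share a 
single instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class PlanSizeSketch {
    // Hypothetical stand-in for a table descriptor; not Hive's TableDesc.
    static class ToyTableDesc implements Serializable {
        private static final long serialVersionUID = 1L;
        final Properties schema = new Properties();

        ToyTableDesc(int columns) {
            for (int i = 0; i < columns; i++) {
                schema.setProperty("col" + i, "string");
            }
        }
    }

    // Hypothetical stand-in for a partition descriptor; not Hive's PartitionDesc.
    static class ToyPartitionDesc implements Serializable {
        private static final long serialVersionUID = 1L;
        final String partitionSpec;
        final ToyTableDesc tableDesc;

        ToyPartitionDesc(String partitionSpec, ToyTableDesc tableDesc) {
            this.partitionSpec = partitionSpec;
            this.tableDesc = tableDesc;
        }
    }

    // Every partition gets its own, identical copy of the table descriptor.
    static List<ToyPartitionDesc> withCopies(int partitions, int columns) {
        List<ToyPartitionDesc> plan = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            plan.add(new ToyPartitionDesc("ds=" + p, new ToyTableDesc(columns)));
        }
        return plan;
    }

    // All partitions point at one shared table descriptor instance.
    static List<ToyPartitionDesc> withSharedDesc(int partitions, int columns) {
        ToyTableDesc shared = new ToyTableDesc(columns);
        List<ToyPartitionDesc> plan = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            plan.add(new ToyPartitionDesc("ds=" + p, shared));
        }
        return plan;
    }

    // Java serialization writes a given instance once and refers back to it
    // afterwards, so the shared variant pays for the descriptor only once.
    static int serializedSize(Object plan) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(plan);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("copies: " + serializedSize(withCopies(100, 50)) + " bytes");
        System.out.println("shared: " + serializedSize(withSharedDesc(100, 50)) + " bytes");
    }
}
```

The exact ratio depends on the serializer, the number of partitions and the 
width of the schema, so the numbers printed here should not be read as a 
measurement of Hive itself.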



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
