Siddharth Seth created HIVE-14168:
-------------------------------------

             Summary: Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
                 Key: HIVE-14168
                 URL: https://issues.apache.org/jira/browse/HIVE-14168
             Project: Hive
          Issue Type: Bug
            Reporter: Siddharth Seth
            Priority: Critical


All ConfVars from HiveConf.java that have a non-null default value are 
explicitly set in each HiveConf instance:
{code}
// Overlay the ConfVars. Note that this ignores ConfVars with null values
addResource(getConfVarInputStream());
{code}
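To make the effect concrete, the overlay can be modelled as a constructor that copies every non-null default into the instance. This is a hypothetical sketch: `DemoVar`, `EagerConf` and the `demo.*` keys are made up for illustration and do not mirror HiveConf's actual internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for HiveConf.ConfVars; keys and defaults are made up.
enum DemoVar {
    DEMO_FLAG("demo.flag", "false"),
    DEMO_URI("demo.uri", ""),
    DEMO_UNSET("demo.unset", null); // null default: skipped by the overlay

    final String varname;
    final String defaultVal;

    DemoVar(String varname, String defaultVal) {
        this.varname = varname;
        this.defaultVal = defaultVal;
    }
}

// Hypothetical model of the current, eager behaviour.
class EagerConf {
    private final Map<String, String> props = new LinkedHashMap<>();

    EagerConf() {
        // Same idea as addResource(getConfVarInputStream()): copy every
        // non-null default into this instance, whether or not it is ever read.
        for (DemoVar v : DemoVar.values()) {
            if (v.defaultVal != null) {
                props.put(v.varname, v.defaultVal);
            }
        }
    }

    String get(String key) {
        return props.get(key);
    }

    // Number of entries held in memory (and written out on serialization).
    int size() {
        return props.size();
    }
}
```

With 400+ real ConfVars, every instance carries every non-null default this way, even when only a handful are ever overridden.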

This unnecessarily bloats each Configuration object: 400+ conf variables are 
set, instead of the probably fewer than 30 that would exist in hive-site.xml.

In an HS2 heap dump, HiveConf is almost always the largest component by a long 
way. Conf objects are also serialized very often, transmitting lots of unneeded 
variables (a serialized HiveConf typically holds 1000+ variables, because Hadoop 
injects its own configs into every config instance).

This is avoidable as long as HiveConf.get() is used to read values from a 
config, which Hive code itself should already be doing.
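A minimal sketch of the read-time alternative, assuming every read goes through a default-aware getter: only explicitly set values are stored, and defaults are resolved when a value is read. `LazyVar`, `LazyConf` and `getVar(LazyVar)` are hypothetical names chosen for this example, not Hive's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HiveConf.ConfVars; keys and defaults are made up.
enum LazyVar {
    DEMO_FLAG("demo.flag", "false"),
    DEMO_URI("demo.uri", "");

    final String varname;
    final String defaultVal;

    LazyVar(String varname, String defaultVal) {
        this.varname = varname;
        this.defaultVal = defaultVal;
    }
}

// Hypothetical conf that keeps only explicit overrides.
class LazyConf {
    private final Map<String, String> overrides = new HashMap<>();

    void set(LazyVar v, String value) {
        overrides.put(v.varname, value);
    }

    // An explicitly set value wins; otherwise fall back to the enum default.
    String getVar(LazyVar v) {
        return overrides.getOrDefault(v.varname, v.defaultVal);
    }

    // Only overrides occupy memory (and would be serialized).
    int size() {
        return overrides.size();
    }
}
```

A fresh instance holds zero entries yet still answers every read with the correct default; setting one value grows it by exactly one entry.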

This would be a potentially incompatible change for UDFs and other plugins 
that have access to a Configuration object, since any that read Hive settings 
directly from it would no longer see the defaults.

I'd suggest disabling the overlay by default and adding a flag to control it.
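The proposed flag and the compatibility concern can be sketched together. The issue does not name the flag, so the `overlayDefaults` constructor argument here is purely hypothetical, as are `FlagVar` and `FlaggedConf`. `get(String)` stands in for a plugin reading a raw key from its Configuration, while `getVar(FlagVar)` stands in for a default-aware HiveConf read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HiveConf.ConfVars; keys and defaults are made up.
enum FlagVar {
    DEMO_FLAG("demo.flag", "false"),
    DEMO_URI("demo.uri", "");

    final String varname;
    final String defaultVal;

    FlagVar(String varname, String defaultVal) {
        this.varname = varname;
        this.defaultVal = defaultVal;
    }
}

// Hypothetical conf whose default overlay is controlled by a flag.
class FlaggedConf {
    private final Map<String, String> props = new HashMap<>();

    FlaggedConf(boolean overlayDefaults) {
        if (overlayDefaults) {
            // Legacy behaviour: copy all non-null defaults in eagerly.
            for (FlagVar v : FlagVar.values()) {
                if (v.defaultVal != null) {
                    props.put(v.varname, v.defaultVal);
                }
            }
        }
    }

    // Raw lookup, as a plugin holding a plain Configuration might do.
    String get(String key) {
        return props.get(key);
    }

    // Default-aware lookup, as Hive code is expected to use.
    String getVar(FlagVar v) {
        return props.getOrDefault(v.varname, v.defaultVal);
    }

    int size() {
        return props.size();
    }
}
```

With the overlay off, default-aware reads are unchanged, but a raw `get("demo.flag")` returns null instead of `"false"`. That is exactly the incompatibility a flag would let plugin-heavy deployments opt out of.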



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
