Siddharth Seth created HIVE-14168:
-------------------------------------

             Summary: Avoid serializing all parameters from HiveConf.java into in-memory HiveConf instances
                 Key: HIVE-14168
                 URL: https://issues.apache.org/jira/browse/HIVE-14168
             Project: Hive
          Issue Type: Bug
            Reporter: Siddharth Seth
            Priority: Critical

All non-null parameters from HiveConf.java are explicitly set in each HiveConf instance.

{code}
// Overlay the ConfVars. Note that this ignores ConfVars with null values
addResource(getConfVarInputStream());
{code}

This unnecessarily bloats each Configuration object: 400+ conf variables are set, instead of the probably <30 that would exist in hive-site.xml. Looking at an HS2 heap dump, HiveConf is almost always the largest component by a long way. Conf objects are also serialized very often, transmitting lots of unneeded variables (a serialized Hive conf typically has 1000+ variables, due to Hadoop injecting its configs into every config instance).

As long as HiveConf.get() is the approach used to read from a config, this is avoidable. Hive code itself should be doing this.

This would be a potentially incompatible change for UDFs and other plugins which have access to a Configuration object. I'd suggest turning off the insert by default, and adding a flag to control this.
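To make the trade-off concrete, here is a minimal, self-contained sketch of the two approaches. It does not use the real Hive or Hadoop classes: `ConfVar`, `SketchConf`, the `overlayDefaults` flag, and the `sketch.*` property names are all hypothetical stand-ins invented for illustration. The idea shown is that a typed getter can fall back to the compiled-in default at read time, so the defaults never need to be copied into each instance, while a raw lookup by name (what a UDF or plugin holding a plain Configuration might do) no longer sees them.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyDefaultsSketch {

    /** Hypothetical stand-in for the ConfVars enum; names and defaults are made up. */
    enum ConfVar {
        EXEC_PARALLEL("sketch.exec.parallel", "false"),
        SCRATCH_DIR("sketch.scratch.dir", "/tmp/sketch"),
        NO_DEFAULT("sketch.no.default", null);

        final String varname;
        final String defaultValue;

        ConfVar(String varname, String defaultValue) {
            this.varname = varname;
            this.defaultValue = defaultValue;
        }
    }

    /** Hypothetical stand-in for a Configuration: a map of explicitly set properties. */
    static class SketchConf {
        private final Map<String, String> props = new HashMap<>();

        SketchConf(Map<String, String> siteOverrides, boolean overlayDefaults) {
            if (overlayDefaults) {
                // Eager behaviour: copy every non-null default into the instance.
                for (ConfVar v : ConfVar.values()) {
                    if (v.defaultValue != null) {
                        props.put(v.varname, v.defaultValue);
                    }
                }
            }
            // Site-level settings always win over defaults.
            props.putAll(siteOverrides);
        }

        /** Typed read: falls back to the compiled-in default when nothing is set. */
        String get(ConfVar v) {
            String value = props.get(v.varname);
            return value != null ? value : v.defaultValue;
        }

        /** Raw read by name: sees only what is stored in this instance. */
        String getRaw(String name) {
            return props.get(name);
        }

        /** Number of entries that would be held in memory and serialized. */
        int size() {
            return props.size();
        }
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("sketch.exec.parallel", "true");

        SketchConf eager = new SketchConf(site, true);
        SketchConf lazy = new SketchConf(site, false);

        // Typed reads agree in both modes.
        System.out.println(eager.get(ConfVar.SCRATCH_DIR).equals(lazy.get(ConfVar.SCRATCH_DIR)));
        // The lazy instance stores only the site override.
        System.out.println(eager.size() + " entries vs " + lazy.size());
        // A raw lookup misses the default in lazy mode: the compatibility risk.
        System.out.println(lazy.getRaw("sketch.scratch.dir"));
    }
}
```

In this sketch the lazy instance holds one entry instead of two; with several hundred defaults the gap grows accordingly. The `null` returned by the raw lookup in lazy mode illustrates why the change could break plugins that read from the Configuration by name, and why a flag to restore the eager overlay is suggested.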