Carter Shanklin created HIVE-12312:
--------------------------------------

             Summary: Excessive logging in PPD code
                 Key: HIVE-12312
                 URL: https://issues.apache.org/jira/browse/HIVE-12312
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Carter Shanklin
            Priority: Minor

One of my very complex queries takes about 14 minutes to compile with PPD on. Profiling it, I saw a lot of time spent in this stack, which is called many thousands of times.

{code}
java.lang.Throwable.getStackTraceElement(-2)
java.lang.Throwable.getOurStackTrace(827)
java.lang.Throwable.getStackTrace(816)
sun.reflect.GeneratedMethodAccessor5.invoke(-1)
sun.reflect.DelegatingMethodAccessorImpl.invoke(43)
java.lang.reflect.Method.invoke(497)
org.apache.log4j.spi.LocationInfo.<init>(139)
org.apache.log4j.spi.LoggingEvent.getLocationInformation(253)
org.apache.log4j.helpers.PatternParser$LocationPatternConverter.convert(500)
org.apache.log4j.helpers.PatternConverter.format(65)
org.apache.log4j.PatternLayout.format(506)
org.apache.log4j.WriterAppender.subAppend(310)
org.apache.log4j.DailyRollingFileAppender.subAppend(369)
org.apache.log4j.WriterAppender.append(162)
org.apache.log4j.AppenderSkeleton.doAppend(251)
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(66)
org.apache.log4j.Category.callAppenders(206)
org.apache.log4j.Category.forcedLog(391)
org.apache.log4j.Category.log(856)
org.apache.commons.logging.impl.Log4JLogger.info(176)
org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.logExpr(707)
org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(752)
org.apache.hadoop.hive.ql.ppd.OpProcFactory$FilterPPD.process(437)
{code}

logExpr logs at INFO level, but I think DEBUG is more appropriate. When I change it to log at DEBUG, I see a > 20% speedup in compile time:

Before:
{code}
real 14m47.972s
user 15m25.609s
sys 0m20.282s
{code}

After:
{code}
real 11m30.946s
user 12m10.870s
sys 0m7.320s
{code}

It looks like there's a lot in the PPD code that could be optimized: when I turn PPD off, the query compiles in 2m 30s. But this seems like an easy and low-risk win.
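
For illustration, here is a minimal sketch of the kind of change proposed above: log at a debug-equivalent level and guard the call so the message is only built when that level is enabled. It is not the actual Hive code. It uses java.util.logging (where FINE plays the role of DEBUG) in place of the commons-logging/log4j stack from the trace so that it runs without extra dependencies, and the class GuardedLogSketch, the describe helper, and the counter are all hypothetical names introduced here. The frames in the trace suggest the appender's pattern includes location information (for example %L or %M in log4j 1.x), which is what makes each emitted line costly, so skipping the call entirely is where the saving would come from.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not Hive's OpProcFactory.
class GuardedLogSketch {
    private static final Logger LOG = Logger.getLogger("ppd.sketch");

    // Counts how many times the log message was actually built.
    static final AtomicInteger built = new AtomicInteger();

    // Hypothetical stand-in for rendering pushdown predicates as text.
    static String describe(String alias, String predicate) {
        built.incrementAndGet();
        return "Pushdown predicates of " + alias + ": " + predicate;
    }

    // Guarded logging: at INFO and above this does no formatting
    // and never reaches an appender/handler.
    static void logExpr(String alias, String predicate) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(alias, predicate));
        }
    }

    // Runs logExpr `calls` times at the given logger level and returns
    // how many messages were built.
    static int countBuilt(Level level, int calls) {
        LOG.setUseParentHandlers(false); // keep the demo quiet
        LOG.setLevel(level);
        built.set(0);
        for (int i = 0; i < calls; i++) {
            logExpr("t" + i, "(x > 1)");
        }
        return built.get();
    }

    public static void main(String[] args) {
        System.out.println(countBuilt(Level.INFO, 1000)); // prints 0
        System.out.println(countBuilt(Level.FINE, 1000)); // prints 1000
    }
}
```

With commons-logging, the equivalent guard would be LOG.isDebugEnabled() around a LOG.debug(...) call.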