[ https://issues.apache.org/jira/browse/HIVE-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990523#comment-14990523 ]
Ashutosh Chauhan commented on HIVE-12312: ----------------------------------------- [~cartershanklin] Thanks for the patch. Patch needs to be name as per convention : https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing for Hive QA to trigger QA run on it. > Excessive logging in PPD code > ----------------------------- > > Key: HIVE-12312 > URL: https://issues.apache.org/jira/browse/HIVE-12312 > Project: Hive > Issue Type: Bug > Components: Hive > Affects Versions: 1.2.1 > Reporter: Carter Shanklin > Priority: Minor > Attachments: ppd_debug.patch > > > One of my very complex queries takes about 14 minutes to compile with PPD on. > Profiling it I saw a lot of time spent in this stack which is called many > many thousands of times. > {code} > java.lang.Throwable.getStackTraceElement(-2) > java.lang.Throwable.getOurStackTrace(827) > java.lang.Throwable.getStackTrace(816) > sun.reflect.GeneratedMethodAccessor5.invoke(-1) > sun.reflect.DelegatingMethodAccessorImpl.invoke(43) > java.lang.reflect.Method.invoke(497) > org.apache.log4j.spi.LocationInfo.<init>(139) > org.apache.log4j.spi.LoggingEvent.getLocationInformation(253) > org.apache.log4j.helpers.PatternParser$LocationPatternConverter.convert(500) > org.apache.log4j.helpers.PatternConverter.format(65) > org.apache.log4j.PatternLayout.format(506) > org.apache.log4j.WriterAppender.subAppend(310) > org.apache.log4j.DailyRollingFileAppender.subAppend(369) > org.apache.log4j.WriterAppender.append(162) > org.apache.log4j.AppenderSkeleton.doAppend(251) > org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(66) > org.apache.log4j.Category.callAppenders(206) > org.apache.log4j.Category.forcedLog(391) > org.apache.log4j.Category.log(856) > org.apache.commons.logging.impl.Log4JLogger.info(176) > org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.logExpr(707) > org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(752) > org.apache.hadoop.hive.ql.ppd.OpProcFactory$FilterPPD.process(437) > {code} > logExpr is set to log at INFO level, but I think DEBUG is more appropriate. > When I set log level to debug I see > 20% speedup in compile time: > Before: > {code} > real 14m47.972s > user 15m25.609s > sys 0m20.282s > {code} > After: > {code} > real 11m30.946s > user 12m10.870s > sys 0m7.320s > {code} > It looks like there's a lot of stuff in the PPD code that could be optimized, > when I turn PPD off the query compiles in 2m 30s. But this seems like an easy > and low risk win. -- This message was sent by Atlassian JIRA (v6.3.4#6332)