[ https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293674#comment-17293674 ]
Robbie Zhang commented on HIVE-24839:
-------------------------------------

The following backtrace appears in the HS2 log file:
{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.getRangeWidth(UDFSubstr.java:177)
    at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(UDFSubstr.java:156)
    at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1576)
    at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1435)
    at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:197)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
    at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
    at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
    at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:447)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:158)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12823)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:422)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
    at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221)
    at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:598)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:544)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:538)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
The expression "substr(t0.s, t1.i-1)" contains a nested function: the second parameter of substr is actually a GenericUDFOPMinus expression. The ColStatistics for that parameter doesn't have a valid range.
But getRangeWidth reads the fields of range without first checking whether range itself is null:
{code:java}
private Optional<Double> getRangeWidth(Range range) {
  if (range.minValue != null && range.maxValue != null) {
    return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
  }
  return Optional.empty();
}
{code}
Only 4 UDF classes implement StatEstimatorProvider, and only UDFSubstr has this bug.

> SubStrStatEstimator.estimate throws NullPointerException
> --------------------------------------------------------
>
>                 Key: HIVE-24839
>                 URL: https://issues.apache.org/jira/browse/HIVE-24839
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Robbie Zhang
>            Assignee: Robbie Zhang
>            Priority: Major
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null
> (state=42000,code=40000)
> {code}
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
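
For illustration only, one way the missing null check in getRangeWidth could look. This is a minimal standalone sketch, not the committed patch: the Range class below is a hypothetical stand-in that models only the minValue and maxValue fields referenced in the snippet above, and RangeWidthSketch is an invented wrapper class so the example can run on its own.

```java
import java.util.Optional;

final class RangeWidthSketch {

    // Hypothetical stand-in for the Range type used by the estimator;
    // only the two fields referenced in the comment are modeled.
    static final class Range {
        final Number minValue;
        final Number maxValue;

        Range(Number minValue, Number maxValue) {
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    // Same logic as the quoted method, plus a guard for a null range,
    // which is the case hit when the argument is a nested expression.
    static Optional<Double> getRangeWidth(Range range) {
        if (range != null && range.minValue != null && range.maxValue != null) {
            return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(getRangeWidth(null));               // Optional.empty
        System.out.println(getRangeWidth(new Range(1, 4)));    // Optional[3.0]
        System.out.println(getRangeWidth(new Range(null, 4))); // Optional.empty
    }
}
```

With a guard of this kind the estimator would fall back to "no range information" for the nested argument instead of failing query compilation.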