Not sure about EMR. Your best bet is to ask on EMR forums. Thanks, Ashutosh
On Tue, Sep 3, 2013 at 2:18 PM, Nathanial Thelen <n...@natethelen.com>wrote: > Is there a way to run a patch on EMR? > > Thanks, > Nate > > On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan <hashut...@apache.org> wrote: > > Fix in very related area has been checked in trunk today : > https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your > issue. > Can you try latest trunk? > > Ashutosh > > > On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen <n...@natethelen.com>wrote: > >> I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have >> been getting NullPointerExceptions (NPE) for certain queries in our staging >> environment. Only difference between stage and production is the amount of >> traffic we get so the data set is much smaller. We are not using any >> custom code. >> >> I have greatly simplified the query down to the bare minimum that will >> cause the error: >> >> SELECT >> count(DISTINCT ag.adGroupGuid) as groups, >> count(DISTINCT av.adViewGuid) as ads, >> count(DISTINCT ac.adViewGuid) as uniqueClicks >> FROM >> adgroup ag >> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid >> >> This will return the following before any Map Reduce jobs start: >> >> FAILED: NullPointerException null >> >> Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, >> I see this error: >> >> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29 >> length: 94324 file count: 20 directory count: 1 >> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length: >> 142609 file count: 21 directory count: 1 >> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30 >> length: 65519 file count: 21 directory count: 1 >> 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities >> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for >> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length: >> 205096 file count: 20 directory count: 1 >> 2013-09-03 18:09:19,800 INFO >> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where >> optimization is applicable >> 2013-09-03 18:09:19,801 INFO >> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table >> scans >> 2013-09-03 18:09:19,801 INFO >> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where >> optimization is applicable >> 2013-09-03 18:09:19,801 INFO >> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer >> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table >> scans >> 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver >> (SessionState.java:printError(386)) - FAILED: NullPointerException null >> java.lang.NullPointerException >> at >> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308) >> at >> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) >> at >> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124) >> at >> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) >> at >> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175) >> at >> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79) >> at >> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426) >> at >> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789) >> at >> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) >> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) >> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) >> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) >> at >> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310) >> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231) >> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466) >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819) >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at org.apache.hadoop.util.RunJar.main(RunJar.java:187) >> >> The same error also happens if I do an INNER JOIN to adclick, FYI. >> >> I have checked that there are not any null values for any of the columns >> referenced in the query. >> >> Making almost any changes to the query results it in successfully >> running. Here are some I have tried: >> >> Removed JOIN to adgroup: >> >> SELECT >> count(DISTINCT av.adViewGuid) as ads, >> count(DISTINCT ac.adViewGuid) as uniqueClicks >> FROM >> adview av >> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >> >> WORKS: >> 561 6 >> >> Removed JOIN to adclick: >> >> SELECT >> count(DISTINCT ag.adGroupGuid) as groups, >> count(DISTINCT av.adViewGuid) as ads >> FROM >> adgroup ag >> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid; >> >> WORKS: >> 543 561 >> >> Removing DISTINCT from any of the 3 counts >> >> SELECT >> count(DISTINCT ag.adGroupGuid) as groups, >> count(DISTINCT av.adViewGuid) as ads, >> count(ac.adViewGuid) as uniqueClicks >> FROM >> adgroup ag >> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >> >> WORKS: >> 543 561 7 >> >> >> SELECT >> count(ag.adGroupGuid) as groups, >> count(DISTINCT av.adViewGuid) as ads, >> count(DISTINCT ac.adViewGuid) as uniqueClicks >> FROM >> adgroup ag >> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >> >> WORKS: >> 562 561 6 >> >> >> SELECT >> count(DISTINCT ag.adGroupGuid) as groups, >> count(av.adViewGuid) as ads, >> count(DISTINCT ac.adViewGuid) as uniqueClicks >> FROM >> adgroup ag >> INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid >> LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid; >> >> WORKS: >> 543 562 6 >> >> I am not exactly sure what to do next. Thoughts? >> >> Nate >> > > >