Fix in very related area has been checked in trunk today :
https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix your
issue.
Can you try latest trunk?

Ashutosh


On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen <n...@natethelen.com>wrote:

> I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have
> been getting NullPointerExceptions (NPE) for certain queries in our staging
> environment.  Only difference between stage and production is the amount of
> traffic we get so the data set is much smaller.  We are not using any
> custom code.
>
> I have greatly simplified the query down to the bare minimum that will
> cause the error:
>
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
>
> This will return the following before any Map Reduce jobs start:
>
> FAILED: NullPointerException null
>
> Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I
> see this error:
>
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29
> length: 94324 file count: 20 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30 length:
> 142609 file count: 21 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
> s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30
> length: 65519 file count: 21 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities
> (Utilities.java:getInputSummary(1889)) - Cache Content Summary for
> s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29 length:
> 205096 file count: 20 directory count: 1
> 2013-09-03 18:09:19,800 INFO
>  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
> optimization is applicable
> 2013-09-03 18:09:19,801 INFO
>  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table
> scans
> 2013-09-03 18:09:19,801 INFO
>  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
> (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where
> optimization is applicable
> 2013-09-03 18:09:19,801 INFO
>  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
> (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table
> scans
> 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver
> (SessionState.java:printError(386)) - FAILED: NullPointerException null
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
>  at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
>  at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
> at
> org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
>  at
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
>  at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
>
> The same error also happens if I do an INNER JOIN to adclick, FYI.
>
> I have checked that there are not any null values for any of the columns
> referenced in the query.
>
> Making almost any changes to the query results it in successfully running.
>  Here are some I have tried:
>
> Removed JOIN to adgroup:
>
> SELECT
>     count(DISTINCT av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adview av
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>
> WORKS:
> 561 6
>
> Removed JOIN to adclick:
>
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid;
>
> WORKS:
> 543 561
>
> Removing DISTINCT from any of the 3 counts
>
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads,
>     count(ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>
> WORKS:
> 543 561 7
>
>
> SELECT
>     count(ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>
> WORKS:
> 562 561 6
>
>
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
>
> WORKS:
> 543 562 6
>
> I am not exactly sure what to do next.  Thoughts?
>
> Nate
>

Reply via email to