[ https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577438#comment-13577438 ]
Amareshwari Sriramadasu commented on HIVE-4018:
-----------------------------------------------

The setup is as follows: we have 7 dimension tables, dim1, ..., dim7. Number of rows in each dimension: 1009530, 3, 227358, 238514, 519, 203841, 47.

The query is:
{noformat}
Select SUM(msr1), SUM(msr2) , .... from fact
Left outer join dim1 on fact.d1= dim1.id
Left outer join dim2 on dim1.id2 = dim2.id
Left outer Join dim3 on fact.d3= dim3.id1
Left outer Join dim4 on dim3.id3= dim4.id4
Left outer join dim5 on dim4.id5= dim5.id
Left outer Join dim6 on dim3.id6= dim6.id
Left outer Join dim7 on dim6.id7 = dim7.id;
{noformat}

Here is the log of the local task loading the hash tables. I'm seeing an NPE while loading one of the tables:
{noformat}
2013-02-13 09:04:47 Starting to launch local task to process map join; maximum memory = 1004929024
2013-02-13 09:04:48 Processing rows: 519 Hashtable size: 519 Memory usage: 11845496 rate: 0.012
2013-02-13 09:04:48 Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile21--.hashtable
2013-02-13 09:04:48 Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile21--.hashtable File size: 31191
2013-02-13 09:04:49 Processing rows: 200000 Hashtable size: 199999 Memory usage: 60980296 rate: 0.061
2013-02-13 09:04:54 Processing rows: 200000 Hashtable size: 199999 Memory usage: 156217016 rate: 0.155
2013-02-13 09:05:01 Processing rows: 300000 Hashtable size: 299999 Memory usage: 202205440 rate: 0.201
2013-02-13 09:05:05 Processing rows: 400000 Hashtable size: 399999 Memory usage: 260133024 rate: 0.259
2013-02-13 09:05:10 Processing rows: 500000 Hashtable size: 499999 Memory usage: 293007176 rate: 0.292
2013-02-13 09:05:14 Processing rows: 600000 Hashtable size: 599999 Memory usage: 347795184 rate: 0.346
2013-02-13 09:05:22 Processing rows: 700000 Hashtable size: 699999 Memory usage: 388323912 rate: 0.386
2013-02-13 09:05:28 Processing rows: 800000 Hashtable size: 799999 Memory usage: 453952824 rate: 0.452
2013-02-13 09:05:34 Processing rows: 900000 Hashtable size: 899999 Memory usage: 482001544 rate: 0.48
2013-02-13 09:05:43 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 539703480 rate: 0.537
2013-02-13 09:05:47 Processing rows: 1009530 Hashtable size: 1009530 Memory usage: 530473664 rate: 0.528
2013-02-13 09:05:47 Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile61--.hashtable
2013-02-13 09:06:29 Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile61--.hashtable File size: 148246102
2013-02-13 09:06:31 Processing rows: 258054 Hashtable size: 54213 Memory usage: 111883448 rate: 0.111
2013-02-13 09:06:31 Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile31--.hashtable
2013-02-13 09:06:33 Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile31--.hashtable File size: 4251559
2013-02-13 09:06:34 Processing rows: 258054 Hashtable size: 203841 Memory usage: 72276192 rate: 0.072
2013-02-13 09:06:34 Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile32--.hashtable
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.persistence.MapJoinObjectValue.writeExternal(MapJoinObjectValue.java:138)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1443)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1414)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
    at java.util.HashMap.writeObject(HashMap.java:1018)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:959)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
    at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.flushMemoryCacheToPersistent(HashMapWrapper.java:116)
    at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.closeOp(HashTableSinkOperator.java:415)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:607)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.startForward(MapredLocalTask.java:324)
    at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:276)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
2013-02-13 09:06:34 Processing rows: 47 Hashtable size: 47 Memory usage: 72554224 rate: 0.072
2013-02-13 09:06:34 Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile11--.hashtable
2013-02-13 09:06:34 Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile11--.hashtable File size: 2908
2013-02-13 09:06:37 Processing rows: 200000 Hashtable size: 199999 Memory usage: 154624680 rate: 0.154
2013-02-13 09:06:38 Processing rows: 227358 Hashtable size: 227358 Memory usage: 165643352 rate: 0.165
2013-02-13 09:06:38 Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile41--.hashtable
2013-02-13 09:06:46 Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile41--.hashtable File size: 34351618
2013-02-13 09:06:47 Processing rows: 3 Hashtable size: 3 Memory usage: 74456192 rate: 0.074
2013-02-13 09:06:47 Dump the hashtable into file: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile51--.hashtable
2013-02-13 09:06:47 Upload 1 File to: file:/tmp/ubuntu/hive_2013-02-13_09-04-35_481_9216097600487630659/-local-10008/HashTable-Stage-19/MapJoin-mapfile51--.hashtable File size: 457
2013-02-13 09:06:47 End of local task; Time Taken: 119.326 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
{noformat}

> MapJoin failing with Distributed Cache error
> --------------------------------------------
>
>                 Key: HIVE-4018
>                 URL: https://issues.apache.org/jira/browse/HIVE-4018
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 0.11.0
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.11.0
>
>
> When I'm running a star join query after HIVE-3784, it is failing with the following error:
> 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: Load Distributed Cache Error
> 2013-02-13 08:36:04,585 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>     at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
>     at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
>     at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>     at org.apache.hadoop.mapred.Child.main(Child.java:260)