[ https://issues.apache.org/jira/browse/HIVE-18412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319248#comment-16319248 ]
Benjamin BONNET commented on HIVE-18412:
----------------------------------------

Hi [~ekoifman],

In our use case, we have 2 tables: a raw table named "raw_table", partitioned by date, and an ACID table named "clean_table" with the same columns as "raw_table". "clean_table" has 3 buckets.
A cleansing query then deletes from "clean_table" all rows that also exist in a given partition of "raw_table". Rows are compared on a combination of functional keys (key0, key1, key2 and key3). The job runs with 3 reducers (forced by a parameter set before the query is executed). Here is what it looks like:
{code}
set mapred.reduce.tasks=3;
DELETE FROM clean_table
WHERE concat(CASE WHEN key0 IS NULL THEN '' ELSE CAST(key0 AS STRING) END, '#',
             CASE WHEN key1 IS NULL THEN '' ELSE CAST(key1 AS STRING) END, '#',
             CASE WHEN key2 IS NULL THEN '' ELSE CAST(key2 AS STRING) END, '#',
             CASE WHEN key3 IS NULL THEN '' ELSE CAST(key3 AS STRING) END)
IN (
  SELECT concat(CASE WHEN clean.key0 IS NULL THEN '' ELSE CAST(clean.key0 AS STRING) END, '#',
                CASE WHEN clean.key1 IS NULL THEN '' ELSE CAST(clean.key1 AS STRING) END, '#',
                CASE WHEN clean.key2 IS NULL THEN '' ELSE CAST(clean.key2 AS STRING) END, '#',
                CASE WHEN clean.key3 IS NULL THEN '' ELSE CAST(clean.key3 AS STRING) END)
  FROM clean_table clean
  LEFT SEMI JOIN (
    SELECT concat(CASE WHEN key0 IS NULL THEN '' ELSE CAST(key0 AS STRING) END, '#',
                  CASE WHEN key1 IS NULL THEN '' ELSE CAST(key1 AS STRING) END, '#',
                  CASE WHEN key2 IS NULL THEN '' ELSE CAST(key2 AS STRING) END, '#',
                  CASE WHEN key3 IS NULL THEN '' ELSE CAST(key3 AS STRING) END) AS key
    FROM raw_table
    WHERE (year='2017' AND month='01' AND day='01' AND INPUT__FILE__NAME LIKE '%20170101%') AND 1=1
  ) raw
  ON concat(CASE WHEN clean.key0 IS NULL THEN '' ELSE CAST(clean.key0 AS STRING) END, '#',
            CASE WHEN clean.key1 IS NULL THEN '' ELSE CAST(clean.key1 AS STRING) END, '#',
            CASE WHEN clean.key2 IS NULL THEN '' ELSE CAST(clean.key2 AS STRING) END, '#',
            CASE WHEN clean.key3 IS NULL THEN '' ELSE CAST(clean.key3 AS STRING) END) = raw.key
);
{code}
The execution plan confirms that a multifile sprayer is used to run this query.

> FileSinkOperator throws NullPointerException
> --------------------------------------------
>
>                 Key: HIVE-18412
>                 URL: https://issues.apache.org/jira/browse/HIVE-18412
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Transactions
>         Environment: HDP2.6.1, Hive 1.2.1
>            Reporter: Benjamin BONNET
>            Priority: Blocker
>
> Hi,
> While executing a query (DELETE with a join) on an ACID table, I get a
> NullPointerException in the reducer.
> See the stack trace below.
> According to the FileSinkOperator source code, it seems that the bucketMap
> transient field is null.
> In my opinion, the only circumstance in which this field may be null is when
> the involved FileSinkOperator has been serialized and then deserialized:
> deserialization leaves that transient reference uninitialized.
> I checked the source code of more recent versions (including Hive 2.x), and
> everywhere that field may remain uninitialized (if FileSinkOperator is
> serialized/deserialized). So I think this issue may concern any version of
> Hive.
> ERROR : Vertex failed, vertexName=Reducer 3, vertexId=vertex_1513704146031_77754_2_05, diagnostics=[Task failed, taskId=task_1513704146031_77754_2_05_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)
> {"key":{"reducesinkkey0":{"transactionid":108117,"bucketid":0,"rowid":1114}},"value":{"_col0":"2017","_col1":"10"}}
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)
> {"key":{"reducesinkkey0":{"transactionid":108117,"bucketid":0,"rowid":1114}},"value":{"_col0":"2017","_col1":"10"}}
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
>     ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0)
> {"key":{"reducesinkkey0":{"transactionid":108117,"bucketid":0,"rowid":1114}},"value":{"_col0":"2017","_col1":"10"}}
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
>     ... 16 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:830)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:758)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
>     ... 17 more
> ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: .... etc.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
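The DELETE query above matches rows on a composite string built from key0..key3, joined with '#', with every NULL key replaced by an empty string. A minimal Java sketch of that key expression follows. `compositeKey` is a hypothetical helper, not part of Hive, and `String.valueOf` only approximates Hive's `CAST(... AS STRING)`: formatting of types such as DOUBLE or TIMESTAMP may differ.

```java
import java.util.StringJoiner;

public class CompositeKeyDemo {

    // Hypothetical helper mirroring the query's
    // concat(CASE WHEN k IS NULL THEN '' ELSE CAST(k AS STRING) END, '#', ...)
    // for the four functional keys. String.valueOf is only an approximation
    // of Hive's CAST(... AS STRING).
    static String compositeKey(Object key0, Object key1, Object key2, Object key3) {
        StringJoiner joined = new StringJoiner("#");
        for (Object k : new Object[] {key0, key1, key2, key3}) {
            // NULL becomes an empty segment, exactly like the CASE expression.
            joined.add(k == null ? "" : String.valueOf(k));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(compositeKey(42, null, "FR", 7));
        // A NULL key and an empty-string key yield the same composite key.
        System.out.println(compositeKey(1, null, "a", "b"));
        System.out.println(compositeKey(1, "", "a", "b"));
    }
}
```

Because a NULL key and an empty-string key produce the same segment, such rows are indistinguishable to the semi-join and are treated as matches.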
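The reporter's hypothesis is that a transient field initialized at its declaration is left null once the operator has been serialized and deserialized. The sketch below reproduces that effect with plain java.io serialization. `Operator` and `FixedOperator` are hypothetical stand-ins, not Hive's FileSinkOperator. Hive ships query plans to tasks with Kryo rather than java.io serialization, so the sketch is only an analogy. It assumes that Kryo likewise skips transient fields and does not run field initializers when it recreates the operator. The `readObject` hook in `FixedOperator` shows one way to rebuild transient state. Whether a real fix belongs there, in a Kryo hook, or in the operator's own initialization is not established here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class TransientFieldDemo {

    // Hypothetical stand-in: the transient map is only set by its field initializer.
    static class Operator implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Map<Integer, Integer> bucketMap = new HashMap<>();
    }

    // Hypothetical variant that rebuilds its transient state after deserialization.
    static class FixedOperator implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Map<Integer, Integer> bucketMap = new HashMap<>();

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();      // restores non-transient fields (none here)
            bucketMap = new HashMap<>(); // re-create what the initializer would have built
        }
    }

    // Serializes and then deserializes an object with java.io serialization.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Deserialization does not run Operator's field initializers, so the map is
        // null and a later bucketMap.get(...) would throw NullPointerException.
        Operator plain = roundTrip(new Operator());
        System.out.println("plain bucketMap is null: " + (plain.bucketMap == null));

        FixedOperator fixed = roundTrip(new FixedOperator());
        System.out.println("fixed bucketMap is null: " + (fixed.bucketMap == null));
    }
}
```

Running `main` prints `true` for the plain stand-in and `false` for the variant with `readObject`. An initialized map on the original object therefore says nothing about its state after a round trip.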