[ https://issues.apache.org/jira/browse/HIVE-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580610#comment-14580610 ]
Sergio Peña commented on HIVE-9868:
-----------------------------------

I have the following error when vectorization is enabled for parquet tables:

{noformat}
hive> select count(*) from tpcds_10_parquet.store_sales;
Query ID = hduser_20150610145517_ea534f33-d196-4bda-a810-27b1bc89d003
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1430340087154_0339, Tracking URL = http://ip-10-129-3-89:8088/proxy/application_1430340087154_0339/
Kill Command = /opt/local/hadoop/bin/hadoop job -kill job_1430340087154_0339
Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1
2015-06-10 14:55:22,351 Stage-1 map = 0%, reduce = 0%
2015-06-10 14:55:50,331 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1430340087154_0339 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1430340087154_0339_m_000001 (and more) from job job_1430340087154_0339
Examining task ID: task_1430340087154_0339_m_000004 (and more) from job job_1430340087154_0339

Task with the most failures(4):
-----
Task ID:
  task_1430340087154_0339_m_000004

URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1430340087154_0339&tipid=task_1430340087154_0339_m_000004
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:227)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:137)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:106)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:42)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:225)
        ... 11 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:214)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:119)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
        ... 15 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 5  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
{noformat}

The table description is:

{noformat}
hive> desc tpcds_10_parquet.store_sales;
OK
ss_sold_date_sk         int
ss_sold_time_sk         int
ss_item_sk              int
ss_customer_sk          int
ss_cdemo_sk             int
ss_hdemo_sk             int
ss_addr_sk              int
ss_store_sk             int
ss_promo_sk             int
ss_ticket_number        int
ss_quantity             int
ss_wholesale_cost       float
ss_list_price           float
ss_sales_price          float
ss_ext_discount_amt     float
ss_ext_sales_price      float
ss_ext_wholesale_cost   float
ss_ext_list_price       float
ss_ext_tax              float
ss_coupon_amt           float
ss_net_paid             float
ss_net_paid_inc_tax     float
ss_net_profit           float
{noformat}

> Turn on Parquet vectorization in parquet branch
> -----------------------------------------------
>
>                 Key: HIVE-9868
>                 URL: https://issues.apache.org/jira/browse/HIVE-9868
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: parquet-branch
>            Reporter: Dong Chen
>            Assignee: Dong Chen
>         Attachments: HIVE-9868-parquet.patch
>
>
> Parquet vectorization was turned off in HIVE-9235 due to a data types issue. As the vectorization refactor work is starting in HIVE-8128 on the parquet branch, let's turn it on on the branch first. The data types will be handled in the refactoring.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
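
For anyone trying to reproduce the error reported above: the comment does not show the session settings that were used, so the snippet below is only a minimal sketch. It assumes the standard vectorization switch hive.vectorized.execution.enabled and the tpcds_10_parquet.store_sales Parquet table described in the report; neither the exact settings nor the workaround is confirmed by the report itself.

{noformat}
-- Hypothetical reproduction sketch (settings assumed, not taken from the report above).
set hive.vectorized.execution.enabled=true;

-- A full scan of the Parquet table should go through VectorizedParquetInputFormat
-- and, per the stack trace above, hit the NPE in ParquetRecordReaderWrapper.next().
select count(*) from tpcds_10_parquet.store_sales;

-- Assumed stop-gap until this issue is fixed: disabling the flag should fall back
-- to the non-vectorized Parquet record reader.
set hive.vectorized.execution.enabled=false;
{noformat}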