[ https://issues.apache.org/jira/browse/HIVE-18553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364974#comment-16364974 ]
Hive QA commented on HIVE-18553:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12910631/HIVE-18553.91.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 29 failed/errored test(s), 13103 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=78)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=174)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[results_cache_1] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_opt_shuffle_serde] (batchId=179)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query1] (batchId=250)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.metastore.client.TestFunctions.testGetFunctionNullDatabase[Embedded] (batchId=205)
org.apache.hadoop.hive.metastore.client.TestTablesGetExists.testGetAllTablesCaseInsensitive[Embedded] (batchId=205)
org.apache.hadoop.hive.metastore.client.TestTablesList.testListTableNamesByFilterNullDatabase[Embedded] (batchId=205)
org.apache.hadoop.hive.ql.TestAcidOnTez.testGetSplitsLocks (batchId=224)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testInsertFromUnion (batchId=280)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=187)
org.apache.hive.hcatalog.listener.TestDbNotificationListener.alterIndex (batchId=242)
org.apache.hive.hcatalog.listener.TestDbNotificationListener.createIndex (batchId=242)
org.apache.hive.hcatalog.listener.TestDbNotificationListener.dropIndex (batchId=242)
org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd (batchId=235)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestTriggersMoveWorkloadManager.testTriggerMoveConflictKill (batchId=235)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/9218/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/9218/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-9218/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 29 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12910631 - PreCommit-HIVE-Build

> Support schema evolution in Parquet Vectorization reader
> ---------------------------------------------------------
>
>                 Key: HIVE-18553
>                 URL: https://issues.apache.org/jira/browse/HIVE-18553
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0, 2.4.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Ferdinand Xu
>            Priority: Major
>         Attachments: HIVE-18553.10.patch, HIVE-18553.11.patch, HIVE-18553.2.patch, HIVE-18553.3.patch, HIVE-18553.4.patch, HIVE-18553.5.patch, HIVE-18553.6.patch, HIVE-18553.7.patch, HIVE-18553.8.patch, HIVE-18553.9.patch, HIVE-18553.91.patch, HIVE-18553.patch, test_result_based_on_HIVE-18553.xlsx
>
> Schema evolution covers the following cases:
> 1. Column changes
>    - column reordering
>    - column addition and deletion
>    - column renaming
> 2. Type conversion
>    - low precision to high precision
>    - type to String
> For the first category, the current code does not support the column addition operation. The detailed error is as follows:
> {code}
> 0: jdbc:hive2://localhost:10000/default> desc test_p;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | t1        | tinyint    |          |
> | t2        | tinyint    |          |
> | i1        | int        |          |
> | i2        | int        |          |
> +-----------+------------+----------+
> 0: jdbc:hive2://localhost:10000/default> set hive.fetch.task.conversion=none;
> 0: jdbc:hive2://localhost:10000/default> set hive.vectorized.execution.enabled=true;
> 0: jdbc:hive2://localhost:10000/default> alter table test_p add columns (ts timestamp);
> 0: jdbc:hive2://localhost:10000/default> select * from test_p;
> Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
> {code}
> The following exception is seen in the logs:
> {code}
> Caused by: java.lang.IllegalArgumentException: [ts] BINARY is not in the store: [[i1] INT32, [i2] INT32, [t1] INT32, [t2] INT32] 3
>   at org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:160) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.buildVectorizedParquetReader(VectorizedParquetRecordReader.java:479) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:432) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:393) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:345) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:88) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:167) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:52) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:229) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:142) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
>   at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) ~[hadoop-mapreduce-client-common-3.0.0-alpha3-cdh6.x-SNAPSHOT.jar:?]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_121]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
>   at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_121]
> {code}
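> As the trace shows, the vectorized reader asks the Parquet page store for every column in the table schema, including the newly added "ts" column, which files written before the ALTER TABLE do not contain. Below is a minimal sketch of one way to handle that case, assuming plain-Java stand-ins: SimpleColumnVector, fillWithNulls() and readBatch() are hypothetical names, not Hive's actual vectorization classes or the attached patch. Columns missing from the file schema are materialized as all-NULL vectors instead of being requested from the page store.
> {code}
> // Sketch only: all names below are hypothetical stand-ins, not Hive APIs.
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> public class MissingColumnSketch {
>
>   /** Simplified stand-in for a vectorized column: values plus a null mask. */
>   static class SimpleColumnVector {
>     final long[] values;
>     final boolean[] isNull;
>     boolean noNulls = true;
>
>     SimpleColumnVector(int batchSize) {
>       values = new long[batchSize];
>       isNull = new boolean[batchSize];
>     }
>
>     /** Mark every row as NULL, the value a column added by ALTER TABLE should expose. */
>     void fillWithNulls() {
>       Arrays.fill(isNull, true);
>       noNulls = false;
>     }
>   }
>
>   /**
>    * Builds one batch: columns present in the file schema are "read" (faked here),
>    * while columns missing from the file are returned as all-NULL vectors instead
>    * of being requested from the Parquet page store.
>    */
>   static SimpleColumnVector[] readBatch(List<String> tableColumns,
>                                         Set<String> fileColumns,
>                                         int batchSize) {
>     SimpleColumnVector[] batch = new SimpleColumnVector[tableColumns.size()];
>     for (int i = 0; i < tableColumns.size(); i++) {
>       SimpleColumnVector cv = new SimpleColumnVector(batchSize);
>       if (fileColumns.contains(tableColumns.get(i))) {
>         Arrays.fill(cv.values, 42L); // placeholder for real column data
>       } else {
>         cv.fillWithNulls();          // e.g. the newly added "ts" column
>       }
>       batch[i] = cv;
>     }
>     return batch;
>   }
>
>   public static void main(String[] args) {
>     List<String> tableSchema = Arrays.asList("t1", "t2", "i1", "i2", "ts");
>     Set<String> fileSchema = new HashSet<>(Arrays.asList("t1", "t2", "i1", "i2"));
>     SimpleColumnVector[] batch = readBatch(tableSchema, fileSchema, 1024);
>     System.out.println("ts is all NULL: " + !batch[4].noNulls);
>   }
> }
> {code}
> A real implementation would work with Hive's ColumnVector classes, but the decision point is the same idea: check the file schema before asking the page store for a column.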
> For the second category (type conversion), the non-vectorized Parquet reader leverages the existing Parquet String inspector to perform the conversion, while the vectorized path does not. To support it, this JIRA provides an abstraction layer that reads the underlying data and converts it to what Hive requires for further computation.
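> A minimal sketch of what such a conversion layer could look like, assuming hypothetical names (ValueSource, TypeConverter) rather than the interfaces actually introduced by the patch: one converter per (file type, Hive type) pair reads the underlying value and writes it in the representation Hive expects.
> {code}
> // Sketch only: ValueSource, TypeConverter and the array "vectors" are hypothetical
> // stand-ins for this illustration, not the interfaces added by HIVE-18553.
> import java.nio.charset.StandardCharsets;
>
> public class ConversionSketch {
>
>   /** Hypothetical source of decoded values for one Parquet column chunk. */
>   interface ValueSource {
>     int readInt(); // e.g. a Parquet INT32 value
>   }
>
>   /** Abstraction layer: read one underlying value, write it in the type Hive expects. */
>   interface TypeConverter {
>     void readAndConvert(ValueSource in, Object targetVector, int row);
>   }
>
>   /** Low precision to high precision: INT32 widened into a long[] vector. */
>   static final TypeConverter INT32_TO_LONG =
>       (in, target, row) -> ((long[]) target)[row] = in.readInt();
>
>   /** Type to String: INT32 rendered as UTF-8 bytes for a byte[][] "string" vector. */
>   static final TypeConverter INT32_TO_STRING =
>       (in, target, row) -> ((byte[][]) target)[row] =
>           Integer.toString(in.readInt()).getBytes(StandardCharsets.UTF_8);
>
>   public static void main(String[] args) {
>     ValueSource source = () -> 123; // pretend the file stores the INT32 value 123
>     long[] longs = new long[1];
>     byte[][] strings = new byte[1][];
>     INT32_TO_LONG.readAndConvert(source, longs, 0);
>     INT32_TO_STRING.readAndConvert(source, strings, 0);
>     System.out.println(longs[0] + " / " + new String(strings[0], StandardCharsets.UTF_8));
>   }
> }
> {code}
> The non-vectorized path already gets this behavior from the Parquet String inspector; the sketch just shows the same read-then-convert split applied to column vectors.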