[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772993#comment-15772993 ]
Hive QA commented on HIVE-11394: -------------------------------- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12844565/HIVE-11394.095.patch {color:green}SUCCESS:{color} +1 due to 159 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 113 failed/errored test(s), 10836 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=234) TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144) [vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q] TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=146) [load_dyn_part5.q,vector_complex_join.q,orc_llap.q,vectorization_pushdown.q,cbo_gby_empty.q,vectorization_short_regress.q,cbo_gby.q,auto_sortmerge_join_1.q,lineage3.q,cross_product_check_1.q,cbo_join.q,vector_struct_in.q,bucketmapjoin3.q,current_date_timestamp.q,orc_ppd_schema_evol_2a.q,groupby2.q,schema_evol_text_vec_table.q,vectorized_join46.q,orc_ppd_date.q,multiMapJoin1.q,sample10.q,vector_outer_join1.q,vector_char_simple.q,dynpart_sort_optimization_acid.q,auto_sortmerge_join_2.q,bucketizedhiveinputformat.q,leftsemijoin.q,special_character_in_tabnames_1.q,cte_mat_2.q,vectorization_8.q] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_complex_all] (batchId=55) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] (batchId=69) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_complex] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_primitive] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_table] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_table] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_all] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_left_outer_join2] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_left_outer_join] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_mr_diff_schema_alias] (batchId=149) 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_multi_insert] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_null_projection] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_nullsafe_join] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_number_compare_projection] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_nvl] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_orderby_5] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join0] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join2] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join4] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join5] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_outer_join6] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partition_diff_num_cols] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce1] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce2] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce3] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_reduce_groupby_decimal] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_string_concat] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_4] (batchId=143) 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_mapjoin1] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_0] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_13] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_14] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_15] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_16] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_17] (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_7] (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_9] (batchId=138) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_decimal_date] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part_project] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_bucketmapjoin1] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_case] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_casts] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_context] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_date_funcs] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_distinct_gby] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_dynamic_partition_pruning] (batchId=143) 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_mapjoin] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_math_funcs] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_nested_mapjoin] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet] (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_ptf] (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_string_funcs] (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_timestamp] (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_timestamp_funcs] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_timestamp_ints_casts] (batchId=147) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join] (batchId=161) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join0] (batchId=160) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join3] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join4] (batchId=161) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5] (batchId=161) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] (batchId=118) 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_cast_constant] (batchId=98) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_char_4] (batchId=132) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct] (batchId=105) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_data_types] (batchId=127) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_aggregate] (batchId=102) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_mapjoin] (batchId=117) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_distinct_2] (batchId=116) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_elt] (batchId=109) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_groupby_3] (batchId=121) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_left_outer_join] (batchId=104) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce] (batchId=128) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5] (batchId=111) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_string_concat] (batchId=108) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_varchar_4] (batchId=107) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_0] (batchId=129) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14] (batchId=100) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_15] (batchId=121) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_16] (batchId=112) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_17] (batchId=131) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9] (batchId=94) 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_decimal_date] (batchId=129) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_div0] (batchId=123) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_part_project] (batchId=105) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_pushdown] (batchId=114) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=114) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_case] (batchId=118) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_mapjoin] (batchId=125) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_math_funcs] (batchId=103) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_nested_mapjoin] (batchId=101) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=121) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin] (batchId=125) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_string_funcs] (batchId=118) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_timestamp_funcs] (batchId=107) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2716/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2716/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2716/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 113 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12844565 - PreCommit-HIVE-Build

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
> HIVE-11394.06.patch, HIVE-11394.07.patch, HIVE-11394.08.patch,
> HIVE-11394.09.patch, HIVE-11394.091.patch, HIVE-11394.092.patch,
> HIVE-11394.093.patch, HIVE-11394.094.patch, HIVE-11394.095.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|OPERATOR|EXPRESSION|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization enabled) and a summary of Map and Reduce work.
> OPERATOR shows vectorization information for operators. E.g. Filter Vectorization. It includes all information of SUMMARY, too.
> EXPRESSION shows vectorization information for expressions. E.g. predicateExpression. It includes all information of SUMMARY and OPERATOR, too.
> DETAIL shows the most detailed vectorization information.
> It includes all information of SUMMARY, OPERATOR, and EXPRESSION too.
> The defaults for the optional clauses are: ONLY not specified, and SUMMARY.
> ---------------------------------------------------------------------------------------------------
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization sections)
> Since SUMMARY is the default, this is also the output of EXPLAIN VECTORIZATION SUMMARY.
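The EXPLAIN VECTORIZATION option syntax described earlier in this description can be sketched as a small helper. This is an illustrative Python sketch only, not part of the patch: `explain_vectorization` and `included_levels` are hypothetical names, and the query text is a placeholder. It assembles the statement text and lists which levels' information a given level includes, under the assumption that the levels nest exactly as stated (SUMMARY, then OPERATOR, then EXPRESSION, then DETAIL).

```python
# Hypothetical helper (not part of Hive): builds the text of an
# EXPLAIN VECTORIZATION statement from the options in the description.

# Levels in increasing order of detail; each includes all earlier ones.
LEVELS = ("SUMMARY", "OPERATOR", "EXPRESSION", "DETAIL")


def explain_vectorization(query, only=False, level=None):
    """Return 'EXPLAIN VECTORIZATION [ONLY] [level] <query>'.

    Leaving both options out relies on the stated defaults
    (ONLY not specified, SUMMARY).
    """
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")
    if level is not None:
        level = level.upper()
        if level not in LEVELS:
            raise ValueError("unknown level: " + level)
        parts.append(level)
    parts.append(query)
    return " ".join(parts)


def included_levels(level="SUMMARY"):
    """Levels whose information is shown at the given level."""
    return LEVELS[: LEVELS.index(level.upper()) + 1]


if __name__ == "__main__":
    q = "SELECT cint FROM alltypesorc GROUP BY cint"  # placeholder query
    print(explain_vectorization(q))
    print(explain_vectorization(q, only=True, level="detail"))
    print(included_levels("EXPRESSION"))
```

Keeping the levels in one ordered tuple mirrors the description's "includes all information of ..." wording: a level's contents are simply the prefix of the tuple that ends at that level.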
> Under Reducer 3’s "Reduce Vectorization:" you’ll see
> notVectorizedReason: Aggregation Function UDF avg parameter expression for GROUPBY operator: Data type struct<count:bigint,sum:double,input:int> of Column\[VALUE._col2\] not supported
> For Reducer 2’s "Reduce Vectorization:" you’ll see "groupByVectorOutput:": "false", which says a node has a GROUP BY with an AVG or some other aggregator that outputs a non-PRIMITIVE type (e.g. STRUCT) and all downstream operators are row-mode, i.e. not vector output.
> If "usesVectorUDFAdaptor:": "false" were true, it would say that at least one vectorized expression is using VectorUDFAdaptor.
> And "allNative:": "false" will be true when all operators are native.
> Today, GROUP BY and FILE SINK are not native. MAP JOIN and REDUCE SINK are conditionally native. FILTER and SELECT are native.
> {code}
> PLAN VECTORIZATION: > enabled: true > enabledConditionsMet: [hive.vectorized.execution.enabled IS true] > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > ... > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 3 <- Reducer 2 (SIMPLE_EDGE) > ... 
> Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: alltypesorc > Statistics: Num rows: 12288 Data size: 36696 Basic stats: > COMPLETE Column stats: COMPLETE > Select Operator > expressions: cint (type: int) > outputColumnNames: cint > Statistics: Num rows: 12288 Data size: 36696 Basic stats: > COMPLETE Column stats: COMPLETE > Group By Operator > keys: cint (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 5775 Data size: 17248 Basic > stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 5775 Data size: 17248 Basic > stats: COMPLETE Column stats: COMPLETE > Execution mode: vectorized, llap > LLAP IO: all inputs > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reducer 2 > Execution mode: vectorized, llap > Reduce Vectorization: > enabled: true > enableConditionsMet: hive.vectorized.execution.reduce.enabled > IS true, hive.execution.engine tez IN [tez, spark] IS true > groupByVectorOutput: false > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reduce Operator Tree: > Group By Operator > keys: KEY._col0 (type: int) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 5775 Data size: 17248 Basic stats: > COMPLETE Column stats: COMPLETE > Group By Operator > aggregations: sum(_col0), count(_col0), avg(_col0), > std(_col0) > mode: hash > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 1 Data size: 172 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 172 Basic stats: > COMPLETE Column stats: COMPLETE > value expressions: _col0 (type: 
bigint), _col1 (type: > bigint), _col2 (type: struct<count:bigint,sum:double,input:int>), _col3 > (type: struct<count:bigint,sum:double,variance:double>) > Reducer 3 > Execution mode: llap > Reduce Vectorization: > enabled: true > enableConditionsMet: hive.vectorized.execution.reduce.enabled > IS true, hive.execution.engine tez IN [tez, spark] IS true > notVectorizedReason: Aggregation Function UDF avg parameter > expression for GROUPBY operator: Data type > struct<count:bigint,sum:double,input:int> of Column[VALUE._col2] not supported > vectorized: false > Reduce Operator Tree: > Group By Operator > aggregations: sum(VALUE._col0), count(VALUE._col1), > avg(VALUE._col2), std(VALUE._col3) > mode: mergepartial > outputColumnNames: _col0, _col1, _col2, _col3 > Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE > Column stats: COMPLETE > File Output Operator > compressed: false > Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE > Column stats: COMPLETE > table: > input format: > org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink
> {code}
> EXPLAIN VECTORIZATION OPERATOR
> Notice the added TableScan Vectorization, Select Vectorization, Group By Vectorization, Map Join Vectorization, Reduce Sink Vectorization sections in this example.
> Notice the nativeConditionsMet detail on why Reduce Sink Vectorization is native.
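In the OPERATOR output for this example, the nativeConditionsMet and nativeConditionsNotMet values are comma-separated entries of the form `<condition> IS true` or `<condition> IS false`. The following is a minimal Python sketch for reading them, assuming every entry follows that form; `parse_conditions` and `failed_conditions` are hypothetical helper names, not part of Hive. The sample values are copied from the Reduce Sink Vectorization section of Map 2 in this example's plan, with the email line wraps removed.

```python
import re

# Each entry reads "<condition> IS true" or "<condition> IS false".
# A condition may itself contain ", " (e.g. "[tez, spark]"), so entries
# are matched by their "IS true|false" suffix instead of splitting on commas.
_ENTRY = re.compile(r"(.+?) IS (true|false)(?:, |$)")


def parse_conditions(text):
    """Map each condition in a nativeConditions* value to a bool."""
    return {name: flag == "true" for name, flag in _ENTRY.findall(text.strip())}


def failed_conditions(met, not_met):
    """Names of conditions reported as false in either value."""
    merged = dict(parse_conditions(met))
    merged.update(parse_conditions(not_met))
    return [name for name, ok in merged.items() if not ok]


if __name__ == "__main__":
    # Values taken from the Map 2 Reduce Sink Vectorization section.
    met = (
        "hive.vectorized.execution.reducesink.new.enabled IS true, "
        "hive.execution.engine tez IN [tez, spark] IS true, "
        "Not ACID UPDATE or DELETE IS true, No buckets IS true, "
        "No TopN IS true, No DISTINCT columns IS true, "
        "BinarySortableSerDe for keys IS true, "
        "LazyBinarySerDe for values IS true"
    )
    not_met = "Uniform Hash IS false"
    print(failed_conditions(met, not_met))  # prints ['Uniform Hash']
```

For that operator the only unmet condition is Uniform Hash, which matches its `native: false` line in the plan.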
> {code} > PLAN VECTORIZATION: > enabled: true > enabledConditionsMet: [hive.vectorized.execution.enabled IS true] > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > #### A masked pattern was here #### > Edges: > Map 2 <- Map 1 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > #### A masked pattern was here #### > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: a > Statistics: Num rows: 3 Data size: 294 Basic stats: > COMPLETE Column stats: NONE > TableScan Vectorization: > native: true > projectedOutputColumns: [0, 1] > Filter Operator > Filter Vectorization: > className: VectorFilterOperator > native: true > predicate: c2 is not null (type: boolean) > Statistics: Num rows: 3 Data size: 294 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: c1 (type: int), c2 (type: char(10)) > outputColumnNames: _col0, _col1 > Select Vectorization: > className: VectorSelectOperator > native: true > projectedOutputColumns: [0, 1] > Statistics: Num rows: 3 Data size: 294 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col1 (type: char(20)) > sort order: + > Map-reduce partition columns: _col1 (type: char(20)) > Reduce Sink Vectorization: > className: VectorReduceSinkStringOperator > native: true > nativeConditionsMet: > hive.vectorized.execution.reducesink.new.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE > IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No > DISTINCT columns IS true, BinarySortableSerDe for keys IS true, > LazyBinarySerDe for values IS true > Statistics: Num rows: 3 Data size: 294 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: int) > Execution mode: vectorized, llap > LLAP IO: all inputs > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > 
groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: true > usesVectorUDFAdaptor: false > vectorized: true > Map 2 > Map Operator Tree: > TableScan > alias: b > Statistics: Num rows: 3 Data size: 324 Basic stats: > COMPLETE Column stats: NONE > TableScan Vectorization: > native: true > projectedOutputColumns: [0, 1] > Filter Operator > Filter Vectorization: > className: VectorFilterOperator > native: true > predicate: c2 is not null (type: boolean) > Statistics: Num rows: 3 Data size: 324 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: c1 (type: int), c2 (type: char(20)) > outputColumnNames: _col0, _col1 > Select Vectorization: > className: VectorSelectOperator > native: true > projectedOutputColumns: [0, 1] > Statistics: Num rows: 3 Data size: 324 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 _col1 (type: char(20)) > 1 _col1 (type: char(20)) > Map Join Vectorization: > className: VectorMapJoinInnerStringOperator > native: true > nativeConditionsMet: > hive.vectorized.execution.mapjoin.native.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS > true, No nullsafe IS true, Supports Key Types IS true, Not empty key IS true, > When Fast Hash Table, then requires no Hybrid Hash Join IS true, Small table > vectorizes IS true > outputColumnNames: _col0, _col1, _col2, _col3 > input vertices: > 0 Map 1 > Statistics: Num rows: 3 Data size: 323 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Reduce Sink Vectorization: > className: VectorReduceSinkOperator > native: false > nativeConditionsMet: > hive.vectorized.execution.reducesink.new.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE > IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, > 
BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true > nativeConditionsNotMet: Uniform Hash IS false > Statistics: Num rows: 3 Data size: 323 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col1 (type: char(10)), _col2 > (type: int), _col3 (type: char(20)) > Execution mode: vectorized, llap > LLAP IO: all inputs > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reducer 3 > Execution mode: vectorized, llap > Reduce Vectorization: > enabled: true > enableConditionsMet: hive.vectorized.execution.reduce.enabled > IS true, hive.execution.engine tez IN [tez, spark] IS true > groupByVectorOutput: true > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reduce Operator Tree: > Select Operator > expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 > (type: char(10)), VALUE._col1 (type: int), VALUE._col2 (type: char(20)) > outputColumnNames: _col0, _col1, _col2, _col3 > Select Vectorization: > className: VectorSelectOperator > native: true > projectedOutputColumns: [0, 1, 2, 3] > Statistics: Num rows: 3 Data size: 323 Basic stats: COMPLETE > Column stats: NONE > File Output Operator > compressed: false > File Sink Vectorization: > className: VectorFileSinkOperator > native: false > Statistics: Num rows: 3 Data size: 323 Basic stats: > COMPLETE Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {code} > EXPLAIN VECTORIZATION EXPRESSION > Notice the predicateExpression in this example. 
> {code} > PLAN VECTORIZATION: > enabled: true > enabledConditionsMet: [hive.vectorized.execution.enabled IS true] > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > #### A masked pattern was here #### > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > #### A masked pattern was here #### > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: vector_interval_2 > Statistics: Num rows: 2 Data size: 788 Basic stats: > COMPLETE Column stats: NONE > TableScan Vectorization: > native: true > projectedOutputColumns: [0, 1, 2, 3, 4, 5] > Filter Operator > Filter Vectorization: > className: VectorFilterOperator > native: true > predicateExpression: FilterExprAndExpr(children: > FilterTimestampScalarEqualTimestampColumn(val 2001-01-01 01:02:03.0, col > 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) > -> 6:timestamp) -> boolean, FilterTimestampScalarNotEqualTimestampColumn(val > 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col > 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, > FilterTimestampScalarLessEqualTimestampColumn(val 2001-01-01 01:02:03.0, col > 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) > -> 6:timestamp) -> boolean, FilterTimestampScalarLessTimestampColumn(val > 2001-01-01 01:02:03.0, col 6)(children: DateColAddIntervalDayTimeScalar(col > 1, val 0 01:02:04.000000000) -> 6:timestamp) -> boolean, > FilterTimestampScalarGreaterEqualTimestampColumn(val 2001-01-01 01:02:03.0, > col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 > 01:02:03.000000000) -> 6:timestamp) -> boolean, > FilterTimestampScalarGreaterTimestampColumn(val 2001-01-01 01:02:03.0, col > 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 > 01:02:04.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColEqualTimestampScalar(col 6, val 2001-01-01 > 01:02:03.0)(children: 
DateColAddIntervalDayTimeScalar(col 1, val 0 > 01:02:03.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColNotEqualTimestampScalar(col 6, val 2001-01-01 > 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 > 01:02:04.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColGreaterEqualTimestampScalar(col 6, val 2001-01-01 > 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 > 01:02:03.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColGreaterTimestampScalar(col 6, val 2001-01-01 > 01:02:03.0)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 > 01:02:04.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColLessEqualTimestampScalar(col 6, val 2001-01-01 > 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 > 01:02:03.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColLessTimestampScalar(col 6, val 2001-01-01 > 01:02:03.0)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 > 01:02:04.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColEqualTimestampColumn(col 0, col 6)(children: > DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:03.000000000) -> > 6:timestamp) -> boolean, FilterTimestampColNotEqualTimestampColumn(col 0, col > 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) > -> 6:timestamp) -> boolean, FilterTimestampColLessEqualTimestampColumn(col 0, > col 6)(children: DateColAddIntervalDayTimeScalar(col 1, val 0 > 01:02:03.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColLessTimestampColumn(col 0, col 6)(children: > DateColAddIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> > 6:timestamp) -> boolean, FilterTimestampColGreaterEqualTimestampColumn(col 0, > col 6)(children: DateColSubtractIntervalDayTimeScalar(col 1, val 0 > 01:02:03.000000000) -> 6:timestamp) -> boolean, > FilterTimestampColGreaterTimestampColumn(col 0, col 6)(children: > DateColSubtractIntervalDayTimeScalar(col 1, val 0 01:02:04.000000000) -> > 
6:timestamp) -> boolean) -> boolean > predicate: ((2001-01-01 01:02:03.0 = (dt + 0 > 01:02:03.000000000)) and (2001-01-01 01:02:03.0 <> (dt + 0 > 01:02:04.000000000)) and (2001-01-01 01:02:03.0 <= (dt + 0 > 01:02:03.000000000)) and (2001-01-01 01:02:03.0 < (dt + 0 > 01:02:04.000000000)) and (2001-01-01 01:02:03.0 >= (dt - 0 > 01:02:03.000000000)) and (2001-01-01 01:02:03.0 > (dt - 0 > 01:02:04.000000000)) and ((dt + 0 01:02:03.000000000) = 2001-01-01 > 01:02:03.0) and ((dt + 0 01:02:04.000000000) <> 2001-01-01 01:02:03.0) and > ((dt + 0 01:02:03.000000000) >= 2001-01-01 01:02:03.0) and ((dt + 0 > 01:02:04.000000000) > 2001-01-01 01:02:03.0) and ((dt - 0 01:02:03.000000000) > <= 2001-01-01 01:02:03.0) and ((dt - 0 01:02:04.000000000) < 2001-01-01 > 01:02:03.0) and (ts = (dt + 0 01:02:03.000000000)) and (ts <> (dt + 0 > 01:02:04.000000000)) and (ts <= (dt + 0 01:02:03.000000000)) and (ts < (dt + > 0 01:02:04.000000000)) and (ts >= (dt - 0 01:02:03.000000000)) and (ts > (dt > - 0 01:02:04.000000000))) (type: boolean) > Statistics: Num rows: 1 Data size: 394 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: ts (type: timestamp) > outputColumnNames: _col0 > Select Vectorization: > className: VectorSelectOperator > native: true > projectedOutputColumns: [0] > Statistics: Num rows: 1 Data size: 394 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: timestamp) > sort order: + > Reduce Sink Vectorization: > className: VectorReduceSinkOperator > native: false > nativeConditionsMet: > hive.vectorized.execution.reducesink.new.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE > IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, > BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true > nativeConditionsNotMet: Uniform Hash IS false > Statistics: Num rows: 1 Data size: 394 Basic stats: > COMPLETE Column stats: NONE > 
Execution mode: vectorized, llap > LLAP IO: all inputs > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reducer 2 > ... > {code} > The standard @Explain Annotation Type is used. A new 'vectorization' > annotation marks each new class and method. > Works for FORMATTED, like other non-vectorization EXPLAIN variations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)