[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-11394:
--------------------------------
Description:
Add detail to the EXPLAIN output showing why Map or Reduce work is not vectorized.

New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]

The ONLY option suppresses most non-vectorization elements.
SUMMARY shows vectorization information for the PLAN (whether vectorization is enabled) and a summary of the Map and Reduce work.
DETAIL additionally shows per-operator vectorization information, such as the Select Vectorization, Group By Vectorization, and Reduce Sink Vectorization sections.
When the optional clauses are omitted, ONLY is off and SUMMARY is used.

Here are some examples:

EXPLAIN VECTORIZATION example (equivalent to EXPLAIN VECTORIZATION SUMMARY; note the PLAN VECTORIZATION, Map Vectorization, and Reduce Vectorization sections):
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      …
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      …
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: decimal_date_test
                  Statistics: Num rows: 12288 Data size: 2467616 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: boolean)
                    Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: cdate (type: date)
                      outputColumnNames: _col0
                      Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: date)
                        sort order: +
                        Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
            Execution mode: vectorized, llap
            LLAP IO: all inputs
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                groupByVectorOutput: true
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
            Reduce Operator Tree:
              Select Operator
                expressions: KEY.reducesinkkey0 (type: date)
                outputColumnNames: _col0
                Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 6144 Data size: 1233808 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}

EXPLAIN VECTORIZATION DETAIL example (note the added Select Vectorization, Group By Vectorization, and Reduce Sink Vectorization sections):
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      …
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      …
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: vectortab2korc
                  Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: bo (type: boolean), b (type: bigint)
                    outputColumnNames: bo, b
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        nativeConditionsMet: Supported IS true
                        selectExpressions: IdentityExpression[7:boolean], IdentityExpression[3:bigint]
                        vectorized: true
                    Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: max(b)
                      Group By Vectorization:
                          aggregators: VectorUDAFMaxLong(IdentityExpression[3:bigint])
                          className: VectorGroupByOperator
                          vectorOutput: true
                          keyExpressions: IdentityExpression[7:boolean]
                          native: false
                          nativeConditionsNotMet: Supported IS false
                          vectorized: true
                      keys: bo (type: boolean)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: boolean)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: boolean)
                        Reduce Sink Vectorization:
                            className: VectorReduceSinkLongOperator
                            native: true
                            nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                            vectorized: true
                        Statistics: Num rows: 2000 Data size: 918712 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
            Execution mode: vectorized
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Reducer 2
            Execution mode: vectorized
            Reduce Vectorization:
                enabled: true
                enableConditionsMet: hive.vectorized.execution.reduce.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true
                groupByVectorOutput: true
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                Group By Vectorization:
                    aggregators: VectorUDAFMaxLong(IdentityExpression[1:bigint])
                    className: VectorGroupByOperator
                    vectorOutput: true
                    keyExpressions: IdentityExpression[0:boolean]
                    native: false
                    nativeConditionsNotMet: Supported IS false
                    vectorized: true
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: boolean)
                  sort order: -
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkOperator
                      native: false
                      nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                      nativeConditionsNotMet: Uniform Hash IS false
                      vectorized: true
                  Statistics: Num rows: 1000 Data size: 459356 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint)
…
{code}

EXPLAIN VECTORIZATION ONLY example:
{code}
PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      Vertices:
        Map 1
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Map 2
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                groupByVectorOutput: true
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true

  Stage: Stage-0
{code}

The standard @Explain annotation type is used. A new 'vectorization' annotation marks each new class and method.

Like the other (non-vectorization) EXPLAIN variations, EXPLAIN VECTORIZATION also works with FORMATTED.
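The SUMMARY and ONLY output makes it easy to check with a script which vertices vectorized. The following is a minimal Python sketch, not part of Hive. It collects the fields of each Map Vectorization or Reduce Vectorization section from the text output. It assumes that each vertex name (such as Map 1 or Reducer 2) and each "key: value" entry sits on its own line, with a section's fields indented below its header. summarize_vertices and not_vectorized are hypothetical helper names.

```python
import re

# Hypothetical helpers for scanning EXPLAIN VECTORIZATION text output; not a Hive API.
# A vertex header such as "Map 1" or "Reducer 2" on a line of its own.
VERTEX_RE = re.compile(r"^\s*((?:Map|Reducer) \d+)\s*$")
# The work-level section that follows a vertex header.
SECTION_RE = re.compile(r"^\s*(?:Map|Reduce) Vectorization:\s*$")
# One "key: value" entry, e.g. "allNative: false".
FIELD_RE = re.compile(r"^\s*(\w+):\s*(.*)$")


def indent_of(line):
    """Number of leading whitespace characters."""
    return len(line) - len(line.lstrip())


def summarize_vertices(explain_text):
    """Return {vertex: {field: value}} for each Map/Reduce Vectorization section.

    Assumes one vertex name or "key: value" entry per line, with a section's
    fields indented deeper than the section header.
    """
    summary = {}
    vertex = None
    section_indent = None  # indentation of the open section header, if any
    for line in explain_text.splitlines():
        if section_indent is not None:
            field = FIELD_RE.match(line)
            if field and indent_of(line) > section_indent:
                summary[vertex][field.group(1)] = field.group(2)
                continue
            section_indent = None  # first line outside the section closes it
        header = VERTEX_RE.match(line)
        if header:
            vertex = header.group(1)
        elif vertex and SECTION_RE.match(line):
            section_indent = indent_of(line)
            summary[vertex] = {}
    return summary


def not_vectorized(summary):
    """Vertices whose work-level 'vectorized' flag is not "true"."""
    return [name for name, fields in summary.items()
            if fields.get("vectorized") != "true"]


# Abbreviated from the EXPLAIN VECTORIZATION ONLY example in this issue.
SAMPLE = """\
      Vertices:
        Map 1
            Map Vectorization:
                enabled: true
                groupByVectorOutput: true
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
        Map 2
            Map Vectorization:
                enabled: true
                groupByVectorOutput: true
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true
"""

if __name__ == "__main__":
    summary = summarize_vertices(SAMPLE)
    print(summary["Map 1"]["allNative"], summary["Map 2"]["allNative"])
    print(not_vectorized(summary))
```

On this excerpt both vertices are vectorized, and only Map 2 is reported as allNative.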
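With DETAIL, each operator-level section (Select Vectorization, Group By Vectorization, Reduce Sink Vectorization) reports className and native. When native execution is not possible, it also reports nativeConditionsNotMet, which is the field that explains why. Below is a hedged Python sketch that lists the non-native operators together with their unmet conditions, under the same one-entry-per-line assumption. non_native_operators is a hypothetical name. Treating every "<Name> Vectorization:" header other than Map or Reduce as operator-level is an assumption based only on the examples in this issue.

```python
import re

# Hypothetical helper for EXPLAIN VECTORIZATION DETAIL text output; not a Hive API.
# Any "<Name> Vectorization:" header, e.g. "Group By Vectorization:".
SECTION_RE = re.compile(r"^(\s*)(.+?) Vectorization:\s*$")
# One "key: value" entry, e.g. "native: false".
FIELD_RE = re.compile(r"^(\s*)(\w+):\s*(.*)$")
# Work-level sections; every other "... Vectorization:" header is treated as operator-level.
WORK_LEVEL = {"Map", "Reduce"}


def non_native_operators(explain_text):
    """Return (section, className, nativeConditionsNotMet) for every
    operator-level vectorization section whose 'native' field is false."""
    found = []
    current = None  # fields of the operator section being read
    indent = -1     # indentation of that section's header line
    for line in explain_text.splitlines() + [""]:  # "" flushes a trailing section
        if current is not None:
            field = FIELD_RE.match(line)
            if field and len(field.group(1)) > indent:
                current[field.group(2)] = field.group(3)
                continue
            # The section has ended: keep it only if it is not native.
            if current.get("native") == "false":
                found.append((current["section"],
                              current.get("className"),
                              current.get("nativeConditionsNotMet")))
            current = None
        header = SECTION_RE.match(line)
        if header and header.group(2) not in WORK_LEVEL:
            current = {"section": header.group(2)}
            indent = len(header.group(1))
    return found


# Abbreviated from the Reducer 2 part of the EXPLAIN VECTORIZATION DETAIL example.
SAMPLE = """\
            Reduce Vectorization:
                enabled: true
                vectorized: true
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                Group By Vectorization:
                    className: VectorGroupByOperator
                    native: false
                    nativeConditionsNotMet: Supported IS false
                    vectorized: true
                keys: KEY._col0 (type: boolean)
                mode: mergepartial
                Reduce Output Operator
                  key expressions: _col0 (type: boolean)
                  Reduce Sink Vectorization:
                      className: VectorReduceSinkOperator
                      native: false
                      nativeConditionsNotMet: Uniform Hash IS false
                      vectorized: true
                  value expressions: _col1 (type: bigint)
"""

if __name__ == "__main__":
    for section, class_name, reason in non_native_operators(SAMPLE):
        print(section, class_name, reason)
```

On this excerpt the sketch reports two operators. One is the Group By operator (Supported IS false). The other is the Reduce Sink operator, which is not native because Uniform Hash IS false.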
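With FORMATTED, the same information is emitted as JSON, so it can be checked without any text parsing. The following is a minimal Python sketch. It assumes the key layout of the EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example in this issue, including the trailing colons in keys such as "Vertices:" and "vectorized:" and the string-valued "true"/"false" flags. vertex_vectorization is a hypothetical name, not a Hive API.

```python
import json


def vertex_vectorization(formatted_json):
    """Map each Tez vertex to its work-level vectorization flags.

    Hypothetical helper. Assumes the layout of the FORMATTED example:
    "STAGE PLANS" -> stage -> "Tez" -> "Vertices:" -> vertex ->
    "Map Vectorization:" or "Reduce Vectorization:", whose keys end in a
    colon and whose flags are the strings "true"/"false".
    """
    plan = json.loads(formatted_json)
    result = {}
    for stage in plan.get("STAGE PLANS", {}).values():
        vertices = stage.get("Tez", {}).get("Vertices:", {})
        for name, work in vertices.items():
            info = work.get("Map Vectorization:") or work.get("Reduce Vectorization:")
            if info is None:
                continue  # no work-level vectorization section for this vertex
            result[name] = {
                "vectorized": info.get("vectorized:") == "true",
                "allNative": info.get("allNative:") == "true",
            }
    return result


# Trimmed from the EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example.
SAMPLE = (
    '{"STAGE PLANS":{"Stage-1":{"Tez":{"Vertices:":{'
    '"Map 1":{"Map Vectorization:":{"allNative:":"false","vectorized:":"true"}},'
    '"Map 3":{"Map Vectorization:":{"allNative:":"true","vectorized:":"true"}},'
    '"Reducer 2":{"Reduce Vectorization:":{"allNative:":"false","vectorized:":"true"}}'
    '}}},"Stage-0":{}}}'
)

if __name__ == "__main__":
    flags = vertex_vectorization(SAMPLE)
    # Vertices that are vectorized but not entirely native.
    print(sorted(v for v, f in flags.items() if not f["allNative"]))
```

On the trimmed sample this reports Map 1 and Reducer 2, which matches allNative: false for those vertices in the example.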
EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED
{code}
{"PLAN VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 3","type":"BROADCAST_EDGE"},{"parent":"Map 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 3":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 4":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer 2":{"Reduce Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled IS true","hive.execution.engine tez IN [tez, spark] IS true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
{code}
or pretty printed:
{code}
{
  "PLAN VECTORIZATION": {
    "enabled": true,
    "enabledConditionsMet": [
      "hive.vectorized.execution.enabled IS true"
    ]
  },
  "STAGE DEPENDENCIES": {
    "Stage-1": {
      "ROOT STAGE": "TRUE"
    },
    "Stage-0": {
      "DEPENDENT STAGES": "Stage-1"
    }
  },
  "STAGE PLANS": {
    "Stage-1": {
      "Tez": {
        "Edges:": {
          "Map 1": [
            {
              "parent": "Map 3",
              "type": "BROADCAST_EDGE"
            },
            {
              "parent": "Map 4",
              "type": "BROADCAST_EDGE"
            }
          ],
          "Reducer 2": {
            "parent": "Map 1",
            "type": "SIMPLE_EDGE"
          }
        },
        "Vertices:": {
          "Map 1": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 3": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Map 4": {
            "Map Vectorization:": {
              "enabled:": "true",
              "enabledConditionsMet:": [
                "hive.vectorized.use.vectorized.input.format IS true"
              ],
              "groupByVectorOutput:": "true",
              "inputFileFormats:": [
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
              ],
              "allNative:": "true",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          },
          "Reducer 2": {
            "Reduce Vectorization:": {
              "enabled:": "true",
              "enableConditionsMet:": [
                "hive.vectorized.execution.reduce.enabled IS true",
                "hive.execution.engine tez IN [tez, spark] IS true"
              ],
              "groupByVectorOutput:": "true",
              "allNative:": "false",
              "usesVectorUDFAdaptor:": "false",
              "vectorized:": "true"
            }
          }
        }
      }
    },
    "Stage-0": {}
  }
}
{code}

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                 Key: HIVE-11394
>                 URL: https://issues.apache.org/jira/browse/HIVE-11394
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
> HIVE-11394.03.patch, HIVE-11394.04.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)