Yep, "set hive.vectorized.reuse.scratch.columns=false;" fixes the problem. And something is definitely wrong with 'if': without it everything works fine.

explain vectorization detail

PLAN VECTORIZATION:
  enabled: true
  enabledConditionsMet: [hive.vectorized.execution.enabled IS true]

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      DagId: hive_20181224123402_3536cf16-bb5b-496c-b196-417d6dff4be0:11867
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      DagName: hive_20181224123402_3536cf16-bb5b-496c-b196-417d6dff4be0:11867
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: xs
                  Statistics: Num rows: 5 Data size: 20 Basic stats: COMPLETE Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:key:int, 1:a:int, 2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>]
                  Select Operator
                    expressions: key (type: int)
                    outputColumnNames: _col0
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        projectedOutputColumnNums: [0]
                    Statistics: Num rows: 5 Data size: 20 Basic stats: COMPLETE Column stats: COMPLETE
                    Map Join Operator
                      condition map:
                           Left Outer Join 0 to 1
                      keys:
                        0 if(_col0 is null, 44, _col0) (type: int)
                        1 _col0 (type: int)
                      Map Join Vectorization:
                          bigTableKeyColumnNums: [4]
                          bigTableKeyExpressions: IfExprLongScalarLongColumn(col 3:boolean, val 44, col 0:int)(children: IsNull(col 0:int) -> 3:boolean) -> 4:int
                          bigTableOuterKeyMapping: 4 -> 5
                          bigTableRetainedColumnNums: [0, 5]
                          bigTableValueColumnNums: [0]
                          className: VectorMapJoinOuterLongOperator
                          native: true
                          nativeConditionsMet: hive.mapjoin.optimized.hashtable IS true, hive.vectorized.execution.mapjoin.native.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, One MapJoin Condition IS true, No nullsafe IS true, Small table vectorizes IS true, Outer Join has keys IS true, Fast Hash Table and No Hybrid Hash Join IS true
                          projectedOutputColumnNums: [0, 5, 6]
                          smallTableMapping: [6]
                      outputColumnNames: _col0, _col1, _col2
                      input vertices:
                        1 Map 2
                      Statistics: Num rows: 5 Data size: 52 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        File Sink Vectorization:
                            className: VectorFileSinkOperator
                            native: false
                        Statistics: Num rows: 5 Data size: 52 Basic stats: COMPLETE Column stats: COMPLETE
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Execution mode: vectorized
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: false
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 2
                    includeColumns: [0]
                    dataColumns: key:int, a:int
                    partitionColumnCount: 0
                    scratchColumnTypeNames: [bigint, bigint, bigint, bigint]
        Map 2
            Map Operator Tree:
                TableScan
                  alias: dict
                  Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                  TableScan Vectorization:
                      native: true
                      vectorizationSchemaColumns: [0:key:int, 1:b:int, 2:ROW__ID:struct<writeid:bigint,bucketid:int,rowid:bigint>]
                  Select Operator
                    expressions: key (type: int), b (type: int)
                    outputColumnNames: _col0, _col1
                    Select Vectorization:
                        className: VectorSelectOperator
                        native: true
                        projectedOutputColumnNums: [0, 1]
                    Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Reduce Sink Vectorization:
                          className: VectorReduceSinkLongOperator
                          keyColumnNums: [0]
                          native: true
                          nativeConditionsMet: hive.vectorized.execution.reducesink.new.enabled IS true, hive.execution.engine tez IN [tez, spark] IS true, No PTF TopN IS true, No DISTINCT columns IS true, BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
                          valueColumnNums: [1]
                      Statistics: Num rows: 2 Data size: 16 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col1 (type: int)
            Execution mode: vectorized
            Map Vectorization:
                enabled: true
                enabledConditionsMet: hive.vectorized.use.vectorized.input.format IS true
                inputFormatFeatureSupport: [DECIMAL_64]
                featureSupportInUse: [DECIMAL_64]
                inputFileFormats: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                allNative: true
                usesVectorUDFAdaptor: false
                vectorized: true
                rowBatchContext:
                    dataColumnCount: 2
                    includeColumns: [0, 1]
                    dataColumns: key:int, b:int
                    partitionColumnCount: 0
                    scratchColumnTypeNames: []

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

On Sun, Dec 23, 2018 at 4:11 AM Gopal Vijayaraghavan <gop...@apache.org> wrote:

> Hi,
>
> > Subject: Re: hive 3.1 mapjoin with complex predicate produce incorrect results
> ...
> > | 0 if(_col0 is null, 44, _col0) (type: int) |
> > | 1 _col0 (type: int) |
>
> That rewrite is pretty neat, but I feel like the IF expression nesting is
> what is broken here.
>
> Can you run the same query with "set
> hive.vectorized.reuse.scratch.columns=false;" and see if this is a join
> expression column reuse problem.
>
> If that does work, can you send out a
>
> explain vectorization detail <query>;
>
> I'll eventually get back to my dev env in a week, but this looks like a
> low-level exec issue right now.
>
> Cheers,
> Gopal

--
Best regards,
Зиновьев Андрей