> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/in_typecheck_pointlook.q.out
> > Lines 56 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101646#file2101646line56>
> >
> >     I expected 'Unknown' should have been char of length 6. Is there a 
> > reason to expand the length to 10?
> >     As I mentioned previously if constant is of smaller length, then it 
> > doesn't make a difference, but is unnecessary, but if constant is of bigger 
> > length then LHS, then char::compare() actually truncates constant, so it 
> > better to create char with original length of constant.
> 
> Zoltan Haindrich wrote:
>     It worked before the other way around; constants are expanded to the 
> target type - the addition I've made is that if the constant is longer; then 
> its made invalid

I think its better to let runtime dictacte the semantics in such cases. So, we 
just create constant char of its original length and then whatever runtime does 
with it, we will get that, instead of us "pre-processing" value at compile 
time. Other way to think about this is if there are 2 cols of char(5) and 
char(10) what will runtime do? Runtime already has logic to handle such cases, 
we let it handle that.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/in_typecheck_varchar.q.out
> > Lines 42 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101647#file2101647line42>
> >
> >     This is inconsistent. Char and string comparison happens in char. But, 
> > varchar and string comparison happens in String. Was this behavior present 
> > before this patch too?
> 
> Zoltan Haindrich wrote:
>     yes; only char is handled differently

Can you leave a TODO : that to recheck it later.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/infer_const_type.q.out
> > Line 145 (original), 145 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101648#file2101648line145>
> >
> >     Is 'or null'  because of fl  = 'float' OR
> >       db  = 'double' ? I expected that to become " or false". Though "or 
> > null" will evaluate to same but "or false" is what I would expect.
> 
> Zoltan Haindrich wrote:
>     the appearance of " or null" is because of the TODO fix in UDFOPEquals:
>     
>     
> https://github.com/kgyrtkirk/hive/blob/93f5a429fe5eb26f0b720fcef729efebe0549d76/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java#L1179
>     
>     I'm not sure why it wasn't removed in this case.
>     
>     should I change it back and address it later?

Lets keep the change. Code change we have done is certainly valid.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/join45.q.out
> > Line 717 (original), 717 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101649#file2101649line717>
> >
> >     As discussed this should have been
> >     (struct(cast (_col0 as double), cast(_col2 as double))) IN (const 
> > struct(100.0D,100.0D), const struct(101.0D,101.0D), const 
> > struct(102.0D,102.0D))
> 
> Zoltan Haindrich wrote:
>     * column has type string.
>     * the IN statement may have the same type for all element in this case 
> it's int...
>     * in this case to change the left hand side to double: it may work
>     
>     
>     but what should happen in case the IN operands contain 1 int, 1 decimal 
> and 1 string ?
>     
>     note: during UDF evaluations there is a Set with all the values; and 
> "contains" is used - so if the inferred type is not a numeric: 
>     
>     I've just checked that even the standard IN doesn't work like this:
>     
>     ```
>     create table t (a string);
>     
>     insert into t values ('1'),('x'),('2.0');
>     
>     select * from t where a in (1.0,'x',2);
>     1
>     2.0
>     -- it doesn't return 'x' because it's casted to double
>     
>     ```
>     
>     I'm starting to think that it would be better to do this with the IN 
> unwinded into ORs: so that we could do the one-on-one constant checks and 
> then pointlookupoptimizer might collapse them if the types are the same - in 
> this case I think we would not loose 'x' in the above case - and it would 
> also make this whole recursive typecheck unneccessary.

We shall strive to match semantics which is already there. In Hive, for expr 
str_col = 12, we get cast of double on both sides. So, by extensions IN clause 
should do the same IFF all types are same. If types aren't same I agree what 
you are proposing ie. keep it as unwinded ORs.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69019/#review209827
-----------------------------------------------------------


On Oct. 20, 2018, 5:51 a.m., Zoltan Haindrich wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69019/
> -----------------------------------------------------------
> 
> (Updated Oct. 20, 2018, 5:51 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Vineet Garg.
> 
> 
> Bugs: HIVE-20617
>     https://issues.apache.org/jira/browse/HIVE-20617
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> For IN expressions the types were never corrected; and pointlookupoptimizer 
> was probably leaving behind fields already which were uncomparable; 
> HIVE-20296 exposed it further by changing the minimal number from  32 to 2.
> 
> This change generalizes the retyping of constants to also run it for the IN 
> operator ; and also for struct-s.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/type/HiveChar.java 29dc06dca1 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java e7d71595c7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 4968d16876 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java 
> c274fd7cc9 
>   ql/src/test/queries/clientpositive/in_typecheck_char.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/in_typecheck_pointlook.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/in_typecheck_varchar.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_partition_coltype.q.out f6c3c5642e 
>   ql/src/test/results/clientpositive/in_typecheck2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_char.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_pointlook.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_varchar.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_const_type.q.out e1d7de5422 
>   ql/src/test/results/clientpositive/join45.q.out 47aaf7d0ab 
>   ql/src/test/results/clientpositive/join47.q.out 4d9e937815 
>   ql/src/test/results/clientpositive/llap/dec_str.q.out 554031e952 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out f240468558 
>   ql/src/test/results/clientpositive/llap/lineage3.q.out cf38816127 
>   ql/src/test/results/clientpositive/llap/vectorization_13.q.out 4ce654f960 
>   ql/src/test/results/clientpositive/llap/vectorization_6.q.out a2f730beca 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 21ce7b8ebd 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 
> 7f1c6a295e 
>   ql/src/test/results/clientpositive/mapjoin47.q.out 294dd69de5 
>   ql/src/test/results/clientpositive/parquet_vectorization_13.q.out 
> 0efce98b55 
>   ql/src/test/results/clientpositive/parquet_vectorization_6.q.out 0bb6888364 
>   ql/src/test/results/clientpositive/parquet_vectorization_8.q.out 957bd7b264 
>   ql/src/test/results/clientpositive/ppd_udf_col.q.out 814fb5afcf 
>   ql/src/test/results/clientpositive/spark/cbo_simple_select.q.out acf91bf178 
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_13.q.out 
> 3812239343 
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_6.q.out 
> 6108457aad 
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_8.q.out 
> 3352dedc58 
>   ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 
> f5a4c9ad86 
>   ql/src/test/results/clientpositive/spark/subquery_scalar.q.out b3252f5415 
>   ql/src/test/results/clientpositive/spark/vectorization_13.q.out 34ec9c42dd 
>   ql/src/test/results/clientpositive/spark/vectorization_6.q.out 5679bb8cfa 
>   ql/src/test/results/clientpositive/spark/vectorization_short_regress.q.out 
> 231dea6de3 
>   ql/src/test/results/clientpositive/vectorization_13.q.out 8897f8427f 
>   ql/src/test/results/clientpositive/vectorization_6.q.out 8dedb63e7d 
>   ql/src/test/results/clientpositive/vectorization_8.q.out d81df76a2f 
> 
> 
> Diff: https://reviews.apache.org/r/69019/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>

Reply via email to