[jira] [Commented] (HIVE-14593) Non-canonical integer partition columns do not work with IN operations

Harsh J (JIRA) Sun, 21 Aug 2016 08:21:46 -0700

    [ https://issues.apache.org/jira/browse/HIVE-14593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429771#comment-15429771 ]


Harsh J commented on HIVE-14593:
--------------------------------

Thank you [~gopalv],

Would that keep compatibility, for instance on an existing table like this 
one? And would inserting b=7 produce a second partition?

Could GenericUDFIn instead be changed to match the MySQL approach, where the 
IN(…) operands are converted to the column type rather than vice versa? The 
class-doc states that the current behavior was chosen for consistency with 
other UDFs:

{quote}
 * Also noteworthy: type conversion behavior is different from MySQL. With
 * expr IN expr1, expr2... in MySQL, exprN will each be converted into the same
 * type as expr. In the Hive implementation, all expr(N) will be converted into
 * a common type for conversion consistency with other UDF's, and to prevent
 * conversions from a big type to a small type (e.g. int to tinyint)
{quote}
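
To make the difference concrete, here is a minimal Python sketch of the two conversion strategies the class-doc contrasts. It is a toy model, not Hive or MySQL code: the function names `in_common_type` and `in_column_type` are hypothetical, and the rule "int plus string resolves to string" is an assumption taken from the behavior described in this issue.

```python
def in_common_type(column_value, operands):
    """Model of the current behavior: convert everything to a common type.

    Assumption (from the issue description): when any operand is a string,
    the common type is string, so the column value is stringified.
    """
    if any(isinstance(op, str) for op in operands):
        return any(str(column_value) == str(op) for op in operands)
    return any(column_value == op for op in operands)


def in_column_type(column_value, operands):
    """Model of the MySQL-style behavior: cast each operand to the column type.

    Note: an operand that cannot be cast (e.g. "abc" for an int column)
    raises ValueError in this sketch; a real engine would need a policy here.
    """
    column_type = type(column_value)
    return any(column_value == column_type(op) for op in operands)


# Column b is INT and holds the canonical value 7; the query uses IN ('07').
common = in_common_type(7, ["07"])   # compares "7" with "07"
column = in_column_type(7, ["07"])   # compares 7 with int("07")
```

In this model the common-type comparison misses the row, while the column-type comparison finds it, because the operand is normalized to an integer before comparing.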

> Non-canonical integer partition columns do not work with IN operations
> ----------------------------------------------------------------------
>
>                 Key: HIVE-14593
>                 URL: https://issues.apache.org/jira/browse/HIVE-14593
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 1.0.0
>            Reporter: Harsh J
>
> The use case below no longer works (tested on a PostgreSQL-backed HMS using 
> JDO as well as on a MySQL-backed HMS with DirectSQL):
> {code}
> CREATE TABLE foo (a STRING) PARTITIONED BY (b INT, c INT);
> ALTER TABLE foo ADD PARTITION (b='07', c='08');
> LOAD DATA LOCAL INPATH '/etc/hostname' INTO TABLE foo PARTITION(b='07', c='08');
> -- Does not work if you provide a non-canonical string in the IN list:
> SELECT a, c FROM foo WHERE b IN ('07');
> (No rows selected)
> -- Works if you provide integer literals or canonical integer strings:
> SELECT a, c FROM foo WHERE b IN (07);
> (1 row(s) selected)
> SELECT a, c FROM foo WHERE b IN (7);
> (1 row(s) selected)
> SELECT a, c FROM foo WHERE b IN ('7');
> (1 row(s) selected)
> {code}
> This worked fine prior to HIVE-8099. The HIVE-8099 change induces a double 
> conversion of the partition column value. GenericUDFIn now receives b's 
> value as the canonical integer 7 (converted to the column type) rather than 
> the non-canonical value 07 as stored in the DB. GenericUDFIn then 
> up-converts b's value to match the type of its arguments, turning 7 (int) 
> into the string "7". Finally, "7" is compared against "07", which never 
> matches.
> As a regression, this breaks anyone upgrading from a pre-1.0 release to 1.0 
> or higher.
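
The conversion chain described above can be traced with a small Python sketch. This is only an illustration of the reported behavior, not Hive code: `evaluate_in` and its `canonicalize` flag are hypothetical names, and the model assumes that a string operand forces a string comparison, as the description states.

```python
def evaluate_in(stored, operands, canonicalize=True):
    """Toy model of evaluating `b IN (...)` for one partition.

    stored       -- partition value as kept in the metastore DB, e.g. "07"
    operands     -- literals from the IN list (str or int)
    canonicalize -- True models the behavior reported after HIVE-8099
                    (value converted to the INT column type first);
                    False models the earlier as-is behavior.
    """
    # Conversion 1: stored string -> column type (assumed post-HIVE-8099 only).
    column_value = int(stored) if canonicalize else stored

    if any(isinstance(op, str) for op in operands):
        # Conversion 2: column value -> string, to match the operand type.
        return str(column_value) in [str(op) for op in operands]
    return column_value in operands


# The string-literal and integer-literal queries from the report:
after = {
    "IN ('07')": evaluate_in("07", ["07"]),  # "7" vs "07"
    "IN (7)": evaluate_in("07", [7]),        # 7 vs 7
    "IN ('7')": evaluate_in("07", ["7"]),    # "7" vs "7"
}

# Same failing query with the stored value passed through as-is:
before = evaluate_in("07", ["07"], canonicalize=False)  # "07" vs "07"
```

Under these assumptions only `IN ('07')` fails after the change, which matches the results listed in the report, and the same query matches when the stored value is not canonicalized first.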



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
