[jira] [Commented] (HIVE-4070) Like operator in Hive is case sensitive while in MySQL (and most likely other DBs) it's case insensitive

John Omernik (JIRA) Sat, 25 May 2013 10:53:23 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667142#comment-13667142
 ]


John Omernik commented on HIVE-4070:
------------------------------------

A couple of points to consider:
1. " If we change the default that would change the results current users are 
getting." I am in absolute agreement here.  I do not wish to change the 
default, the horse has left the gate. If I have referenced that in my previous 
posts, it's in the context of other ideas etc.  My issue with the current 
default isn't the choice of what it is, as much as there is no way to change it 
on a case by base basis. 

2. Paragraph 2: Those are solid points based on risks to the ongoing 
maintenance of Hive. That said, most Relational Databases, including MySQL have 
the option to change collation for given databases. True it may not be global 
setting (although the collation default is a global setting), but rather a per 
database setting. I.e. in MySQL you can have one database use a case sensitive 
collation and another use a case insensitive collation. Perhaps hive-site isn't 
the place for this, but metadata is (Set it as a per database or table 
setting?)  

3. For the most part, we model functionality in hive based on what mysql does - 
Because of Point 1, this is moot, but by default, MySQL is case insensitive. 

4.  Most users do not want to have to heavily test before upgrade, they want 
consistent behaviour between versions. - This is exactly why I think we need an 
option for administrators of Hive to be able to set the case sensitivity on 
database/table level, and the default case sensitivity at a global level.  Most 
users do not test.  From a philosophical (data philosopher?) point of view, 
users will not test, therefore if failure is to happen because of lack of 
testing, please fail with pomp and circumstance.  

I.e. Both lines of users (MySQL/MSSQL Migrating to Hive and Oracle/Postgres 
Migrating to Hive) are going to make assumptions when they write their queries 
that will cause a failure to happen.  If LIKE is Case Sensitive as it is, 
Oracle/Postgres users will assume correctly and all will be well.  MySQL/Users 
will assume wrong, but they won't KNOW they assumed wrong, they will just 
assume no results (given a lack of testing).   The converse is actually the 
preferred model: If LIKE is not case sensitive, then Oracle/Postgres users will 
assume incorrectly, they will run a query, and they will get their expected 
results, but will also get extra results allowing them to understand they 
assumed wrong. MySQL/MSSQL users will assume correctly and get their results.   

Going back to Point 1, the horse has left the building, we can't change default 
because of what it would do to the current user base, thus the next best option 
is to allow administrators to set that so users don't have to deal with it, 
document it so good administrators can handle it out of the gate, and to find a 
way to do it like '%MysQL%' (i.e. similar to collation settings on databases 
rather then a global setting as described previously).  

I do see the challenges with a global setting affecting how a function works; 
no precedence for that, and thus it could introduce risk, but I still hold that 
the current risk, especially on a system that is touted as modeled after MySQL 
(that's how it was explained to me, thus I assumed LIKE was case insensitive), 
to users is high.  As I was typing this novelette, I realized I tossed out an 
idea related to per database settings.  Thoughts on that? That is similar to 
how MySQL handles it, and thus follows the models mentioned while avoiding a 
global setting that affects the behavior of a UDF.  (The setting doesn't change 
the UDF, the "collation equivalent" setting on the databases does, and thus the 
global setting is just the default collation equivalent. 
                
> Like operator in Hive is case sensitive while in MySQL (and most likely other 
> DBs) it's case insensitive
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4070
>                 URL: https://issues.apache.org/jira/browse/HIVE-4070
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.10.0
>            Reporter: Mark Grover
>            Assignee: Mark Grover
>            Priority: Trivial
>
> Hive's like operator seems to be case sensitive.
> See 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L164
> However, MySQL's like operator is case insensitive. I don't have other DB's 
> (like PostgreSQL) installed and handy but I am guessing their LIKE is case 
> insensitive as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4070) Like operator in Hive is case sensitive while in MySQL (and most likely other DBs) it's case insensitive

Reply via email to