[ 
https://issues.apache.org/jira/browse/HIVE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666713#comment-13666713
 ] 

John Omernik commented on HIVE-4070:
------------------------------------

They may be a bit scary, but from what perspective? The programming of the 
system or the users/operators of the data warehouse that need consistency in 
results?  The scary thing about the LIKE operator and it not being able to be 
controlled in it's behavior is ensuring that assumptions by users don't lead to 
false negative hits.  I know personally, I came from MySQl/MSSQL and I had that 
happened to me until I figured out the difference. It isn't well documented, I 
can't control it, it bit me. Ok, no problem, how do I fix it now in my 
environment, most of my users are from that same background, so now I have to 
do training to change this? How effective will that be? Will people forget? 
Will I get false negatives? 

A "mlike" may be an option, however, I am not sure; it still requires the user 
to remember there is a change, is mlike better than other work arounds? Not if 
the user forgets to use it and in the case of like, the end result is false 
negatives(user forgets, no results, moves on not realizing they are missing 
something).  

If a global option isn't available, could like be case insensitive and an olike 
function (oracle like) be added? I will say that while false positives are a 
pain, at least you can look at the results and say "wait, why isn't this case 
sensitive?" and you know something is different than what you are used to (i.e. 
Oracle, hence olike).  You see and then understand there is a discrepancy. With 
mlike and case sensitivity being the default, you don't have any indicator, 
because unexpected case sensitivity produces false negatives, unexpected case 
insensitivity produces false positives. In analytics false positives are far 
preferred than false negatives.  

Just brainstorming here, what if we implemented mlike AND olike (my/ms SQL like 
and oracle  like) (or cslike and cilike (case sensitive and case insensitive)) 
Then we have the like function just be a link to mlike or olike.  Since Hive 
started out with case sensitivity, we could have like link to olike, but it can 
be changed to mlike in the hive-site.  The point being, absence of 
specification, default to normal (LIKE = OLIKE Case sensitive) 
hive.like.insensitive = true make it MLIKE.  Simple, doesn't change the default 
hive setting/behavior (out of the box), gives the option of global setting, and 
from a programing standpoint we are doing an if test on what LIKE links to, and 
we'd have olike and mlike available as UDFs when a user wanted to do the 
opposite of what the default setting is. 



                
> Like operator in Hive is case sensitive while in MySQL (and most likely other 
> DBs) it's case insensitive
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4070
>                 URL: https://issues.apache.org/jira/browse/HIVE-4070
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.10.0
>            Reporter: Mark Grover
>            Assignee: Mark Grover
>            Priority: Trivial
>
> Hive's like operator seems to be case sensitive.
> See 
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L164
> However, MySQL's like operator is case insensitive. I don't have other DB's 
> (like PostgreSQL) installed and handy but I am guessing their LIKE is case 
> insensitive as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to