[ https://issues.apache.org/jira/browse/HIVE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666713#comment-13666713 ]
John Omernik commented on HIVE-4070: ------------------------------------ They may be a bit scary, but from what perspective? The programming of the system or the users/operators of the data warehouse that need consistency in results? The scary thing about the LIKE operator and it not being able to be controlled in it's behavior is ensuring that assumptions by users don't lead to false negative hits. I know personally, I came from MySQl/MSSQL and I had that happened to me until I figured out the difference. It isn't well documented, I can't control it, it bit me. Ok, no problem, how do I fix it now in my environment, most of my users are from that same background, so now I have to do training to change this? How effective will that be? Will people forget? Will I get false negatives? A "mlike" may be an option, however, I am not sure; it still requires the user to remember there is a change, is mlike better than other work arounds? Not if the user forgets to use it and in the case of like, the end result is false negatives(user forgets, no results, moves on not realizing they are missing something). If a global option isn't available, could like be case insensitive and an olike function (oracle like) be added? I will say that while false positives are a pain, at least you can look at the results and say "wait, why isn't this case sensitive?" and you know something is different than what you are used to (i.e. Oracle, hence olike). You see and then understand there is a discrepancy. With mlike and case sensitivity being the default, you don't have any indicator, because unexpected case sensitivity produces false negatives, unexpected case insensitivity produces false positives. In analytics false positives are far preferred than false negatives. Just brainstorming here, what if we implemented mlike AND olike (my/ms SQL like and oracle like) (or cslike and cilike (case sensitive and case insensitive)) Then we have the like function just be a link to mlike or olike. Since Hive started out with case sensitivity, we could have like link to olike, but it can be changed to mlike in the hive-site. The point being, absence of specification, default to normal (LIKE = OLIKE Case sensitive) hive.like.insensitive = true make it MLIKE. Simple, doesn't change the default hive setting/behavior (out of the box), gives the option of global setting, and from a programing standpoint we are doing an if test on what LIKE links to, and we'd have olike and mlike available as UDFs when a user wanted to do the opposite of what the default setting is. > Like operator in Hive is case sensitive while in MySQL (and most likely other > DBs) it's case insensitive > -------------------------------------------------------------------------------------------------------- > > Key: HIVE-4070 > URL: https://issues.apache.org/jira/browse/HIVE-4070 > Project: Hive > Issue Type: Bug > Components: UDF > Affects Versions: 0.10.0 > Reporter: Mark Grover > Assignee: Mark Grover > Priority: Trivial > > Hive's like operator seems to be case sensitive. > See > https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java#L164 > However, MySQL's like operator is case insensitive. I don't have other DB's > (like PostgreSQL) installed and handy but I am guessing their LIKE is case > insensitive as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira