[ https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083236#comment-17083236 ]
David Mollitor edited comment on HIVE-23176 at 4/22/20, 1:54 PM:
-----------------------------------------------------------------

[~kgyrtkirk] Thanks for the feedback. This feature is not standard. I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
* If Hive is going to support UTF-8 in the same way other major vendors do, then there are almost no restrictions on which characters can appear in an object identifier. It is therefore not possible to simply "detect" a regex, and it is ambiguous whether a user wanted a regex or a complex table name.
* This feature accidentally added a bunch of weird edge cases where object identifier parsing takes different code paths.

This feature could be interesting, but since it is not part of the SQL standard, it is a Hive-only shortcut that can cause interoperability problems, and it is not currently implemented in a great way. It should not be reflected in the actual grammar of the SQL parser. To implement such a feature, it would make sense that it: (EDIT: based on discussions)
* Extends the standard SQL grammar instead of overloading the existing

was (Author: belugabehr):
    [~kgyrtkirk] Thanks for the feedback. This feature is not standard. I discussed the motivation here:

    [http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

    There are two primary concerns:
    * If Hive is going to support UTF-8 in the same way other major vendors do, then there are almost no restrictions on which characters can appear in an object identifier. It is therefore not possible to simply "detect" a regex, and it is ambiguous whether a user wanted a regex or a complex table name.
    * This feature accidentally added a bunch of weird edge cases where object identifier parsing takes different code paths.

    This feature could be interesting, but since it is not part of the SQL standard, it is a Hive-only shortcut that can cause interoperability problems, and it is not currently implemented in a great way. It should not be reflected in the actual grammar of the SQL parser. To implement such a feature, it would make sense that it be:
    * Not part of the grammar
    * Configurable (enabled/disabled) for interpreting the literal object identifiers supplied in the SQL statement in the Java parser code
    * Applied only to back-ticked object identifiers that are ASCII-only

> Remove SELECT REGEX Column Feature
> ----------------------------------
>
>                 Key: HIVE-23176
>                 URL: https://issues.apache.org/jira/browse/HIVE-23176
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: backwards-incompatible
>         Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>
> Hive has this interesting feature for using a REGEX to SELECT multiple columns.
> This needs to go. It is not SQL standard and, as currently implemented, it
> is impossible to determine whether a column identifier is a REGEX or the actual
> name of the column. If a column name is enclosed in back ticks then any
> UTF-8 character is a valid table name.
>
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]
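
The ambiguity described in the issue can be illustrated with a small Java sketch. This is not Hive's parser code: the class and the helpers `resolveAsRegex` and `resolveAsLiteral` are hypothetical names introduced only for illustration, and the case-insensitive matching is an assumption. It shows that the same back-ticked text selects different columns depending on which interpretation the parser picks, and that nothing in the text itself says which one the user meant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: hypothetical helpers, not Hive's actual implementation.
class RegexColumnAmbiguity {

    // Interpret the back-ticked identifier as a regex over column names.
    static List<String> resolveAsRegex(String identifier, List<String> columns) {
        Pattern pattern = Pattern.compile(identifier, Pattern.CASE_INSENSITIVE);
        List<String> selected = new ArrayList<>();
        for (String column : columns) {
            if (pattern.matcher(column).matches()) {
                selected.add(column);
            }
        }
        return selected;
    }

    // Interpret the back-ticked identifier as the literal column name.
    static List<String> resolveAsLiteral(String identifier, List<String> columns) {
        List<String> selected = new ArrayList<>();
        for (String column : columns) {
            if (column.equalsIgnoreCase(identifier)) {
                selected.add(column);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // A table whose third column has a name that is also a valid regex.
        List<String> columns = List.of("col_a", "col_b", "col_(a|b)");
        String identifier = "col_(a|b)"; // as written between back ticks

        System.out.println("as regex:   " + resolveAsRegex(identifier, columns));
        System.out.println("as literal: " + resolveAsLiteral(identifier, columns));
    }
}
```

In this sketch the regex reading returns `col_a` and `col_b` but not the column that is literally named `col_(a|b)`, while the literal reading returns only that column.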