[ https://issues.apache.org/jira/browse/HIVE-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor reassigned HIVE-23172:
-------------------------------------

> Quoted Backtick Columns Are Not Parsing Correctly
> -------------------------------------------------
>
>                 Key: HIVE-23172
>                 URL: https://issues.apache.org/jira/browse/HIVE-23172
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Critical
>
> I recently came across a weird behavior while examining failures of
> {{special_character_in_tabnames_2.q}} while working on HIVE-23150. I was
> surprised to see it fail because I couldn't see any reason why it
> should. It's doing pretty standard SQL statements just like every other
> test, but for some reason this test is just a *little bit* different from
> most others and it brought this issue to light.
> Turns out, the parsing of table names is pretty much wrong across the
> board.
> The statement that caught my attention was this:
> {code:sql}
> DROP TABLE IF EXISTS `s/c`;
> {code}
> And here is the relevant grammar:
> {code:none}
> fragment
> RegexComponent
>     : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
>     | PLUS | STAR | QUESTION | MINUS | DOT
>     | LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
>     | BITWISEXOR | BITWISEOR | DOLLAR | '!'
>     ;
> Identifier
>     :
>     (Letter | Digit) (Letter | Digit | '_')*
>     | {allowQuotedId()}? QuotedIdentifier  /* though at the language level we
>                                               allow all Identifiers to be QuotedIdentifiers;
>                                               at the API level only columns
>                                               are allowed to be of this form */
>     | '`' RegexComponent+ '`'
>     ;
> fragment
> QuotedIdentifier
>     :
>     '`' ( '``' | ~('`') )* '`' {
> setText(StringUtils.replace(getText().substring(1, getText().length() -1 ),
> "``", "`")); }
>     ;
> {code}
> The mystery for me was that, for some reason, the String {{`s/c`}} was being
> stripped of its back ticks. Every other test I investigated did not have this
> behavior: the back ticks were always preserved around the table name, and the
> main Hive Java code base would see the back ticks and deal with them
> internally. For HIVE-23150, I introduced some sanity checks, and they were
> failing because they were expecting the back ticks to be present.
> With the help of HIVE-23171 I finally figured it out. What I discovered
> is that pretty much every table name is hitting the {{RegexComponent}} rule,
> and the back ticks are carried forward. However, in {{`s/c`}} the forward slash
> `/` is not allowed by {{RegexComponent}}, so it hits the {{QuotedIdentifier}}
> rule, which trims the back ticks.
> I validated this by disabling {{QuotedIdentifier}}. When I did this,
> {{`s/c`}} fails with an error but {{`sc`}} parses successfully, because {{`sc`}}
> is being treated as a {{RegexComponent}}.
> So, if you have {{allowQuotedId}} disabled, table names can only use the
> characters defined in the {{RegexComponent}} rule (otherwise parsing fails), and
> the back ticks will *not* be stripped. If you have {{allowQuotedId}} enabled
> and the table name contains a character not listed in {{RegexComponent}}, the
> token is still recognized as a table name and the back ticks *will* be
> stripped. If all of its characters are part of {{RegexComponent}}, the back
> ticks will *not* be stripped.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
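
The outcome described in the report can be modeled with a small Java sketch. This is not the ANTLR lexer itself, only a simplified imitation of the observed behavior. The class name {{BacktickLexModel}} and the method {{lex}} are hypothetical, unquoted identifiers are not modeled, and the character class assumes that the token names in {{RegexComponent}} ({{PLUS}}, {{BITWISEXOR}}, {{DOLLAR}}, and so on) stand for their usual single characters.

```java
import java.util.regex.Pattern;

// Simplified, hypothetical model of the behavior described in the report.
// It imitates the observed outcome; it does not reproduce ANTLR's prediction logic.
final class BacktickLexModel {
    // '`' RegexComponent+ '`' : letters, digits, '_' and + * ? - . ( ) [ ] { } ^ | $ !
    // (assumes the named tokens map to these characters)
    private static final Pattern REGEX_COMPONENT_FORM =
        Pattern.compile("`[a-zA-Z0-9_+*?\\-.()\\[\\]{}^|$!]+`");

    // QuotedIdentifier : '`' ( '``' | ~('`') )* '`'
    private static final Pattern QUOTED_IDENTIFIER_FORM =
        Pattern.compile("`(``|[^`])*`");

    // Returns the token text the rest of the code base would see.
    static String lex(String token, boolean allowQuotedId) {
        if (REGEX_COMPONENT_FORM.matcher(token).matches()) {
            // Only RegexComponent characters: back ticks are carried forward.
            return token;
        }
        if (allowQuotedId && QUOTED_IDENTIFIER_FORM.matcher(token).matches()) {
            // QuotedIdentifier action: drop the outer back ticks, unescape `` to `.
            String inner = token.substring(1, token.length() - 1);
            return inner.replace("``", "`");
        }
        throw new IllegalArgumentException("cannot tokenize: " + token);
    }

    public static void main(String[] args) {
        System.out.println(lex("`sc`", true));   // back ticks kept
        System.out.println(lex("`s/c`", true));  // back ticks stripped
    }
}
```

Under these assumptions, {{lex("`s/c`", false)}} throws, which mirrors the error seen when {{QuotedIdentifier}} was disabled, while {{lex("`sc`", false)}} still returns the name with its back ticks.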