[ 
https://issues.apache.org/jira/browse/HIVE-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-23172:
-------------------------------------


> Quoted Backtick Columns Are Not Parsing Correctly
> -------------------------------------------------
>
>                 Key: HIVE-23172
>                 URL: https://issues.apache.org/jira/browse/HIVE-23172
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Critical
>
> I recently came across some odd behavior while examining failures of 
> {{special_character_in_tabnames_2.q}} while working on HIVE-23150. I was 
> surprised to see it fail because I couldn't see any reason why it 
> should. It runs pretty standard SQL statements just like every other 
> test, but this test is just a *little bit* different from 
> most others, and that brought this issue to light.
> It turns out that the parsing of table names is pretty much wrong across the 
> board.
> The statement that caught my attention was this:
> {code:sql}
> DROP TABLE IF EXISTS `s/c`;
> {code}
> And here is the relevant grammar:
> {code:none}
> fragment
> RegexComponent
>     : 'a'..'z' | 'A'..'Z' | '0'..'9' | '_'
>     | PLUS | STAR | QUESTION | MINUS | DOT
>     | LPAREN | RPAREN | LSQUARE | RSQUARE | LCURLY | RCURLY
>     | BITWISEXOR | BITWISEOR | DOLLAR | '!'
>     ;
> Identifier
>     :
>     (Letter | Digit) (Letter | Digit | '_')*
>     | {allowQuotedId()}? QuotedIdentifier  /* though at the language level we allow all Identifiers to be QuotedIdentifiers;
>                                               at the API level only columns are allowed to be of this form */
>     | '`' RegexComponent+ '`'
>     ;
> fragment    
> QuotedIdentifier 
>     :
>     '`'  ( '``' | ~('`') )* '`' { setText(StringUtils.replace(getText().substring(1, getText().length() -1 ), "``", "`")); }
>     ;
> {code}
> The mystery for me was that, for some reason, the string {{`s/c`}} was being 
> stripped of its back ticks. Every other test I investigated did not show this 
> behavior; the back ticks were always preserved around the table name, and the 
> main Hive Java code base would see the back ticks and deal with them 
> internally. For HIVE-23150, I introduced some sanity checks, and they were 
> failing because they expected the back ticks to be present.
> With the help of HIVE-23171 I finally figured it out. What I discovered 
> is that pretty much every table name hits the {{RegexComponent}} rule 
> and the back ticks are carried forward. However, in {{`s/c`}} the forward slash 
> `/` is not allowed by {{RegexComponent}}, so it hits the {{QuotedIdentifier}} 
> rule instead, which trims the back ticks.
> I validated this by disabling {{QuotedIdentifier}}. When I did this, 
> {{`s/c`}} failed with an error but {{`sc`}} parsed successfully, because {{`sc`}} 
> is being treated as a {{RegexComponent}}.
> So, if you have {{allowQuotedId}} disabled, table names can only use the 
> characters defined in the {{RegexComponent}} rule (otherwise parsing errors out), and 
> the back ticks are *not* stripped. If you have {{allowQuotedId}} enabled, 
> the result depends on the name: if it contains a character not specified in 
> {{RegexComponent}}, it is still identified as a table name and the back ticks 
> *are* stripped; if all of its characters are part of {{RegexComponent}}, the 
> back ticks are *not* stripped.
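
The behavior described above can be approximated outside of ANTLR. The following is only a rough Python model of the observed outcomes, not the Hive lexer itself: the function name `lex_identifier` and the `allow_quoted_id` parameter are hypothetical, and the character class assumes that PLUS, STAR, BITWISEXOR, and the other named tokens map to their usual single characters.

```python
import re

# Plain, unquoted identifier: (Letter | Digit) (Letter | Digit | '_')*
PLAIN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*")

# '`' RegexComponent+ '`' -- assumes the named tokens (PLUS, STAR, DOLLAR, ...)
# correspond to the single characters + * ? - . ( ) [ ] { } ^ | $
REGEX_COMPONENT = re.compile(r"`[a-zA-Z0-9_+*?\-.()\[\]{}^|$!]+`")

# QuotedIdentifier: '`' ( '``' | ~('`') )* '`'
QUOTED_IDENTIFIER = re.compile(r"`(?:``|[^`])*`")


def lex_identifier(text, allow_quoted_id=True):
    """Return the token text a caller would see for `text` (hypothetical model)."""
    if PLAIN.fullmatch(text):
        return text
    # Names made only of RegexComponent characters keep their back ticks.
    if REGEX_COMPONENT.fullmatch(text):
        return text
    # Anything else is only accepted when quoted identifiers are allowed,
    # and the rule's action strips the outer back ticks and un-doubles ``.
    if allow_quoted_id and QUOTED_IDENTIFIER.fullmatch(text):
        return text[1:-1].replace("``", "`")
    raise ValueError(f"cannot lex identifier: {text!r}")


for name in ("`sc`", "`s/c`"):
    for allowed in (True, False):
        try:
            result = lex_identifier(name, allow_quoted_id=allowed)
        except ValueError as err:
            result = f"error ({err})"
        print(f"{name:8} allow_quoted_id={allowed!s:5} -> {result}")
```

In this model, whether the back ticks survive depends on which characters the name contains rather than on the fact that it was quoted, which is the inconsistency reported in this issue.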



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
