mvzink commented on code in PR #1538:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1538#discussion_r1859136590


##########
src/tokenizer.rs:
##########
@@ -1202,6 +1202,9 @@ impl<'a> Tokenizer<'a> {
                             }
                         }
                         Some(' ') => Ok(Some(Token::AtSign)),
+                        Some('\'') => Ok(Some(Token::AtSign)),
+                        Some('\"') => Ok(Some(Token::AtSign)),
+                        Some('`') => Ok(Some(Token::AtSign)),

Review Comment:
   I thought so at first too, but variables in MySQL start with `@` and I don't 
think there's a way to differentiate them during tokenization. We would want to 
tokenize `@foo` as a single `Token::Word`, but `@'foo'` (in my understanding) 
as `[Token::AtSign, Token::SingleQuotedString]`. In this check during 
tokenization, we aren't differentiating between "schema names etc." and 
"variable names", just "identifiers". Maybe there is something more correct to 
do here, like checking for `self.is_identifier_continuation(sch)` and ruling 
out `'` there. But my approach of just adding cases for quotes is based on the 
fact that as far as I can tell, no dialect allows quote marks within an 
identifier unless it's in an outer quotation, which doesn't get tokenized by 
this path.
   
   I think an approach like `self.is_identifier_continuation(sch)` would 
effectively be the same but could allow customization per dialect, but it has 
another pitfall: say in addition to quote marks, a dialect doesn't allow `_` in 
an identifier. Then should we fail to tokenize `@_` with an error, or tokenize 
them separately as `[@, _]`? If the former, we would then need another special 
case for quote marks, and the latter seems wrong to me.
   
   One other example in support of tokenizing like `[Token::AtSign, 
Token::SingleQuotedString]`, just in case it's not clear why I think that's 
right: in Postgres, `@` is absolute value, and it will implicitly cast the 
operand to a numeric type, so that `select @'1';` (no space after `@`) treats 
`'1'` as a string and casts it to double.
   
   All that in mind, I still think my current approach is the simplest correct 
I can think of; let me know what you think!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to