romanb opened a new pull request, #1803: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1803
This PR is a follow-up to https://github.com/apache/datafusion-sqlparser-rs/pull/856. The remaining problem is that queries containing qualified identifiers with numeric prefixes currently fail to parse because they are tokenized incorrectly. For example:

```sql
SELECT t.123abc FROM my_table t
```

This is currently tokenized as

```
... "t" (Word) ".123abc" (Number) ...
```

whereas it should be tokenized as

```
"t" (Word) "." (Period) "123abc" (Word)
```

The potential ambiguity of identifiers of the form `12e34`, which on their own could be read as number tokens, also needs to be taken into account. If `12e34` is unqualified, it should be tokenized as a number (this is already the case), but in `SELECT t.12e34 FROM my_table t` it should be tokenized as a word, to match valid MySQL semantics.

The only option I saw to solve these problems unambiguously was to pass the previous token as context to the private `next_token` function in the `Tokenizer`, as its second argument. The previous token can then be used to disambiguate these cases and decide which type of token to produce for dialects that support numeric prefixes.

I included commentary and tests to help further clarify the situation.
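The disambiguation described above can be illustrated with a small standalone toy tokenizer. This is only a sketch of the idea and not the code in this PR or the sqlparser-rs API: the `Token` enum, the `take_while` and `tokenize` helpers, and the exact signature of `next_token` are all hypothetical, and the toy always behaves like a dialect that supports numeric prefixes.

```rust
// Toy illustration only: every name and signature here is hypothetical
// and does not mirror the real sqlparser-rs `Tokenizer`.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Number(String),
    Period,
    Whitespace,
    Char(char),
}

fn is_ident_part(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Advances `pos` while `pred` holds and returns the consumed characters.
fn take_while(chars: &[char], pos: &mut usize, pred: impl Fn(char) -> bool) -> String {
    let start = *pos;
    while *pos < chars.len() && pred(chars[*pos]) {
        *pos += 1;
    }
    chars[start..*pos].iter().collect()
}

/// Produces the next token, using the previous token as context.
fn next_token(chars: &[char], pos: &mut usize, prev: Option<&Token>) -> Option<Token> {
    let c = *chars.get(*pos)?;
    let peek = |i: usize| chars.get(i).copied();
    let digit_at = |i: usize| peek(i).map_or(false, |d| d.is_ascii_digit());

    let token = if c.is_whitespace() {
        take_while(chars, pos, |ch| ch.is_whitespace());
        Token::Whitespace
    } else if c == '.' && (matches!(prev, Some(Token::Word(_))) || !digit_at(*pos + 1)) {
        // A period directly after a word is a qualifier separator,
        // even when digits follow it (`t.123abc`).
        *pos += 1;
        Token::Period
    } else if c.is_ascii_digit() && matches!(prev, Some(Token::Period)) {
        // Qualified context: `12e34` in `t.12e34` is an identifier.
        Token::Word(take_while(chars, pos, is_ident_part))
    } else if c.is_ascii_digit() || c == '.' {
        // Unqualified context: try to read a numeric literal first.
        let start = *pos;
        take_while(chars, pos, |ch| ch.is_ascii_digit());
        let mut plain_integer = true;
        if peek(*pos) == Some('.') {
            plain_integer = false;
            *pos += 1;
            take_while(chars, pos, |ch| ch.is_ascii_digit());
        }
        // Exponent: `e` or `E` followed by at least one digit.
        if matches!(peek(*pos), Some('e' | 'E')) && digit_at(*pos + 1) {
            *pos += 1;
            take_while(chars, pos, |ch| ch.is_ascii_digit());
        }
        if plain_integer && peek(*pos).map_or(false, is_ident_part) {
            // Identifier characters follow the digits: numeric-prefixed word.
            take_while(chars, pos, is_ident_part);
            Token::Word(chars[start..*pos].iter().collect())
        } else {
            Token::Number(chars[start..*pos].iter().collect())
        }
    } else if c.is_ascii_alphabetic() || c == '_' {
        Token::Word(take_while(chars, pos, is_ident_part))
    } else {
        *pos += 1;
        Token::Char(c)
    };
    Some(token)
}

/// Tokenizes `sql`, feeding each call the most recently produced token.
fn tokenize(sql: &str) -> Vec<Token> {
    let chars: Vec<char> = sql.chars().collect();
    let mut pos = 0;
    let mut tokens: Vec<Token> = Vec::new();
    while let Some(token) = next_token(&chars, &mut pos, tokens.last()) {
        tokens.push(token);
    }
    // Whitespace is kept while scanning so that `SELECT .5` is not
    // mistaken for a qualified name, then dropped from the result.
    tokens.retain(|t| *t != Token::Whitespace);
    tokens
}
```

In this sketch, calling `tokenize("t.12e34")` goes through the qualified branch, while `tokenize("12e34")` goes through the numeric-literal branch, so the same characters yield a word in one case and a number in the other. The only thing that differs between the two calls is the previous token.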