[PR] Fix tokenization of qualified identifiers with numeric prefix. [datafusion-sqlparser-rs]

via GitHub Wed, 09 Apr 2025 11:53:54 -0700


romanb opened a new pull request, #1803:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1803


   This PR is a follow-up to 
https://github.com/apache/datafusion-sqlparser-rs/pull/856. The remaining 
problem is that queries with qualified identifiers having numeric prefixes 
currently fail to parse due to incorrect tokenization. For example:
   ```sql
   SELECT t.123abc FROM my_table t 
   ```
   This is currently tokenized as
   ```
   ...
   "t" (Word)
   ".123abc" (Number)
   ...
   ```
   whereas it should be tokenized as
   ```
   "t" (Word)
   "." (Period)
   "123abc" (Word)
   ```
   Identifiers such as `12e34`, which on their own look like number tokens, 
are potentially ambiguous and also need to be taken into account. An 
unqualified `12e34` should be tokenized as a number (this is already the case), 
but in `SELECT t.12e34 FROM my_table t` it should be tokenized as a word, 
matching MySQL semantics.
   
   The only way I saw to solve these problems unambiguously was to pass the 
previous token as context to the private `next_token` function in the 
`Tokenizer`, as its second argument. The function can then use it to 
disambiguate these cases and decide which type of token to produce for dialects 
that support numeric prefixes. I included comments and tests to further clarify 
the situation.
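
   To make the idea concrete, here is a minimal, self-contained sketch of 
previous-token disambiguation. It is not the code from this PR: the `Tok` enum, 
`take_run`, `looks_like_number`, and the signature of `next_token` below are 
hypothetical stand-ins, and the sketch only covers the cases discussed above 
(it does not model dialect checks, quoting, or decimals such as `1.5`).

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical, simplified token type (not the sqlparser-rs `Token` enum).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Number(String),
    Period,
    Whitespace,
    Other(char),
}

// Consume characters while `keep` holds and return them as a string.
fn take_run(chars: &mut Peekable<Chars<'_>>, keep: impl Fn(char) -> bool) -> String {
    let mut s = String::new();
    while let Some(&c) = chars.peek() {
        if !keep(c) {
            break;
        }
        s.push(c);
        chars.next();
    }
    s
}

// True for digit runs with an optional exponent, e.g. "123" or "12e34".
fn looks_like_number(s: &str) -> bool {
    let (mantissa, exponent) = match s.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&s[..i], Some(&s[i + 1..])),
        None => (s, None),
    };
    let all_digits = |p: &str| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit());
    all_digits(mantissa) && exponent.map_or(true, all_digits)
}

// Assumed shape of the idea: the previous token is passed in as context.
fn next_token(chars: &mut Peekable<Chars<'_>>, prev: Option<&Tok>) -> Option<Tok> {
    let c = *chars.peek()?;
    if c.is_whitespace() {
        take_run(chars, |c| c.is_whitespace());
        return Some(Tok::Whitespace);
    }
    if c == '.' {
        chars.next();
        let digit_follows = chars.peek().map_or(false, |c| c.is_ascii_digit());
        // Directly after a word, "." separates a qualifier; otherwise ".5" is a number.
        if digit_follows && !matches!(prev, Some(Tok::Word(_))) {
            let digits = take_run(chars, |c| c.is_ascii_digit());
            return Some(Tok::Number(format!(".{}", digits)));
        }
        return Some(Tok::Period);
    }
    if c.is_ascii_alphanumeric() || c == '_' {
        let run = take_run(chars, |c| c.is_ascii_alphanumeric() || c == '_');
        // Directly after a period the run is an identifier part, even if it looks numeric.
        if !matches!(prev, Some(Tok::Period)) && looks_like_number(&run) {
            return Some(Tok::Number(run));
        }
        return Some(Tok::Word(run));
    }
    chars.next();
    Some(Tok::Other(c))
}

// Drive `next_token`, feeding it the most recently produced token.
fn tokenize(sql: &str) -> Vec<Tok> {
    let mut chars = sql.chars().peekable();
    let mut tokens: Vec<Tok> = Vec::new();
    loop {
        let next = next_token(&mut chars, tokens.last());
        match next {
            Some(tok) => tokens.push(tok),
            None => break,
        }
    }
    tokens
}

fn main() {
    for sql in ["SELECT t.123abc FROM my_table t", "SELECT 12e34", "SELECT t.12e34 FROM my_table t"] {
        println!("{:?}", tokenize(sql));
    }
}
```

   In this sketch whitespace is kept as a token so that "previous token" means 
the token immediately before the current one; whether the real tokenizer 
treats intervening whitespace that way is not something this sketch asserts.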


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


