iffyio commented on code in PR #2034:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/2034#discussion_r2419908820


##########
src/tokenizer.rs:
##########
@@ -936,10 +939,16 @@ impl<'a> Tokenizer<'a> {
 
     /// Get the next token or return None
     fn next_token(
-        &self,
+        &mut self,
         chars: &mut State,
         prev_token: Option<&Token>,
     ) -> Result<Option<Token>, TokenizerError> {
+        // Return any previously injected tokens first

Review Comment:
   Implementation-wise, I'm thinking we can do something like the following, which 
would be less invasive to the tokenizer:
   
   Take `/*!50110 KEY_BLOCK_SIZE = 1024*/` as an example. Today that gets 
tokenized as 
`Token::Whitespace(Whitespace::MultiLineComment("!50110 KEY_BLOCK_SIZE = 1024"))`, 
which we receive [here in the tokenizer 
loop](https://github.com/altmannmarcelo/datafusion-sqlparser-rs/blob/f96249cc1d52d34edcfb0ca246bdd78f62552b91/src/tokenizer.rs#L907).
   
   The idea would then be to re-tokenize that string and add the resulting 
tokens to the tokens buffer.
   
   So something like this:
   
   ```rust
   // Check whether this token is a comment containing optimizer hints.
   match token {
       Token::Whitespace(Whitespace::MultiLineComment(comment))
           if self.dialect.supports_multiline_comment_hints() && comment.starts_with('!') =>
       {
           // Re-tokenize the hints and append them to the buffer.
           self.tokenize_comment_hints(&comment, span, buf)?;
       }
       token => buf.push(TokenWithSpan { token, span }),
   }
   
   // Here we reuse the existing tokenizer machinery to re-tokenize the hints.
   fn tokenize_comment_hints(
       &self,
       hints: &str,
       span: Span,
       tokens: &mut Vec<TokenWithSpan>,
   ) -> Result<(), TokenizerError> {
       let mut state = State {
           peekable: hints.chars().peekable(),
           line: span.start.line,
           col: span.start.column,
       };
   
       while let Some(token) = self.next_token(&mut state, None)? {
           // Sketch: each hint token reuses the span of the enclosing comment.
           tokens.push(TokenWithSpan { token, span });
       }
   
       Ok(())
   }
   ```
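
To make the re-tokenization idea concrete outside of the real tokenizer, here is a minimal, standalone sketch. Everything in it is hypothetical and only mirrors the shape of the proposal: `Tok`, `ToyDialect`, `tokenize_plain`, `strip_hint_prefix` and `expand_hints` are stand-ins, not sqlparser APIs. Stripping the leading `!` and version digits is an assumption added for the illustration; the snippet above does not do it.

```rust
/// Toy token type standing in for the real `Token` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Number(String),
    Symbol(char),
    /// Body of a `/* ... */` comment, without the delimiters.
    Comment(String),
}

/// Stand-in for a dialect exposing `supports_multiline_comment_hints`.
struct ToyDialect {
    supports_multiline_comment_hints: bool,
}

/// Tokenize comment-free text into words, numbers and single-char symbols.
fn tokenize_plain(input: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_ascii_digit() {
            let mut s = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                s.push(d);
                chars.next();
            }
            out.push(Tok::Number(s));
        } else if c.is_alphabetic() || c == '_' {
            let mut s = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_alphanumeric() || d == '_') {
                    break;
                }
                s.push(d);
                chars.next();
            }
            out.push(Tok::Word(s));
        } else {
            out.push(Tok::Symbol(c));
            chars.next();
        }
    }
    out
}

/// Assumption: drop the leading `!` and any version digits (e.g. `!50110`).
/// This toy version strips every leading digit after the `!`.
fn strip_hint_prefix(comment: &str) -> &str {
    let rest = comment.strip_prefix('!').unwrap_or(comment);
    rest.trim_start_matches(|c: char| c.is_ascii_digit())
}

/// Replace hint comments with the tokens they contain; keep everything else.
fn expand_hints(dialect: &ToyDialect, tokens: Vec<Tok>) -> Vec<Tok> {
    let mut buf = Vec::new();
    for token in tokens {
        match token {
            Tok::Comment(body)
                if dialect.supports_multiline_comment_hints && body.starts_with('!') =>
            {
                // Re-tokenize the comment body with the same machinery.
                buf.extend(tokenize_plain(strip_hint_prefix(&body)));
            }
            token => buf.push(token),
        }
    }
    buf
}
```

With the flag enabled, a `Tok::Comment("!50110 KEY_BLOCK_SIZE = 1024")` is replaced by the word, symbol and number tokens it contains; with the flag disabled, the comment token passes through unchanged.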
   
   



##########
src/dialect/mod.rs:
##########
@@ -898,6 +898,12 @@ pub trait Dialect: Debug + Any {
         false
     }
 
+    /// Returns true if the dialect supports hint and C-style comments
+    /// e.g. `/*! hint */`
+    fn supports_c_style_hints(&self) -> bool {

Review Comment:
   ```suggestion
       /// Returns true if the dialect supports optimizer hints in multiline comments
       /// e.g. `/*!50110 KEY_BLOCK_SIZE = 1024*/`
       fn supports_multiline_comment_hints(&self) -> bool {
   ```
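
As a rough illustration of how such an opt-in flag would be used, the following standalone sketch mimics the pattern with a default-`false` trait method that a dialect overrides. The trait here is a simplified stand-in for the real `Dialect` trait, and `GenericLike`, `MySqlLike` and `should_expand` are hypothetical names; which real dialects would opt in is an assumption.

```rust
use std::fmt::Debug;

/// Simplified stand-in for the real `Dialect` trait (hypothetical).
trait Dialect: Debug {
    /// Returns true if the dialect supports optimizer hints in multiline
    /// comments, e.g. `/*!50110 KEY_BLOCK_SIZE = 1024*/`.
    fn supports_multiline_comment_hints(&self) -> bool {
        false
    }
}

/// Hypothetical dialect that keeps the default (no hint support).
#[derive(Debug)]
struct GenericLike;

impl Dialect for GenericLike {}

/// Hypothetical MySQL-like dialect that opts in.
#[derive(Debug)]
struct MySqlLike;

impl Dialect for MySqlLike {
    fn supports_multiline_comment_hints(&self) -> bool {
        true
    }
}

/// Decide whether a comment body should be re-tokenized as hints.
fn should_expand(dialect: &dyn Dialect, comment_body: &str) -> bool {
    dialect.supports_multiline_comment_hints() && comment_body.starts_with('!')
}
```

Keeping the default at `false` means existing dialects are unaffected unless they override the method.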



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

