LucaCappelletti94 commented on code in PR #2077:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/2077#discussion_r2552413997


##########
src/tokenizer.rs:
##########
@@ -896,14 +929,37 @@ impl<'a> Tokenizer<'a> {
             line: 1,
             col: 1,
         };
+        let mut prev_keyword = None;
+        let mut cs_handler = CopyStdinHandler::default();
 
         let mut location = state.location();
-        while let Some(token) = self.next_token(&mut state, buf.last().map(|t| 
&t.token))? {
-            let span = location.span_to(state.location());
+        while let Some(token) = self.next_token(
+            &mut location,
+            &mut state,
+            buf.last().map(|t| &t.token),
+            prev_keyword,
+            false,
+        )? {
+            if let Token::Word(Word { keyword, .. }) = &token {
+                if *keyword != Keyword::NoKeyword {
+                    prev_keyword = Some(*keyword);
+                }
+            }
 
+            let span = location.span_to(state.location());
+            cs_handler.update(&token);
             buf.push(TokenWithSpan { token, span });
-
             location = state.location();
+
+            if cs_handler.is_in_copy_from_stdin() {

Review Comment:
   It would work for correctly formatted TSVs for which the separator and 
number of header columns is provided, but would have weird behaviours with more 
cursed string inputs, for instance an empty row would be just a sequence of 
tabs.
   
   I think it may be okay to add this logic duplication for the tokeniser in 
these specific cases because we are dealing with something that is not SQL, and 
breaks otherwise valid assumptions.
   
   
   Ideally, in the future, we might be able to make the tokeniser and parser 
co-inform each other of the current context while parsing so to avoid any such 
code duplication and allow for streaming parsing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to