alamb opened a new issue, #1558: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1558
Part of https://github.com/apache/datafusion-sqlparser-rs/issues/1557 While looking at the flamegraph posted to that ticket, one obvious source of improvement is to stop copying each token so much. Functions like `peek_token()`, and `expect_token()` copy the `Token` (which often includes a string) ```rust impl Parser { // returns an OWNED TokenWithLocation pub fn peek_token(&self) -> TokenWithLocation { ... } } ``` For example https://github.com/apache/datafusion-sqlparser-rs/blob/2e90e105a74bf9f50f2bad6c22992759ddb06880/src/parser/mod.rs#L3314-L3334 In the above code, the `non_whitespace.cloned()` call actually copies the token (and string) even when many callsites only need to check what it is and could get by with a & ## Suggestions for improvements The biggest suggestion is to stop copying each token unless it actually needed a clone. For example instead of this ```rust impl Parser { // returns an OWNED TokenWithLocation pub fn peek_token(&self) -> TokenWithLocation { ... } } ``` Make it like this ```rust impl Parser { // returns an reference to TokenWithLocation // (the & means no copying!) pub fn peek_token_ref(&self) -> &TokenWithLocation { ... } } ``` I think we could do this without massive breaking changes by adding new functions like `peek_token_ref()` that returned a reference to the token rather than a clone ## Ideas I played around with it a bit and here is what I came up with. I think a similar pattern could be applied to other places ```rust impl Parser { ... /// Return the first non-whitespace token that has not yet been processed /// (or None if reached end-of-file) and mark it as processed. OK to call /// repeatedly after reaching EOF. pub fn next_token(&mut self) -> TokenWithLocation { self.next_token_ref().clone() } /// Return the first non-whitespace token that has not yet been processed /// (or None if reached end-of-file) and mark it as processed. OK to call /// repeatedly after reaching EOF. pub fn next_token_ref(&mut self) -> &TokenWithLocation { loop { self.index += 1; // skip whitespace if let Some(TokenWithLocation { token: Token::Whitespace(_), location: _, }) = self.tokens.get(self.index - 1) { continue } break; } if (self.index - 1) < self.tokens.len() { &self.tokens[self.index - 1] } else { eof_token() } } ... } static EOF_TOKEN: OnceLock<TokenWithLocation> = OnceLock::new(); fn eof_token() -> &'static TokenWithLocation { EOF_TOKEN.get_or_init(|| { TokenWithLocation::wrap(Token::EOF) }) } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org