davisp opened a new issue, #1588: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1588
While working on #1587, I noticed that Instruments shows `Token::make_word` as the second-hottest single function, right after `alloc::raw_vec::finish_grow`. Looking into the implementation, I saw that it's just doing a binary search across all keywords to determine whether the word is a known keyword. This is a fairly classic case where we have a known set of strings and want to check whether a given string is in that set. There are a bunch of ways we could speed this up. This issue is to figure out a good compromise between those possible speedups and other project constraints, like maintaining `no_std` support.

My [first approach](https://github.com/apache/datafusion-sqlparser-rs/commit/4551933dc0a9e892e412be5ca0022a124859dad0) at speeding this up was to create a table indexed by the first byte of every keyword, to reduce the number of entries that need to be searched. This small optimization shaved off about 400ms (of the 1.4ish seconds total).

However, there are other approaches that could speed this up even more, either by generating parsing/lookup tables or by using something like [phf](https://crates.io/crates/phf) to do the heavy lifting for us.
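
For discussion, here is a minimal sketch of the first-byte table idea described above. It is not the code from the linked commit. The abbreviated `KEYWORDS` list and the names `build_first_byte_table`, `FIRST_BYTE`, `cmp_ignore_ascii_case`, and `keyword_index` are all hypothetical. It assumes the keyword list is sorted, non-empty per entry, and uppercase ASCII, and it only folds ASCII case (unlike a full Unicode uppercase conversion). It uses only `core`, so it should not get in the way of `no_std`.

```rust
use core::cmp::Ordering;

// Hypothetical, abbreviated keyword list. Assumed sorted, uppercase ASCII,
// with no empty entries.
const KEYWORDS: &[&str] = &[
    "ALTER", "AND", "AS", "BY", "CREATE", "FROM", "GROUP", "INSERT", "INTO",
    "ORDER", "SELECT", "TABLE", "UPDATE", "WHERE",
];

// For every possible first byte, record the half-open range [start, end)
// of KEYWORDS entries that begin with that byte. Unused bytes stay (0, 0),
// which is an empty range.
const fn build_first_byte_table() -> [(u16, u16); 256] {
    let mut table = [(0u16, 0u16); 256];
    let mut i = 0;
    while i < KEYWORDS.len() {
        let b = KEYWORDS[i].as_bytes()[0] as usize;
        if table[b].1 == 0 {
            // First keyword seen for this byte: remember where the run starts.
            table[b].0 = i as u16;
        }
        table[b].1 = (i + 1) as u16;
        i += 1;
    }
    table
}

// Built at compile time, so there is no runtime initialization cost.
static FIRST_BYTE: [(u16, u16); 256] = build_first_byte_table();

// Compare an uppercase keyword against a word, folding the word's ASCII
// letters to uppercase on the fly so no temporary String is allocated.
fn cmp_ignore_ascii_case(keyword: &str, word: &str) -> Ordering {
    keyword
        .bytes()
        .cmp(word.bytes().map(|b| b.to_ascii_uppercase()))
}

// Return the index of `word` in KEYWORDS, or None if it is not a keyword.
// The binary search only covers keywords sharing the word's first byte.
pub fn keyword_index(word: &str) -> Option<usize> {
    let first = word.as_bytes().first()?.to_ascii_uppercase();
    let (start, end) = FIRST_BYTE[first as usize];
    let (start, end) = (start as usize, end as usize);
    KEYWORDS[start..end]
        .binary_search_by(|kw| cmp_ignore_ascii_case(kw, word))
        .ok()
        .map(|i| start + i)
}
```

With this shape, a lookup such as `keyword_index("select")` searches only the keywords starting with `S` rather than the whole list. Whether the table is worth it compared to a generated matcher or `phf` would need to be measured against the real keyword list.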
