davisp opened a new issue, #1588:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1588

   While working on #1587 I noticed that Instruments is showing 
`Token::make_word` as the second hottest single function, right after 
`alloc::raw_vec::finish_grow`.
   
   Looking into the implementation, I saw that it's just doing a binary search 
across all keywords to check whether the word is a known keyword. This is a 
fairly classic case where we have a known set of strings and want to check 
whether a given string is in that set. There are a bunch of ways that we could 
speed this up. This issue is to figure out a good compromise between those 
possible speedups and other project constraints like maintaining `no_std` 
support.
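
For reference, the binary-search approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual `Token::make_word` code: the abbreviated `KEYWORDS` list and the `lookup_keyword` helper are hypothetical names, and the ASCII uppercasing step is an assumption about how case-insensitivity is handled.

```rust
// Hypothetical, heavily abbreviated keyword list (not the real one).
// It must stay sorted, because `binary_search` relies on that ordering.
const KEYWORDS: &[&str] = &["AND", "BY", "FROM", "GROUP", "ORDER", "SELECT", "WHERE"];

/// Returns the index of `word` in `KEYWORDS`, ignoring ASCII case,
/// or `None` if the word is not a known keyword.
fn lookup_keyword(word: &str) -> Option<usize> {
    // Assumption: keywords are matched case-insensitively by uppercasing first.
    // This allocates a new `String` on every call.
    let upper = word.to_ascii_uppercase();
    // O(log n) string comparisons over the whole keyword list.
    KEYWORDS.binary_search(&upper.as_str()).ok()
}
```

Every word that is tokenized pays for one allocation plus a logarithmic number of string comparisons, which fits with this function showing up high in a profile.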
   
   My [first 
approach](https://github.com/apache/datafusion-sqlparser-rs/commit/4551933dc0a9e892e412be5ca0022a124859dad0)
 at speeding this up was to build a table keyed on the first byte of every 
keyword, which reduces the number of entries that need to be searched. This 
small optimization shaved off about 400ms (of the 1.4ish seconds total).
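
One way a first-byte table like this could look is sketched below. It is not the code from the linked commit: the `KEYWORDS` list is a hypothetical, abbreviated stand-in, and `build_first_byte_table`, `FIRST_BYTE_TABLE`, and `lookup_keyword` are names invented for illustration. The table maps each possible first byte to the range of the sorted keyword list that starts with that byte, so the binary search only runs over that small range.

```rust
// Hypothetical, abbreviated keyword list; must be sorted and contain no empty strings.
const KEYWORDS: &[&str] = &[
    "AND", "AS", "BY", "FROM", "GROUP", "ORDER", "SELECT", "SET", "WHERE",
];

/// For each possible first byte, the half-open index range `(start, end)` of
/// the `KEYWORDS` entries that begin with that byte. Bytes that start no
/// keyword keep the empty range `(0, 0)`.
const fn build_first_byte_table() -> [(usize, usize); 256] {
    let mut table = [(0usize, 0usize); 256];
    let mut i = 0;
    while i < KEYWORDS.len() {
        let first = KEYWORDS[i].as_bytes()[0] as usize;
        // An end of 0 means this bucket has not been seen yet.
        if table[first].1 == 0 {
            table[first].0 = i;
        }
        table[first].1 = i + 1;
        i += 1;
    }
    table
}

// Built at compile time, so it needs no runtime initialization.
static FIRST_BYTE_TABLE: [(usize, usize); 256] = build_first_byte_table();

/// Returns the index of `word` in `KEYWORDS`, ignoring ASCII case.
fn lookup_keyword(word: &str) -> Option<usize> {
    let first = word.as_bytes().first()?.to_ascii_uppercase() as usize;
    let (start, end) = FIRST_BYTE_TABLE[first];
    if start == end {
        // No keyword starts with this byte; skip the allocation entirely.
        return None;
    }
    let upper = word.to_ascii_uppercase();
    // Binary search only within the bucket for this first byte.
    KEYWORDS[start..end]
        .binary_search(&upper.as_str())
        .ok()
        .map(|i| start + i)
}
```

Because the table is a plain array computed in a `const fn`, a sketch like this does not depend on `std` beyond the `String` allocation used for uppercasing.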
   
   However, there are other approaches that could speed this up even more, 
either by generating parsing/lookup tables or by using something like 
[phf](https://crates.io/crates/phf) to do the heavy lifting for us.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

