eyalsatori commented on issue #2036:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/issues/2036#issuecomment-3622531203

   **Progress Update - Found the Main Cause of Performance Regression**
   
   **TL;DR** - The regression came from how I handled the 
[`make_word`](https://github.com/apache/datafusion-sqlparser-rs/blob/main/src/tokenizer.rs#L397)
 function. With the fix described below, string borrowing now takes the 
benchmark from 2,915 µs to 2,801 µs (~4% faster).
   
   **Root Cause**
   
   To avoid the `word.to_uppercase()` call in 
[`make_word`](https://github.com/apache/datafusion-sqlparser-rs/blob/main/src/tokenizer.rs#L397),
 I implemented a custom case-insensitive string comparison function. Profiling 
revealed this function was expensive and caused the performance regression.
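
For illustration only, here is a rough sketch of the two lookup styles contrasted above: one that allocates an uppercased copy before a binary search, and one that borrows the word and compares case-insensitively at every probe. The names `KEYWORDS_SORTED`, `cmp_ignore_ascii_case`, `find_allocating`, and `find_borrowing` are hypothetical and are not taken from the crate or from my branch. The comparison is ASCII-only, which is an assumption.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for a sorted keyword table (uppercase, sorted).
const KEYWORDS_SORTED: &[&str] = &["FROM", "SELECT", "WHERE"];

// Allocation-free, ASCII-only case-insensitive ordering.
// It uppercases byte by byte on every call, so each binary-search
// probe pays for a fresh pass over both strings.
fn cmp_ignore_ascii_case(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_uppercase())
        .cmp(b.bytes().map(|c| c.to_ascii_uppercase()))
}

// Allocating variant: one `String` per word, then cheap byte comparisons.
fn find_allocating(word: &str) -> Option<usize> {
    let upper = word.to_uppercase();
    KEYWORDS_SORTED.binary_search(&upper.as_str()).ok()
}

// Borrowing variant: no allocation, but a costlier comparator per probe.
fn find_borrowing(word: &str) -> Option<usize> {
    KEYWORDS_SORTED
        .binary_search_by(|kw| cmp_ignore_ascii_case(kw, word))
        .ok()
}
```

Both functions return the index of the matching keyword, so `find_borrowing("SeLeCt")` and `find_allocating("select")` agree. The difference is only in where the work is done.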
   
   **Solution**
   
   Instead of searching the `ALL_KEYWORDS` array, I created a `HashMap` with 
keywords stored as [`UniCase`](https://docs.rs/unicase/latest/unicase/) 
strings. The hash map is initialized once at runtime using 
[`OnceLock`](https://doc.rust-lang.org/std/sync/struct.OnceLock.html), giving 
us average-case O(1) lookups without allocating a new string per word.
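
A minimal sketch of that idea, using only the standard library so it stands alone. The hand-rolled `CaseInsensitive` wrapper is a simplified, ASCII-only stand-in for the `unicase` key type, not the real thing. The `Keyword` enum, `keyword_map`, and `lookup_keyword` are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::OnceLock;

// Hypothetical subset of keywords, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    Select,
    From,
    Where,
}

// ASCII-only stand-in for a case-insensitive key type.
// Borrows the string, so building a lookup key does not allocate.
#[derive(Debug, Clone, Copy)]
struct CaseInsensitive<'a>(&'a str);

impl PartialEq for CaseInsensitive<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(other.0)
    }
}

impl Eq for CaseInsensitive<'_> {}

impl Hash for CaseInsensitive<'_> {
    // Hash the lowercased bytes so keys that compare equal hash equally.
    fn hash<H: Hasher>(&self, state: &mut H) {
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
    }
}

// Built on first use, then shared for the rest of the program.
fn keyword_map() -> &'static HashMap<CaseInsensitive<'static>, Keyword> {
    static MAP: OnceLock<HashMap<CaseInsensitive<'static>, Keyword>> = OnceLock::new();
    MAP.get_or_init(|| {
        HashMap::from([
            (CaseInsensitive("SELECT"), Keyword::Select),
            (CaseInsensitive("FROM"), Keyword::From),
            (CaseInsensitive("WHERE"), Keyword::Where),
        ])
    })
}

// Looks up a borrowed word without creating an uppercased copy.
fn lookup_keyword(word: &str) -> Option<Keyword> {
    keyword_map().get(&CaseInsensitive(word)).copied()
}
```

With this shape, `lookup_keyword("select")` and `lookup_keyword("SELECT")` resolve to the same entry, and a non-keyword such as `table_name` returns `None`.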
   
   **Next Steps**
   
   While the benchmark shows only a ~4% improvement, I believe the 
real-world impact will be more significant. Because this approach dramatically 
reduces allocations, programs with more fragmented heaps should see a larger 
gain. I feel confident continuing in this direction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

