eyalleshem commented on issue #2036: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2036#issuecomment-3678852408
**Progress Update: 60% Performance Improvement with Borrowed Tokenizer**

I've opened PR #2136 (currently WIP), which implements zero-copy borrowing for strings, identifiers, and comments in the tokenizer: https://github.com/apache/datafusion-sqlparser-rs/pull/2136

## Performance Results

Using `cargo bench` with Criterion on the same ~30K query string from my previous tests:

```
tokenization/tokenize_complex_sql
                        time:   [274.24 µs 274.74 µs 275.31 µs]
                        change: [−59.937% −59.826% −59.716%] (p = 0.00 < 0.05)
                        Performance has improved.
```

**~60% less tokenization time** (from ~683 µs to ~275 µs, i.e. roughly 2.5× faster)

## Why Does This Differ from Previous Measurements?

My earlier manual timing measurements gave inconsistent results. I've now switched to `cargo bench` with Criterion, which provides:

- **Warmup iterations** to reach steady-state performance
- **Statistical analysis** that identifies and filters outliers
- **A consistent, reproducible methodology**

## Reproducibility

The benchmark is included in PR #2136. To run the comparison locally:

- **With borrowing**: PR #2136
- **Without borrowing (baseline)**: https://github.com/eyalleshem/sqlparser-rs/tree/benchmark_base
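To illustrate what zero-copy borrowing for identifiers, strings, and comments means, here is a minimal sketch. It assumes a hypothetical `Token<'a>` type and a toy `tokenize` function; neither is the actual code in PR #2136 nor the sqlparser-rs API. Identifiers and comments are `&'a str` slices of the input, and quoted strings are `Cow<'a, str>`, so a `String` is allocated only when an escaped quote (`''`) forces an unescaped copy.

```rust
use std::borrow::Cow;

/// Hypothetical token type (not the sqlparser-rs `Token`): text-carrying
/// variants borrow from the input `&'a str` instead of owning a `String`.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    /// Identifier or keyword, sliced straight out of the input.
    Word(&'a str),
    /// Quoted string: borrowed unless it contains an escaped quote (`''`),
    /// in which case an owned, unescaped copy has to be built.
    SingleQuotedString(Cow<'a, str>),
    /// Text of a `--` line comment, without the leading `--`.
    Comment(&'a str),
    /// Any other single character.
    Symbol(char),
}

/// Toy zero-copy tokenizer for a tiny SQL subset (sketch only).
fn tokenize(sql: &str) -> Vec<Token<'_>> {
    let bytes = sql.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i];
        if c.is_ascii_whitespace() {
            i += 1;
        } else if c.is_ascii_alphabetic() || c == b'_' {
            let start = i;
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            tokens.push(Token::Word(&sql[start..i])); // no allocation
        } else if c == b'-' && bytes.get(i + 1) == Some(&b'-') {
            let start = i + 2;
            i = start;
            while i < bytes.len() && bytes[i] != b'\n' {
                i += 1;
            }
            tokens.push(Token::Comment(&sql[start..i])); // no allocation
        } else if c == b'\'' {
            i += 1;
            let mut seg_start = i; // start of the current run of unescaped text
            let mut owned: Option<String> = None; // allocated only on the first `''`
            while i < bytes.len() {
                if bytes[i] != b'\'' {
                    i += 1;
                } else if bytes.get(i + 1) == Some(&b'\'') {
                    // Escaped quote: copy the run so far plus one `'` into a buffer.
                    let buf = owned.get_or_insert_with(String::new);
                    buf.push_str(&sql[seg_start..i]);
                    buf.push('\'');
                    i += 2;
                    seg_start = i;
                } else {
                    break; // closing quote
                }
            }
            let value = match owned {
                None => Cow::Borrowed(&sql[seg_start..i]),
                Some(mut buf) => {
                    buf.push_str(&sql[seg_start..i]);
                    Cow::Owned(buf)
                }
            };
            tokens.push(Token::SingleQuotedString(value));
            i += 1; // skip the closing quote (sketch: unterminated strings are not reported)
        } else {
            let ch = sql[i..].chars().next().unwrap();
            tokens.push(Token::Symbol(ch));
            i += ch.len_utf8();
        }
    }
    tokens
}

fn main() {
    let sql = "SELECT name FROM users WHERE note = 'it''s' -- trailing comment";
    for token in tokenize(sql) {
        println!("{token:?}");
    }
}
```

Skipping a per-token `String` allocation is the saving that zero-copy borrowing targets. The trade-off is that every token carries the input's lifetime `'a`, so the SQL text must outlive the tokens.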
