Re: potential performance improvement in sun.nio.cs.UTF_8

2025-05-12 Thread Johannes Döbler
Hi Chen, thanks for your feedback. Indeed, it does not make sense to optimize UTF-8 processing for a rather vague set of beneficiaries when there are realistic counterexamples. Still, I don't want to give up on my idea too early :-) I tried this modification: * harvest pure ASCII bytes before

Re: potential performance improvement in sun.nio.cs.UTF_8

2025-05-12 Thread Chen Liang
Hi Johannes, I think the 3rd scenario you've mentioned is likely: we have Swedish and other languages that extend the ASCII character set with diacritics, so non-ASCII bytes frequently interrupt runs of ASCII. For non-ASCII-heavy languages like Chinese, sometimes the text can include spaces or a
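
Illustration (not part of the original messages): a minimal sketch of the mixed-text scenario discussed above, showing how diacritics split UTF-8 encoded text into short runs of ASCII bytes. The class name AsciiRuns, the method asciiRunLengths, and the sample strings are hypothetical and chosen only for this example; nothing here reflects the actual code in sun.nio.cs.UTF_8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class AsciiRuns {

    // Hypothetical helper: returns the lengths of the maximal runs of
    // ASCII bytes (values below 0x80) in the UTF-8 encoding of the text.
    static List<Integer> asciiRunLengths(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<Integer> runs = new ArrayList<>();
        int run = 0;
        for (byte b : bytes) {
            if (b >= 0) {
                // Java bytes are signed, so ASCII bytes are the non-negative ones.
                run++;
            } else if (run > 0) {
                // A lead or continuation byte of a multi-byte sequence ends the run.
                runs.add(run);
                run = 0;
            }
        }
        if (run > 0) {
            runs.add(run);
        }
        return runs;
    }

    public static void main(String[] args) {
        // "Smörgåsbord på bordet": Swedish text, mostly ASCII with a few diacritics.
        String swedish = "Sm\u00f6rg\u00e5sbord p\u00e5 bordet";
        // "你好 world": non-ASCII text followed by a space and an ASCII word.
        String mixed = "\u4f60\u597d world";

        System.out.println(asciiRunLengths(swedish)); // prints [2, 2, 7, 7]
        System.out.println(asciiRunLengths(mixed));   // prints [6]
    }
}
```

In the Swedish sample the ASCII runs are only a few bytes long, which is the kind of input where a fast path that assumes long ASCII stretches may not pay off.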