Hi Chen,
thanks for your feedback. Indeed, it does not make sense to optimize
UTF-8 processing for a rather vague set of beneficiaries when there are
realistic counterexamples.
Still, I don't want to give up on my idea too early :-)
I tried this modification:
* harvest pure ASCII-bytes before the loop (as in the current decoder)
* within the loop, if a 1-byte UTF-8 sequence is recognized, invoke
JLA.decodeAscii, but only a limited number of times (e.g. 10);
otherwise just copy the byte to the output buffer (as in the current
implementation)
* in my benchmark timings this gives the JLA.decodeAscii boost for
inputs that contain rather long ASCII sequences, while not degrading
performance through JLA call overhead in other scenarios
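To make the idea concrete, here is a rough, self-contained sketch of the loop structure I have in mind. The helper decodeAsciiBulk is a hypothetical stand-in for the internal JLA.decodeAscii call, and error handling for malformed input is omitted; this is just to illustrate the capped-bulk-call scheme, not the actual JDK implementation:

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {

    static final int MAX_BULK_CALLS = 10; // cap on JLA-style bulk calls

    // Hypothetical stand-in for JLA.decodeAscii: decodes consecutive
    // ASCII bytes and returns how many were decoded.
    static int decodeAsciiBulk(byte[] src, int sp, char[] dst, int dp, int len) {
        int n = 0;
        while (n < len && src[sp + n] >= 0) {
            dst[dp + n] = (char) src[sp + n];
            n++;
        }
        return n;
    }

    static String decode(byte[] src) {
        char[] dst = new char[src.length];
        int bulkCalls = 0;

        // Step 1: harvest the pure-ASCII prefix before the main loop.
        int sp = decodeAsciiBulk(src, 0, dst, 0, src.length);
        int dp = sp;

        // Step 2: main loop; on a 1-byte sequence, try a bulk ASCII run,
        // but only a limited number of times; afterwards fall back to
        // plain byte-by-byte copying.
        while (sp < src.length) {
            byte b = src[sp];
            if (b >= 0) {                          // 1-byte UTF-8 sequence
                if (bulkCalls < MAX_BULK_CALLS) {
                    bulkCalls++;
                    int n = decodeAsciiBulk(src, sp, dst, dp, src.length - sp);
                    sp += n;
                    dp += n;
                } else {
                    dst[dp++] = (char) b;          // plain byte copy
                    sp++;
                }
            } else {
                // Multi-byte sequence (validation omitted in this sketch).
                if ((b & 0xE0) == 0xC0) {          // 2-byte sequence
                    dst[dp++] = (char) (((b & 0x1F) << 6) | (src[sp + 1] & 0x3F));
                    sp += 2;
                } else if ((b & 0xF0) == 0xE0) {   // 3-byte sequence
                    dst[dp++] = (char) (((b & 0x0F) << 12)
                            | ((src[sp + 1] & 0x3F) << 6) | (src[sp + 2] & 0x3F));
                    sp += 3;
                } else {                           // 4-byte: surrogate pair
                    int cp = ((b & 0x07) << 18) | ((src[sp + 1] & 0x3F) << 12)
                            | ((src[sp + 2] & 0x3F) << 6) | (src[sp + 3] & 0x3F);
                    dst[dp++] = Character.highSurrogate(cp);
                    dst[dp++] = Character.lowSurrogate(cp);
                    sp += 4;
                }
            }
        }
        return new String(dst, 0, dp);
    }

    public static void main(String[] args) {
        String s = "aaaa\u00e9bbbb\u20accccc";
        System.out.println(decode(s.getBytes(StandardCharsets.UTF_8)).equals(s));
    }
}
```

The cap means inputs with long ASCII runs take at most MAX_BULK_CALLS bulk detours, while inputs that alternate single ASCII bytes with multi-byte sequences quickly exhaust the budget and drop back to the cheap byte-copy path.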
Thanks
Johannes