Hi Chen,

Thanks for your feedback. Indeed, it doesn't make sense to optimize UTF-8 processing for a rather vague set of beneficiaries when there are realistic counterexamples.
Still, I don't want to give up on my idea too early :-)
I tried this modification:

 * harvest the pure-ASCII prefix before the loop (as in the current decoder)
 * within the loop, when a 1-byte UTF-8 sequence is recognized, invoke
   JLA.decodeAscii, but only a limited number of times (e.g. 10); otherwise
   just copy the byte to the output buffer (as in the current implementation)
 * in my benchmark timings this gives the JLA.decodeAscii boost for inputs
   that contain rather long ASCII runs, while not degrading performance
   through JLA call overhead in the other scenarios
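To make the idea concrete, here is a minimal sketch of the control flow I mean. It is not the JDK decoder: `bulkDecodeAscii` is a hypothetical stand-in for JLA.decodeAscii, MAX_BULK_TRIES is the assumed cap, and multi-byte handling is simplified (only well-formed 2-byte sequences, everything else replaced) just to keep the example short:

```java
// Sketch of "bulk-ASCII decode, but only a limited number of times".
// bulkDecodeAscii is a placeholder for JLA.decodeAscii, not the real API.
public class LimitedAsciiBulkDecoder {
    static final int MAX_BULK_TRIES = 10; // assumed cap on bulk attempts

    // Copies the leading ASCII run starting at 'from'; returns its length.
    static int bulkDecodeAscii(byte[] src, int from, char[] dst, int dstFrom) {
        int n = 0;
        while (from + n < src.length && src[from + n] >= 0) {
            dst[dstFrom + n] = (char) src[from + n];
            n++;
        }
        return n;
    }

    static String decode(byte[] src) {
        char[] dst = new char[src.length];
        int sp = 0, dp = 0, bulkTries = 0;
        // harvest the pure-ASCII prefix before the main loop
        int prefix = bulkDecodeAscii(src, 0, dst, 0);
        sp += prefix;
        dp += prefix;
        while (sp < src.length) {
            int b = src[sp] & 0xFF;
            if (b < 0x80) {                       // 1-byte sequence
                if (bulkTries < MAX_BULK_TRIES) {
                    bulkTries++;                  // bounded bulk attempt
                    int n = bulkDecodeAscii(src, sp, dst, dp);
                    sp += n;
                    dp += n;
                } else {
                    dst[dp++] = (char) b;         // plain per-byte copy
                    sp++;
                }
            } else if (b >= 0xC2 && b < 0xE0 && sp + 1 < src.length) {
                // well-formed 2-byte sequence (simplified validation)
                dst[dp++] = (char) (((b & 0x1F) << 6) | (src[sp + 1] & 0x3F));
                sp += 2;
            } else {
                dst[dp++] = '\uFFFD';             // simplified: replace the rest
                sp++;
            }
        }
        return new String(dst, 0, dp);
    }
}
```

Each bulk attempt consumes at least one byte, so the loop always terminates; once the cap is reached, every further 1-byte sequence takes the cheap per-byte path, which is what avoids the call overhead on ASCII-poor inputs.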

Thanks
Johannes
