On Tue, Oct 7, 2014 at 3:57 PM, Henri Sivonen <[email protected]> wrote:
> > UTF-8 strings will mean that we will have to copy all non-7-bit ASCII
> > strings between the DOM and JS.
>
> Not if JS stores strings as WTF-8. I think it would be tragic not to
> bother to try to make the JS engine use WTF-8 when having the
> opportunity to fix things and thereby miss the opportunity to use
> UTF-8 in the DOM in Servo. UTF-16 is such a mistake.

When I added Latin1 to SpiderMonkey, we did consider using UTF-8, but
it's complicated. As mentioned, we have to ensure charAt/charCodeAt stay
fast (crypto benchmarks etc. rely on this, sadly). Many other string
operations are also very perf-sensitive, and extra branches in tight
loops can hurt a lot. Also, the regular expression engine currently
emits JIT code to load and compare multiple characters at once. All of
this is fixable to work on WTF-8 strings, but it's a lot of work and
performance is a risk.

Also note that the copying we do for strings passed from JS to Gecko is
not only necessary for moving GC, but also to inflate Latin1 strings
(= most strings) to TwoByte Gecko strings. If Servo or Gecko could deal
with both Latin1 and TwoByte strings, we could think about ways to avoid
the copying. Though, as Boris said, I'm not aware of any
(non-micro-)benchmark regressions from the copying, so I don't expect
big wins from optimizing this.

But again, a Latin1 -> TwoByte copy is a very tight loop that compilers
can probably vectorize. UTF-8/WTF-8 -> TwoByte is more complicated and
probably slower.

Jan
_______________________________________________
dev-servo mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-servo
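To illustrate the charCodeAt concern Jan raises: over Latin1 or TwoByte storage, a UTF-16 index maps to a single array load, while over WTF-8 it requires a scan from the start of the string (or an index side table). The following is a minimal Rust sketch of that difference; the function names are hypothetical and not SpiderMonkey's actual API.

```rust
// Sketch: constant-time charCodeAt over fixed-width storage versus a
// linear scan over WTF-8. Names are illustrative, not SpiderMonkey's API.

/// TwoByte storage: UTF-16 index i maps directly to one array element.
fn char_code_at_two_byte(units: &[u16], i: usize) -> u16 {
    units[i] // single bounds-checked load, O(1)
}

/// WTF-8 storage: code points are variable width, so reaching UTF-16
/// index i means decoding from the start, O(i), unless extra index
/// structures are maintained.
fn char_code_at_wtf8(bytes: &[u8], i: usize) -> u16 {
    let mut utf16_index = 0;
    let mut pos = 0;
    while pos < bytes.len() {
        let b = bytes[pos];
        // Decode one WTF-8 sequence (1 to 4 bytes) into a code point.
        let (len, cp) = if b < 0x80 {
            (1, b as u32)
        } else if b < 0xE0 {
            (2, ((b as u32 & 0x1F) << 6) | (bytes[pos + 1] as u32 & 0x3F))
        } else if b < 0xF0 {
            (3, ((b as u32 & 0x0F) << 12)
                | ((bytes[pos + 1] as u32 & 0x3F) << 6)
                | (bytes[pos + 2] as u32 & 0x3F))
        } else {
            (4, ((b as u32 & 0x07) << 18)
                | ((bytes[pos + 1] as u32 & 0x3F) << 12)
                | ((bytes[pos + 2] as u32 & 0x3F) << 6)
                | (bytes[pos + 3] as u32 & 0x3F))
        };
        // Supplementary code points occupy two UTF-16 code units.
        let units = if cp >= 0x10000 { 2 } else { 1 };
        if i < utf16_index + units {
            return if units == 1 {
                cp as u16
            } else if i == utf16_index {
                (0xD800 + ((cp - 0x10000) >> 10)) as u16 // high surrogate
            } else {
                (0xDC00 + ((cp - 0x10000) & 0x3FF)) as u16 // low surrogate
            };
        }
        utf16_index += units;
        pos += len;
    }
    panic!("index out of range");
}

fn main() {
    // 'a' U+0061, 'é' U+00E9, '𝄞' U+1D11E (a surrogate pair in UTF-16).
    let s = "aé𝄞";
    let units: Vec<u16> = s.encode_utf16().collect();
    let bytes = s.as_bytes(); // well-formed UTF-8 is also well-formed WTF-8
    for i in 0..units.len() {
        assert_eq!(char_code_at_wtf8(bytes, i), char_code_at_two_byte(&units, i));
    }
}
```

The per-byte branching in the WTF-8 path is exactly the kind of extra work in a tight loop that the email warns about; real engines would amortize it with caches or chunked index tables, which is part of why the change is "a lot of work".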
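On the last point about inflation cost: a hedged Rust sketch of the two conversions, with hypothetical function names (not Gecko's actual API). Latin1 → TwoByte is a pure widening loop, since every Latin1 byte is already the UTF-16 code unit; UTF-8 → TwoByte needs a data-dependent branch per character plus surrogate-pair expansion.

```rust
// Sketch of the two copies discussed above; names are illustrative.

/// Latin1 -> TwoByte: each byte IS the code unit, so this is a pure
/// zero-extension loop that autovectorizers handle well
/// (e.g. pmovzxbw on x86).
fn inflate_latin1(src: &[u8]) -> Vec<u16> {
    src.iter().map(|&b| b as u16).collect()
}

/// UTF-8 -> TwoByte: variable-width input forces a branch per decoded
/// scalar value and possible surrogate-pair output, which is much harder
/// to vectorize and generally slower.
fn inflate_utf8(src: &str) -> Vec<u16> {
    src.encode_utf16().collect()
}

fn main() {
    // Latin1 bytes for "café": 0x63 0x61 0x66 0xE9 (4 bytes).
    assert_eq!(inflate_latin1(&[0x63, 0x61, 0x66, 0xE9]),
               vec![0x63, 0x61, 0x66, 0xE9]);
    // The same text as UTF-8 is 5 bytes ('é' is 0xC3 0xA9), and the
    // decoder must merge those bytes back into one code unit.
    assert_eq!(inflate_utf8("café"), vec![0x63, 0x61, 0x66, 0xE9]);
}
```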

