On Fri, Jan 04, 2008 at 07:43:18PM -0800, chromatic wrote:
> I just ran a little experiment.  I patched Parrot::HLLCompiler to transcode
> the source code it reads to UCS-2 before parsing and compiling it, then I
> profiled building perl6.pbc.
>
> Without this hack, the build takes around 20 seconds, mostly running NQP over
> languages/perl6/src/parser/actions.pm.
>
> With the hack, the build takes around 12 seconds.

Interesting.

> Now the tests don't all pass, and I think that this is because Perl 6 intends
> to store its identifiers as UTF-8, and comparing the two is not exact.

Actually, the perl6 compiler and PCT are really agnostic about utf8 --
they rely on Parrot to handle any transcoding issues.  They try to keep
strings as ASCII whenever possible, and only use unicode:"..." when
there's a character that can't be encoded in ascii.

An odd(?) feature of Parrot is that if any of the operands to a string
opcode has a utf8 encoding, then the result ends up being marked as
utf8, whether it needs to be or not.  I don't know how ucs2 affects
this -- but if the tests aren't passing after your hack then I suspect
that Parrot is unable to do certain operations (e.g., compare) on
ucs2 strings.

> Caveats aside, it does seem like there's a point at which converting a string
> to a fixed-width encoding before performing indexed access may improve
> performance notably.

Indeed.  Until Parrot's non-ICU implementation becomes a bit more
robust, and when we figure out what is causing the tests to fail,
we could have HLLCompiler check for the presence of ICU and transcode
the source to ucs2 prior to parsing.

It would also be a good idea to get the 'escape' method of CodeString
to somehow produce its strings using ucs2 instead of utf8 encoding
(although imcc doesn't really support a good way to do that yet).

> (Callgrind suggests that about 45% of the running time of the NQP part of the
> build comes from utf8_set_position and utf8_skip_forward.)

Even better might be to figure out why utf8_set_position and
utf8_skip_forward are slow and see if those can be sped up somehow.

Pm
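
To illustrate why indexed access costs more on a variable-width encoding, here is a minimal sketch in Python.  It is not Parrot's code: the helper names (skip_forward_utf8, utf8_char_at, ucs2_char_at) are made up for this example, and it assumes well-formed input with every character in the Basic Multilingual Plane, so that each one fits in a single 2-byte unit.

```python
def skip_forward_utf8(buf: bytes, pos: int, n: int) -> int:
    """Advance byte offset `pos` by `n` characters in a UTF-8 buffer.

    Hypothetical helper, not Parrot's implementation. Every byte along
    the way has to be inspected, so the cost grows with `n`.
    """
    for _ in range(n):
        pos += 1  # step past the lead byte of the current character
        # Continuation bytes have the bit pattern 10xxxxxx; skip them.
        while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
            pos += 1
    return pos


def utf8_char_at(buf: bytes, index: int) -> str:
    """Return the character at `index` by scanning from the start: O(index)."""
    start = skip_forward_utf8(buf, 0, index)
    end = skip_forward_utf8(buf, start, 1)
    return buf[start:end].decode("utf-8")


def ucs2_char_at(buf: bytes, index: int) -> str:
    """Return the character at `index` in a 2-bytes-per-character buffer: O(1).

    Assumes BMP-only text, encoded here as little-endian UTF-16.
    """
    offset = 2 * index  # fixed width, so the byte offset is plain arithmetic
    return buf[offset:offset + 2].decode("utf-16-le")
```

With the fixed-width buffer the offset of character i is a multiplication, while the UTF-8 version has to walk every preceding byte on each lookup.  A parser that repeatedly repositions within a long source string would pay that walk each time unless the last position is cached.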