Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-10 Thread Larry Wall
Also consider this recent addition to S02:

Author: larry
Date: Thu Jan 10 13:05:42 2008
New Revision: 14486
Modified: doc/trunk/design/syn/S02.pod
Log: Added some random thoughts about performance implications of grapheme view

Modified: doc/trunk/design/syn/S02.pod =

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-06 Thread ajr
>> > Jarkko's view was that if he were doing Perl 5 Unicode again he would
>> opt for fixed width 32 bit rather than UTF-8,

It seems to be a general principle of system design that the best way to process irregular and unpredictable things is to grab them as close to the outside of the system a

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Nicholas Clark
On Sat, Jan 05, 2008 at 12:19:14PM -0600, Patrick R. Michaud wrote:
> On Sat, Jan 05, 2008 at 11:09:57AM +, Nicholas Clark wrote:
> > On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote:
> > Jarkko's view was that if he were doing Perl 5 Unicode again he would opt
> > for fixed widt

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Patrick R. Michaud
On Sat, Jan 05, 2008 at 11:09:57AM +, Nicholas Clark wrote:
> On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote:
> > On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote:
> > > I think it will still be worthwhile to investigate
> > > converting strings into a fixed-width enc

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Patrick R. Michaud
On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote:
> On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote:
>
> > As of r24557 I've rewritten find_cclass and find_not_cclass
> > so that they use a string iterator instead of repeated calls
> > to ENCODING_GET_CODEPOINT. I also impr

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Patrick R. Michaud
On Sat, Jan 05, 2008 at 12:17:00PM +0100, Cosimo Streppone wrote:
> Patrick wrote:
>
> >[...] I also improved utf8_set_position
> >a bit so that it doesn't always have to restart position
> >counting from the beginning of the string. As a result,
> >compiling the actions.pl script on my machine

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Cosimo Streppone
Cosimo wrote:
Patrick wrote:
[...] I also improved utf8_set_position

What happens if the string already has `i->charpos > pos'?
[... /me reads again the diff ...]

I realized while writing this that if `i->charpos > pos', you simply end up re-scanning the string from the start. Is that correct
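The behaviour under discussion can be sketched roughly as follows. This is a Python illustration, not the Parrot C source: the class `Utf8Iter`, the field `bytepos`, the `steps` counter and the helper `utf8_width` are all hypothetical names invented for the sketch (only `charpos` and `pos` echo the thread). It assumes a well-formed UTF-8 buffer. The point is that a forward seek resumes from the cached position, while a backward seek falls back to a scan from the start of the string.

```python
def utf8_width(lead: int) -> int:
    """Byte width of a UTF-8 character, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#x}")


class Utf8Iter:
    """Hypothetical iterator that remembers its last character position."""

    def __init__(self, buf: bytes) -> None:
        self.buf = buf
        self.charpos = 0   # current position, counted in characters
        self.bytepos = 0   # matching position, counted in bytes (assumed field)
        self.steps = 0     # characters skipped so far, for illustration only

    def set_position(self, pos: int) -> None:
        if pos < self.charpos:
            # Target lies behind the cached position: restart from the
            # beginning, which is the case Cosimo asks about.
            self.charpos = 0
            self.bytepos = 0
        # Otherwise continue forward from wherever the last call stopped.
        while self.charpos < pos:
            self.bytepos += utf8_width(self.buf[self.bytepos])
            self.charpos += 1
            self.steps += 1

    def get(self) -> str:
        """Decode the character at the current position."""
        width = utf8_width(self.buf[self.bytepos])
        return self.buf[self.bytepos:self.bytepos + width].decode("utf-8")
```

With mostly increasing positions, as in a parser walking its input, the total work stays close to one pass over the string; each backward seek costs a fresh scan of up to `pos` characters. Scanning backwards from the cached position would be one possible refinement, but nothing in the thread excerpt says whether that was done.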

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Cosimo Streppone
Patrick wrote:
[...] I also improved utf8_set_position a bit so that it doesn't always have to restart position counting from the beginning of the string. As a result, compiling the actions.pl script on my machine goes from 39s to a little over 28s -- about a 25% speed increase.

I have a doub

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Nicholas Clark
On Sat, Jan 05, 2008 at 02:11:35AM -0800, chromatic wrote:
> On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote:
> > I think it will still be worthwhile to investigate
> > converting strings into a fixed-width encoding of some sort
> > instead of performing scans on variable-width encod

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread chromatic
On Saturday 05 January 2008 01:26:48 Patrick R. Michaud wrote:
> As of r24557 I've rewritten find_cclass and find_not_cclass
> so that they use a string iterator instead of repeated calls
> to ENCODING_GET_CODEPOINT. I also improved utf8_set_position
> a bit so that it doesn't always have to rest
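For readers unfamiliar with the change Patrick describes, here is a rough Python sketch of the idea of a single-pass string iterator feeding a character-class search. It is not the r24557 code: the function names mirror the thread, but the signatures, the predicate argument, the `iter_chars` and `utf8_width` helpers and the `-1` "not found" return value are assumptions made for the illustration.

```python
from typing import Callable, Iterator, Tuple


def utf8_width(lead: int) -> int:
    """Byte width of a UTF-8 character, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#x}")


def iter_chars(buf: bytes, offset: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (char_index, char) pairs from `offset` on, in one forward pass."""
    pos = 0
    index = 0
    while pos < len(buf):
        width = utf8_width(buf[pos])
        if index >= offset:
            yield index, buf[pos:pos + width].decode("utf-8")
        pos += width
        index += 1


def find_cclass(buf: bytes, in_class: Callable[[str], bool], offset: int = 0) -> int:
    """Index of the first character at or after `offset` that is in the class."""
    for index, char in iter_chars(buf, offset):
        if in_class(char):
            return index
    return -1  # sketch convention, not necessarily what Parrot returns


def find_not_cclass(buf: bytes, in_class: Callable[[str], bool], offset: int = 0) -> int:
    """Index of the first character at or after `offset` that is NOT in the class."""
    return find_cclass(buf, lambda char: not in_class(char), offset)
```

Because the iterator carries its byte position along, each character is located once. Looking up every candidate index independently would instead rescan the variable-width prefix for each lookup, which is the quadratic pattern the thread title complains about.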

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread chromatic
On Friday 04 January 2008 22:29:40 Patrick R. Michaud wrote:
> Actually, the perl6 compiler and PCT are really agnostic about utf8 --
> they rely on Parrot to handle any transcoding issues. They try
> to keep strings as ASCII whenever possible, and only use unicode:"..."
> when there's a characte

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-05 Thread Patrick R. Michaud
On Sat, Jan 05, 2008 at 01:09:01AM -0600, Patrick R. Michaud wrote:
> On Sat, Jan 05, 2008 at 12:29:40AM -0600, Patrick R. Michaud wrote:
> > On Fri, Jan 04, 2008 at 07:43:18PM -0800, chromatic wrote:
> > > (Callgrind suggests that about 45% of the running time of
> > > the NQP part of the build c

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-04 Thread Patrick R. Michaud
On Sat, Jan 05, 2008 at 12:29:40AM -0600, Patrick R. Michaud wrote:
> On Fri, Jan 04, 2008 at 07:43:18PM -0800, chromatic wrote:
> > (Callgrind suggests that about 45% of the running time of
> > the NQP part of the build comes from utf8_set_position
> > and utf8_skip_forward.)
>
> Even better mi

Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-04 Thread Patrick R. Michaud
On Fri, Jan 04, 2008 at 07:43:18PM -0800, chromatic wrote:
> I just ran a little experiment. I patched Parrot::HLLCompiler to transcode
> the source code it reads to UCS-2 before parsing and compiling it, then I
> profiled building perl6.pbc.
>
> Without this hack, the build takes around 20 sec

Repeated Loopy Variable Width String Character Access is Slooooow-ish

2008-01-04 Thread chromatic
I just ran a little experiment. I patched Parrot::HLLCompiler to transcode the source code it reads to UCS-2 before parsing and compiling it, then I profiled building perl6.pbc.

Without this hack, the build takes around 20 seconds, mostly running NQP over languages/perl6/src/parser/actions.pm.
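As a rough illustration of why transcoding up front can pay off, the Python sketch below contrasts indexed access into UTF-8 bytes, which has to walk from the start of the buffer on every call, with indexed access into a fixed-width copy. It is not the Parrot::HLLCompiler patch: all four function names are hypothetical, the input is assumed to be well-formed UTF-8, and the sketch uses 32-bit units (UTF-32) rather than the UCS-2 of the experiment so that every code point fits in one unit.

```python
def utf8_width(lead: int) -> int:
    """Byte width of a UTF-8 character, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#x}")


def utf8_char_at(buf: bytes, index: int) -> str:
    """Character number `index` of a UTF-8 buffer: O(index) per call."""
    pos = 0
    for _ in range(index):
        # Every earlier character must be stepped over to find this one.
        pos += utf8_width(buf[pos])
    width = utf8_width(buf[pos])
    return buf[pos:pos + width].decode("utf-8")


def to_fixed_width(buf: bytes) -> bytes:
    """One-off O(n) transcode from UTF-8 to 4-byte little-endian units."""
    return buf.decode("utf-8").encode("utf-32-le")


def fixed_char_at(fixed: bytes, index: int) -> str:
    """Character number `index` of the fixed-width copy: O(1) per call."""
    start = 4 * index
    return fixed[start:start + 4].decode("utf-32-le")
```

A loop that calls `utf8_char_at` for every index does work proportional to the square of the string length, while one `to_fixed_width` call followed by `fixed_char_at` lookups stays linear, at the cost of up to four times the memory for ASCII-heavy source text.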