Re: GUILE 2/3 and string encoding cost

Han-Wen Nienhuys Wed, 22 Jan 2020 12:08:07 -0800

On Wed, Jan 22, 2020 at 12:01 PM David Kastrup <d...@gnu.org> wrote:

> Han-Wen Nienhuys <hanw...@gmail.com> writes:
>
> > I looked a bit through the GUILE source code to see what is going on.
> >
> > I believe our current hypothesis (LilyPond's slowdown is caused by
> > expensive unicode transcoding into 32-bit strings) is incorrect.
> >
> > If you look into the source code, you can see that the UTF-8 -> SCM
> > conversion checks whether any code point is above 255:
> >
> > https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620
> >
> > If there are none, it uses Latin1 encoding ("narrow == 1") to encode the
> > string as a normal byte array. This code walks the string twice, but that
> > is very cheap due to CPU cache locality, so it should be
> > essentially equivalent to whatever GUILE 1.8 was doing.
>
> GUILE 1.8 did not walk the string even once.
>


GUILE 1.8 walks it once when you do memcpy.
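
For comparison, here is a rough sketch of the two-pass scheme described in the quoted text. It is not the libguile code: sketch_string, decode_one and sketch_from_utf8 are made-up names, and the input is assumed to be well-formed UTF-8 (no validation is done).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string representation: one byte per character
   (Latin-1, "narrow") or four bytes per character ("wide"). */
typedef struct
{
  int narrow;                   /* 1 if every code point is <= 255 */
  size_t len;                   /* number of characters */
  union
  {
    uint8_t *latin1;
    uint32_t *utf32;
  } data;
} sketch_string;

/* Decode one code point and advance *p past it.
   Assumes well-formed UTF-8; malformed input is not detected. */
static uint32_t
decode_one (const unsigned char **p)
{
  const unsigned char *s = *p;
  uint32_t c = s[0];
  int extra = 0;                /* number of continuation bytes */

  if (c >= 0xF0)      { c &= 0x07; extra = 3; }
  else if (c >= 0xE0) { c &= 0x0F; extra = 2; }
  else if (c >= 0xC0) { c &= 0x1F; extra = 1; }

  for (int i = 1; i <= extra; i++)
    c = (c << 6) | (s[i] & 0x3F);

  *p = s + extra + 1;
  return c;
}

sketch_string
sketch_from_utf8 (const char *bytes, size_t nbytes)
{
  const unsigned char *start = (const unsigned char *) bytes;
  const unsigned char *end = start + nbytes;
  sketch_string r = { 1, 0, { NULL } };

  /* Pass 1: count characters and look for a code point above 255. */
  for (const unsigned char *p = start; p < end; r.len++)
    if (decode_one (&p) > 0xFF)
      r.narrow = 0;

  /* Allocate the representation chosen in pass 1. */
  size_t n = r.len ? r.len : 1;
  if (r.narrow)
    {
      r.data.latin1 = malloc (n);
      assert (r.data.latin1 != NULL);
    }
  else
    {
      r.data.utf32 = malloc (n * sizeof (uint32_t));
      assert (r.data.utf32 != NULL);
    }

  /* Pass 2: decode again, this time storing the characters. */
  const unsigned char *p = start;
  for (size_t i = 0; i < r.len; i++)
    {
      uint32_t c = decode_one (&p);
      if (r.narrow)
        r.data.latin1[i] = (uint8_t) c;
      else
        r.data.utf32[i] = c;
    }
  return r;
}
```

In the narrow case this reads the input twice and writes it once, where a plain memcpy reads and writes once. The second read hits data that the first pass has just pulled into cache.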


> > Even so, if the input file does use UTF-8, there should be little
> > overhead, because the number of texts that we process is always
> > small. LilyPond is not a text processor.
> >
> > So, what hard data do we have on GUILE 2/3 slowness, and what does
> > that data say?
>
> That data says "humongous slowdown".  As far as I know, there is not
> much more than speculation about what causes it.
>
>
Do we have a standardized test file for benchmarking performance?

-- 
Han-Wen Nienhuys - hanw...@gmail.com - http://www.xs4all.nl/~hanwen
