Re: GUILE 2/3 and string encoding cost

David Kastrup Wed, 22 Jan 2020 12:21:20 -0800

Han-Wen Nienhuys <hanw...@gmail.com> writes:

> On Wed, Jan 22, 2020 at 12:01 PM David Kastrup <d...@gnu.org> wrote:
>
>> Han-Wen Nienhuys <hanw...@gmail.com> writes:
>>
>> > I looked a bit through the GUILE source code to see what is going on.
>> >
>> > I believe our current hypothesis (LilyPond's slowdown is caused by
>> > expensive unicode transcoding into 32-bit strings) is incorrect.
>> >
>> > If you look into the source code, you can see that the UTF-8 -> SCM
>> > conversion checks if there are any code points over 255:
>> >
>> >
>> > https://git.savannah.nongnu.org/cgit/guile.git//tree/libguile/strings.c/?id=1b8e9ca0e37fab366435436995248abdfc780a10#n1620
>> >
>> > if there aren't, it uses Latin1 encoding ("narrow == 1") to encode the
>> > string as a normal byte array. This code walks the string twice, but that
>> > is very cheap due to CPU cache locality, so it should be
>> > essentially equivalent to whatever GUILE 1.8 was doing.
>>
>> GUILE 1.8 did not walk the string even once.
>>
>
> GUILE 1.8 walks it once when you do memcpy.


Ok, but that's sort of a cheap walk.
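
For reference, the two-pass scheme described in the quoted text (scan for
code points over 255, then store either one byte or 32 bits per character)
can be sketched roughly as below. This is only an illustration under stated
assumptions, not GUILE's actual code: the names sketch_string, decode_utf8
and sketch_from_utf8 are hypothetical, and the input is assumed to be
well-formed UTF-8. A plain memcpy, by comparison, touches each byte once
and does no decoding at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical illustration type, not GUILE's string representation. */
typedef struct {
    int narrow;                 /* 1: one byte per char (Latin-1), 0: 32 bits per char */
    size_t len;                 /* number of code points */
    union {
        uint8_t *narrow_chars;  /* valid when narrow == 1 */
        uint32_t *wide_chars;   /* valid when narrow == 0 */
    } u;
} sketch_string;

/* Decode one code point from well-formed UTF-8 (assumption: no validation).
   Returns the number of bytes consumed. */
static size_t decode_utf8(const unsigned char *p, uint32_t *cp)
{
    if (p[0] < 0x80) { *cp = p[0]; return 1; }
    if ((p[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(p[0] & 0x0F) << 12)
            | ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12)
        | ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
    return 4;
}

sketch_string sketch_from_utf8(const char *src, size_t nbytes)
{
    const unsigned char *p = (const unsigned char *) src;
    sketch_string s = { 1, 0, { NULL } };
    uint32_t cp;
    size_t i, k;

    assert(src != NULL);

    /* Pass 1: count code points and note whether any exceeds 255. */
    for (i = 0; i < nbytes; s.len++) {
        i += decode_utf8(p + i, &cp);
        if (cp > 255)
            s.narrow = 0;
    }

    /* Allocate the chosen representation (at least one element). */
    if (s.narrow)
        s.u.narrow_chars = malloc(s.len ? s.len : 1);
    else
        s.u.wide_chars = malloc((s.len ? s.len : 1) * sizeof(uint32_t));
    if ((s.narrow ? (void *) s.u.narrow_chars : (void *) s.u.wide_chars) == NULL)
        abort();

    /* Pass 2: decode again and store each code point. */
    for (i = 0, k = 0; i < nbytes; k++) {
        i += decode_utf8(p + i, &cp);
        if (s.narrow)
            s.u.narrow_chars[k] = (uint8_t) cp;
        else
            s.u.wide_chars[k] = cp;
    }
    return s;
}
```

In this sketch, calling sketch_from_utf8 on a string containing only
characters up to U+00FF yields the narrow form, and a single character
above that switches the whole string to the 32-bit form. The caller is
expected to free() the buffer.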

>> > Even so, if the input file does use UTF-8, there should be little
>> > overhead, because the number of texts that we process is always
>> > small. LilyPond is not a text processor.
>> >
>> > So, what hard data do we have on GUILE 2/3 slowness, and what does
>> > that data say?
>>
>> That data says "humongous slowdown".  As far as I know, there is not
>> much more than speculation about what causes it.
>>
>>
> Do we have a standardized test file for benchmarking performance?

input/regression/mozart-hrn-3.ly possibly, but it's not particularly
large.

-- 
David Kastrup
