On Apr 13, 2004, at 8:35 AM, Leopold Toetsch wrote:


Jeff Clites <[EMAIL PROTECTED]> wrote:

One other thing occurred to me, to save a few bytes: When upscaling,
rather than passing clength, we can pass (result->strlen + number of
bytes left in cstring).

If I read that correctly, s->strlen (or clength) is the desired length.
- on creation a onebyte STRING is created with clength (-1)
- on upscaling this is still the length, which then gets doubled

When you start off, clength is the right thing, but once you hit an escape sequence, you find out that some of the input bytes were part of a single escape sequence. That is, consider this string which needs unescaping:


ab\x{212b}de  //clength is 12
----------^

When you get to the section of code that is about to trigger the upscale, you'll have 2 characters ("a" and "b") already in your accumulated string, you're about to add the Angstrom character, and you know you only have 2 more bytes to parse. So at that point, you know the maximum number of characters you could end up with is 5 (2 + 1 + 2), so when you call upscale, you could pass in 5 rather than 12. That's not a huge savings, but the nice thing in this case is that you will have originally allocated 12 bytes for the result string, and while upscaling you're saying you need room for 5 characters == 10 bytes for rep-2, so the actual allocated storage doesn't have to be expanded. (If you passed in 12, it would make room for 24 bytes in the upscaled string, even though it didn't need them.)

Not an enormous savings, but probably worth the tiny bit of math, since otherwise we'd know for sure that we'd be allocating more storage than we need.
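
To make the arithmetic concrete, here is a minimal sketch in C. The helper names (max_chars_after_upscale, bytes_needed) and their parameters are made up for illustration and are not Parrot functions. It assumes rep-2 means 2 bytes per character and that each remaining input byte can produce at most one character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Parrot: upper bound on the number of
 * characters in the unescaped result, computed at the moment we hit the
 * first character that needs a wider representation.
 *
 *   chars_so_far   - characters already accumulated in the result
 *   clength        - total number of input bytes
 *   bytes_consumed - input bytes parsed so far, including the escape
 *                    sequence that triggers the upscale
 */
static size_t
max_chars_after_upscale(size_t chars_so_far, size_t clength,
                        size_t bytes_consumed)
{
    assert(bytes_consumed <= clength);
    /* already accumulated + the character about to be added
     * + at most one character per remaining input byte */
    return chars_so_far + 1 + (clength - bytes_consumed);
}

/* Storage needed for nchars characters at a given width (assumed fixed). */
static size_t
bytes_needed(size_t nchars, size_t bytes_per_char)
{
    return nchars * bytes_per_char;
}
```

For the example string, chars_so_far is 2, clength is 12, and bytes_consumed is 10 (the two bytes of "ab" plus the eight bytes of the escape), so the bound is 5 characters, or 10 bytes at 2 bytes per character, which fits in the 12 bytes already allocated. Passing 12 instead would ask for 24 bytes.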

[Note: _string_upscale is currently simple, but not optimized. We should enhance it for the case where we can upscale in place because we know that we already have enough storage allocated to accommodate max(passed in length, current length). That's what would let the above be a savings.]
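
A rough sketch of what upscaling in place could look like follows. This is not the actual _string_upscale code: ToyString and toy_upscale are hypothetical stand-ins, the buffer is assumed to be malloc'ed, and the only case handled is widening from 1 byte to 2 bytes per character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string buffer; NOT Parrot's STRING struct. */
typedef struct {
    unsigned char *buf;      /* malloc'ed storage */
    size_t buflen;           /* allocated bytes */
    size_t strlen_chars;     /* characters currently stored */
    size_t bytes_per_char;   /* 1 or 2 */
} ToyString;

/* Widen a 1-byte-per-char string to 2 bytes per char, leaving room for
 * max(want_chars, current length) characters.
 * Returns 1 if the existing storage was reused, 0 if it had to grow,
 * and -1 if the reallocation failed. */
static int
toy_upscale(ToyString *s, size_t want_chars)
{
    size_t nchars = want_chars > s->strlen_chars ? want_chars
                                                 : s->strlen_chars;
    size_t need = nchars * 2;
    int in_place = 1;
    size_t i;

    assert(s->bytes_per_char == 1);

    if (need > s->buflen) {
        unsigned char *p = realloc(s->buf, need);
        if (p == NULL)
            return -1;
        s->buf = p;
        s->buflen = need;
        in_place = 0;
    }

    /* Walk backwards: character i moves to bytes 2i and 2i+1, which never
     * overlap the not-yet-read bytes 0..i-1. */
    for (i = s->strlen_chars; i-- > 0; ) {
        uint16_t wide = s->buf[i];
        memcpy(s->buf + 2 * i, &wide, sizeof wide);
    }
    s->bytes_per_char = 2;
    return in_place;
}
```

With a 12-byte buffer holding "ab", asking for 5 characters needs 10 bytes and reuses the storage, while asking for 12 characters needs 24 bytes and forces a reallocation.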

JEff


