On Apr 12, 2004, at 9:54 AM, Leopold Toetsch (via RT) wrote:

# New Ticket Created by  Leopold Toetsch
# Please include the string:  [perl #28494]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28494 >


Attached patch:
* adds a new test file for Unicode-related string tests
* reimplements string_unescape_cstring, which now uses ICU for the work
* fixes a bug in string_compare with equal-length strings

It's also far more efficient than the old code.

TODO: move it out of string.c, docs.

Jeff, please have a look at it.

It looks very similar to what I had come up with. The only important differences are:

1) My version handles the case of code points > 0xFFFF as well. (The string_append_chr function encapsulates the logic of dealing with the "anything above 0xFF" case, but needs to be rewritten to improve efficiency.)

2) When I was implementing the previous version of string_unescape_cstring, I'm pretty sure I had a reason for doing that string_constant_copy at the end, rather than creating a constant string at the beginning. I don't recall 100% why, but I believe there were problems if the string had been created as a constant and then had to expand its storage because it contained characters > 0xFF.
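
To make point 1 concrete, here is a minimal sketch of what handling code points > 0xFFFF involves, assuming the target is UTF-16 (the encoding behind ICU's UChar). The helper name encode_utf16 is hypothetical; it is not string_append_chr and not taken from either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Parrot's string_append_chr: writes one code
 * point to out[] as UTF-16 and returns the number of 16-bit units used. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);              /* beyond the Unicode range */
    assert(cp < 0xD800 || cp > 0xDFFF);  /* surrogates are not characters */

    if (cp <= 0xFFFF) {
        /* Fits in a single 16-bit unit. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Above 0xFFFF: split the remaining 20 bits into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With this, U+1D11E encodes as the pair 0xD834 0xDD1E, while anything at or below 0xFFFF stays a single unit. An unescaping routine that only ever emits one unit per escape would silently mishandle the first case.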

Just a tiny note:

instead of this:
     result->bufused = d * (had_int16 ? 2 : 1);

you can do this:
     result->bufused = string_max_bytes(interpreter, result, result->strlen);

to update the bufused to match strlen.
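
The idea behind that suggestion can be sketched as follows. This assumes that string_max_bytes amounts to "character count times the width of the string's representation"; that is an assumption for illustration, and the toy_string struct and toy_* functions below are hypothetical stand-ins, not Parrot's STRING or its API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a string header. The field names echo the
 * message, but this is not Parrot's STRING struct. */
typedef struct {
    size_t bytes_per_char;  /* 1, 2, or 4: fixed-width representation */
    size_t strlen;          /* length in characters */
    size_t bufused;         /* length in bytes */
} toy_string;

/* Assumed behaviour: bytes needed to hold nchars characters of s. */
static size_t toy_max_bytes(const toy_string *s, size_t nchars)
{
    assert(s->bytes_per_char == 1 ||
           s->bytes_per_char == 2 ||
           s->bytes_per_char == 4);
    return nchars * s->bytes_per_char;
}

/* Derive bufused from strlen instead of hard-coding "2 : 1" at the
 * call site, so a wider representation needs no change here. */
static void toy_sync_bufused(toy_string *s)
{
    s->bufused = toy_max_bytes(s, s->strlen);
}
```

The point of going through one helper is that the caller no longer needs to know which widths exist; a flag like had_int16 only distinguishes two of them.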

I'm attaching a patch which contains the version I had written, and also includes my changes from [perl #28473], which I didn't see make it to the list. Take a look, and you can probably take the best parts of both--I'm sure there are a few places where your version is more efficient. (Also, I have the couple of bits which call directly into the ICU API factored out into string_primitives.c)

BTW, I have some benchmarks that I will clean up and send in to go with your tests.

JEff


Attachment: unescaping-and-icu-config.patch
