# New Ticket Created by Leopold Toetsch
# Please include the string: [perl #28494]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28494 >
Attached patch:
* adds a new test file for Unicode-related string tests
* reimplements string_unescape_cstring, which now uses ICU for the work
* fixes a bug in string_compare with equal-length strings
It's also far more efficient than the old code.
TODO: move it out of string.c, docs.
Jeff, please have a look at it.
It looks very similar to what I had come up with. The only important differences are:
1) My version handles the case of code points > 0xFFFF as well. (The string_append_chr function encapsulates the logic of dealing with the "anything above 0xFF" case, but needs to be rewritten to improve efficiency.)
2) When I was implementing the previous version of string_unescape_cstring, I'm pretty sure I had a reason for doing that string_constant_copy at the end, rather than creating a constant string at the beginning. I don't recall 100% why, but I believe there were problems, if the string had been created as a constant, in the case where it has to expand its storage because there are characters > 0xFF.
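To illustrate point 1, here is a minimal sketch of what handling code points > 0xFFFF involves, assuming the 16-bit representation is UTF-16 (as ICU uses). The helper name encode_utf16 is hypothetical and is not taken from either patch; it only shows the surrogate-pair arithmetic that something like string_append_chr would have to perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of either patch: encode one Unicode code
 * point as UTF-16 and return the number of 16-bit units written (1 or 2). */
static size_t
encode_utf16(uint32_t cp, uint16_t out[2])
{
    /* Only valid scalar values: at most 0x10FFFF and not itself a surrogate. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;                  /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                              /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low (trail) surrogate */
    return 2;
}
```

For example, U+1D11E encodes as the pair 0xD834 0xDD1E, so an unescaped string containing it needs one more 16-bit unit than it has characters.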
Just a tiny note:
instead of this: result->bufused = d * (had_int16 ? 2 : 1);
you can do this:
result->bufused = string_max_bytes(interpreter, result, result->strlen);
to update the bufused to match strlen.
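A rough sketch of the idea, under stated assumptions: the struct and both functions below are hypothetical stand-ins, not Parrot's actual STRING type or the real string_max_bytes (whose implementation isn't shown in this thread). It assumes a fixed-width representation of 1 or 2 bytes per character, and only illustrates deriving bufused from strlen in one place instead of repeating the width logic at each call site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a string header; field names follow the thread. */
typedef struct {
    size_t strlen;          /* length in characters */
    size_t bufused;         /* bytes of the buffer currently in use */
    size_t bytes_per_char;  /* assumed fixed width: 1 or 2 */
} sketch_string;

/* Hypothetical analogue of string_max_bytes: bytes needed for nchars. */
static size_t
sketch_max_bytes(const sketch_string *s, size_t nchars)
{
    assert(s->bytes_per_char == 1 || s->bytes_per_char == 2);
    return nchars * s->bytes_per_char;
}

/* Bring bufused back in line with strlen after the string was filled in. */
static void
sketch_sync_bufused(sketch_string *s)
{
    s->bufused = sketch_max_bytes(s, s->strlen);
}
```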
I'm attaching a patch which contains the version I had written, and also includes my changes from [perl #28473], which I didn't see make it to the list. Take a look, and you can probably take the best parts of both--I'm sure there are a few places where your version is more efficient. (Also, I have the couple of bits which call directly into the ICU API factored out into string_primitives.c)
BTW, I have some benchmarks that I will clean up and send in to go with your tests.
JEff
unescaping-and-icu-config.patch