Paolo Bonzini wrote:
> >> The point is that u8_mbtouc will look only one byte past the end of
> >> a (valid or invalid) character
> >
> > This is not guaranteed. If you call u8_mbtouc(&uc, needle, 4) you are
> > asserting that needle[0..3] are valid memory addresses.
>
> In practice that's not wha
On 07/31/2010 09:35 PM, Bruno Haible wrote:
Hi Paolo,
I pushed it now, thanks for the review!
After I added some more unit tests for u8_strchr, I see crashes and assertion
failures:
1)
mem = (UNIT *) (page_boundary - 2 * sizeof (UNIT));
mem[0] = 0x50;
mem[1] = 0;
Hi Paolo,
> I pushed it now, thanks for the review!
After I added some more unit tests for u8_strchr, I see crashes and assertion
failures:
1)
mem = (UNIT *) (page_boundary - 2 * sizeof (UNIT));
mem[0] = 0x50;
mem[1] = 0;
ASSERT (u8_strchr (mem, 0x123) == NULL);
On 07/29/2010 06:00 PM, Pádraig Brady wrote:
On 29/07/10 16:29, Paolo Bonzini wrote:
if (!needle[0])
return haystack;
n = u8_mbtouc(&uc, needle, 4);
^
u8_strmbtouc here as suggested by Bruno. u8_strmbtouc also checks
itself for NULL, and returns zero in that cas
On 07/29/2010 05:36 PM, Bruno Haible wrote:
You should better use u8_strmbtouc in this case.
Aha, that's true. This function is a better match, but only because it
supports CONFIG_UNICODE_SAFETY and is faster in that case (because it
can just compare with zero).
Still, without safety u8_st
On 07/29/2010 05:36 PM, Bruno Haible wrote:
Paolo Bonzini wrote:
The point is that u8_mbtouc will look only one byte past the end of
a (valid or invalid) character
This is not guaranteed. If you call u8_mbtouc(&uc, needle, 4) you are
asserting that needle[0..3] are valid memory addresses.
In
Paolo Bonzini wrote:
> The point is that
> u8_mbtouc will look only one byte past the end of a (valid or invalid)
> character
This is not guaranteed. If you call u8_mbtouc(&uc, needle, 4)
you are asserting that needle[0..3] are valid memory addresses.
If it's not the case, u8_mbtouc may crash.
Yo
On 07/29/2010 04:32 PM, Pádraig Brady wrote:
No! You can cheat!:) Just pass 4 to u8_mbtouc. There will be no
out-of-bounds access: if the string ends before that, you get either a
complete character or an EILSEQ.
You don't get EILSEQ for u8_mbtouc(&uc,"aa",6);
But you get a complete character
On 07/29/2010 12:40 PM, Pádraig Brady wrote:
I thought of that but thought it would be too much overhead for the
general case since it seemed like it would need strlen, u8_mbsnlen& u8_mbtouc,
with conversions back and forth.
No! You can cheat! :) Just pass 4 to u8_mbtouc. There will be no
o
On 29/07/10 10:26, Paolo Bonzini wrote:
> On 07/29/2010 11:05 AM, Pádraig Brady wrote:
>>> I would better recommend to use
>>> the u8_strstr function.
>>
>> I wonder could we speed that up for UTF-8
>> by just deferring to strstr() ?
>> I've not tested this so feel free to bin it.
>
> An alternati
On 07/29/2010 11:05 AM, Pádraig Brady wrote:
I would better recommend to use
the u8_strstr function.
I wonder could we speed that up for UTF-8
by just deferring to strstr() ?
I've not tested this so feel free to bin it.
An alternative idea: check if there is more than 1 character using
u8_mb
On 28/07/10 22:32, Bruno Haible wrote:
> Pádraig Brady wrote:
>> I would suggest a new function due to the
>> way I see this function called most often.
>>
>> /* definitely not sure of this name */
>> uint8_t *
>> u8_str_u8_chr (const uint8_t *s, const uint8_t *c, size_t size)
>> {
>> switch (siz
Pádraig Brady wrote:
> I would suggest a new function due to the
> way I see this function called most often.
>
> /* definitely not sure of this name */
> uint8_t *
> u8_str_u8_chr (const uint8_t *s, const uint8_t *c, size_t size)
> {
> switch (size):
> {
> case 1:
> return (uint8_
Hi Paolo,
> I pushed it now, thanks for the review!
I thought you were going to add ChangeLog entries right before pushing.
I added them for you (from the git commit messages).
When compiling a testdir like this:
./gnulib-tool --create-testdir --dir=$HOME/data/tmp/testdir1 --with-tests
unist
On 07/28/2010 11:01 PM, Bruno Haible wrote:
I thought you were going to add ChangeLog entries right before pushing.
I added them for you (from the git commit messages).
Oops, looks like I was confusing coreutils/grep vs. gnulib habits.
Paolo
On 07/27/2010 06:28 PM, Pádraig Brady wrote:
On 27/07/10 19:14, Paolo Bonzini wrote:
On 07/27/2010 06:06 PM, Pádraig Brady wrote:
I would suggest a new function due to the
way I see this function called most often.
I.E. repeatedly with the same character.
Is this really a bottleneck? i.e.,
On 27/07/10 19:14, Paolo Bonzini wrote:
> On 07/27/2010 06:06 PM, Pádraig Brady wrote:
>>
>> I would suggest a new function due to the
>> way I see this function called most often.
>> I.E. repeatedly with the same character.
>
> Is this really a bottleneck? i.e., what does u8_uctomb_aux look like
On 07/27/2010 06:06 PM, Pádraig Brady wrote:
On 23/07/10 13:03, Paolo Bonzini wrote:
On 07/23/2010 09:21 AM, Bruno Haible wrote:
Otherwise fine, please commit. I can add comments that prove the
validity of
the code later, after you committed it.
v2 already had large comments that detail the a
On 23/07/10 13:03, Paolo Bonzini wrote:
> On 07/23/2010 09:21 AM, Bruno Haible wrote:
>> Otherwise fine, please commit. I can add comments that prove the
>> validity of
>> the code later, after you committed it.
>
> v2 already had large comments that detail the algorithm used and do
> include all
On 23/07/10 07:53, bonz...@gnu.org wrote:
> From: Paolo Bonzini
>
> v2 of the patch, now including strchr operations too and with the
> changes suggested by Bruno. Padraig, can you test it on your
> cases too?
Cool. There was no measurable change for the 2 byte case,
but there was another 6% im
On 07/23/2010 09:21 AM, Bruno Haible wrote:
Otherwise fine, please commit. I can add comments that prove the validity of
the code later, after you committed it.
v2 already had large comments that detail the algorithm used and do
include all the info in the comments you suggested. Feel free to
Hi Paolo,
Thanks for the update.
In [PATCH v2 3/5] unistr/u*-chr: test multibyte sequences more:
> +n = U_UCTOMB(c, uc, 6);
Still lacking a space before the opening parenthesis.
In [PATCH v2 4/5] unistr/u*-strchr: add tests:
> +n = U_UCTOMB(c, uc, 6);
Likewise.
> +/* Test of u
From: Paolo Bonzini
v2 of the patch, now including strchr operations too and with the
changes suggested by Bruno. Padraig, can you test it on your
cases too?
Paolo Bonzini (5):
unistr/u*-chr: prepare for multibyte tests
unistr/u*-chr: test multibyte sequences
unistr/u*-chr: test multibyte
23 matches
Mail list logo