On 09/21/2011 02:15 AM, Christophe wrote:
Timon Gehr, in message (digitalmars.D.learn:29641), wrote:
Last point: walkLength is not optimized for strings.
std.utf.count should be.

This short implementation of count was 3 to 8 times faster than
walkLength in a simple benchmark:

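size_t myCount(string text)
{
    // Start from the number of bytes, then subtract one for each
    // UTF-8 continuation byte (10xxxxxx). Assumes valid UTF-8.
    size_t n = text.length;
    for (size_t i = 0; i < text.length; ++i)
    {
        auto s = text[i] >> 6;          // top two bits: 0..3
        n -= (s >> 1) - ((s + 1) >> 2); // 1 only when s == 2
    }
    return n;
}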

(compiled with gdc on a 64-bit machine; the sample text was the introduction of
the French Wikipedia article on UTF-8, down to the table of contents:
http://fr.wikipedia.org/wiki/UTF-8 ).

The reason is that the loop can be unrolled by the compiler.
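
For anyone puzzled by the shifts, here is a minimal sketch of the same idea written with an explicit mask. It assumes valid UTF-8 input; the name countCodePoints is made up for illustration and is not a Phobos function. The main function checks that the shift expression from myCount is 1 exactly for continuation bytes (10xxxxxx), and that the result agrees with std.range.walkLength on a small valid sample.

```d
import std.range : walkLength;

// Hypothetical helper: counts code points by subtracting UTF-8
// continuation bytes from the byte length. Assumes valid UTF-8.
size_t countCodePoints(string text)
{
    size_t n = text.length;
    foreach (char c; text) // iterate code units, no decoding
    {
        // (c & 0xC0) == 0x80 holds exactly for bytes of the form 10xxxxxx.
        if ((c & 0xC0) == 0x80)
            --n;
    }
    return n;
}

void main()
{
    // The shift expression from myCount matches the mask test
    // for every possible byte value.
    foreach (b; 0 .. 256)
    {
        auto s = b >> 6;
        auto viaShift = (s >> 1) - ((s + 1) >> 2);
        auto viaMask = ((b & 0xC0) == 0x80) ? 1 : 0;
        assert(viaShift == viaMask);
    }

    string sample = "\u00E9t\u00E9"; // 5 bytes, 3 code points
    assert(sample.length == 5);
    assert(countCodePoints(sample) == 3);
    assert(countCodePoints(sample) == sample.walkLength);
    assert(countCodePoints("abc") == 3);
}
```

The mask version has a branch in the source, so whether it optimizes as well as the shift version is up to the compiler; the sketch is only meant to show what is being counted.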

Very good point, you might want to file an enhancement request. It would
make the functionality different enough to prevent count from being
removed: walkLength throws on an invalid UTF sequence.

I would be glad to do so, but I am quite new here, so I don't know how
to. A little pointer could help.


http://d.puremagic.com/issues/

You can tick 'Severity: enhancement request'. It would probably be best, though, if it threw when the final result is larger than text.length.

