On 09/21/2011 12:37 PM, Dmitry Olshansky wrote:
On 21.09.2011 4:04, Timon Gehr wrote:
On 09/21/2011 01:57 AM, Christophe wrote:
"Jonathan M Davis", in message (digitalmars.D.learn:29637), wrote:
On Tuesday, September 20, 2011 14:43 Andrej Mitrovic wrote:
On 9/20/11, Jonathan M Davis<jmdavisp...@gmx.com> wrote:
Or std.range.walkLength. I don't know why we really have std.utf.count. It
just calls walkLength anyway. I suspect that it's a function that predates
walkLength and was made to use walkLength after walkLength was introduced.
But it's kind of pointless now.
- Jonathan M Davis
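For reference, a minimal sketch of the comparison being discussed. The sample string is only an illustration, not something from the thread. It shows that .length counts UTF-8 code units, while std.range.walkLength and std.utf.count both count code points:

```d
import std.range : walkLength;
import std.utf : count;

void main()
{
    // Illustrative sample: "déjà vu", written with escapes so the
    // source file encoding does not matter.
    string s = "d\u00E9j\u00E0 vu";

    // .length is the number of UTF-8 code units; é and à take two each.
    assert(s.length == 9);

    // Both library functions count code points.
    assert(s.walkLength == 7);
    assert(s.count == 7);
}
```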
I don't think having better-named aliases is a bad thing. Although now
I'm seeing it's not just an alias but a function.
std.utf.count has one advantage: someone looking for the function will
find it. The programmer might not look in std.range to find a function
about UTF strings, and even if he did, it is not indicated in walkLength
that it works with (narrow) strings the way it does. To know you can use
walkLength, you must know:
- that popFront works differently on strings.
- that hasLength is not true for strings.
- what walkLength is.
So yes, you experienced programmers don't need std.utf.count, but newbies
do.
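The first two points in the list can be sketched as follows. The two-character sample string is only an illustration, and the snippet assumes a Phobos where std.range exposes front, popFront and hasLength for strings:

```d
import std.range;

void main()
{
    // Narrow strings deliberately do not advertise a range length,
    // which is why walkLength has to walk them.
    static assert(!hasLength!string);

    string s = "\u00E9a"; // 'é' (two code units) followed by 'a'
    assert(s.length == 3);

    // front decodes a whole code point ...
    assert(s.front == '\u00E9');

    // ... and popFront drops every code unit of that code point.
    s.popFront();
    assert(s == "a");
}
```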
Last point: walkLength is not optimized for strings.
std.utf.count should be.
This short implementation of count was 3 to 8 times faster than
walkLength in a simple benchmark:
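size_t myCount(string text)
{
    // Start from the number of code units, then subtract one for every
    // UTF-8 continuation byte (10xxxxxx).
    size_t n = text.length;
    for (size_t i = 0; i < text.length; ++i)
    {
        auto s = text[i] >> 6;          // top two bits of the byte: 0..3
        n -= (s >> 1) - ((s + 1) >> 2); // 1 only when s == 2, else 0
    }
    return n;
}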
(Compiled with gdc on 64 bits; the sample text was the introduction of the
French Wikipedia UTF-8 article, down to the table of contents -
http://fr.wikipedia.org/wiki/UTF-8 ).
The reason is that the loop can be unrolled by the compiler.
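For anyone puzzling over the arithmetic in myCount, here is a small self-check. It is a sketch added for illustration, not part of the benchmark. It confirms for all 256 byte values that the subtracted term is 1 exactly for continuation bytes (10xxxxxx) and 0 otherwise:

```d
void main()
{
    foreach (int b; 0 .. 256)
    {
        int s = b >> 6;                         // top two bits: 0..3
        int adjust = (s >> 1) - ((s + 1) >> 2); // expression used in myCount
        bool continuation = (b & 0xC0) == 0x80; // byte of the form 10xxxxxx
        assert(adjust == (continuation ? 1 : 0));
    }
}
```

In other words, myCount returns the number of bytes that are not continuation bytes. That equals the number of code points only when the input is well-formed UTF-8.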
Very good point, you might want to file an enhancement request. It would
make the functionality different enough to prevent count from being
removed: walkLength throws on an invalid UTF sequence.
Actually, I don't buy it. I guess the reason it's faster is that it
doesn't check if the codepoint is valid. In fact you can easily get
ridiculous overflowed "negative" lengths.
Most of these could be caught by a final check. I think having the
option of a version that is so much faster would be nice. Chances are
pretty high that code actually manipulating the string will throw
eventually if it is invalid.
Maybe we can put it here as an unsafe and fast version though.
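A rough sketch of what offering both variants might look like. The names fastCount and checkedCount are hypothetical, and using std.utf.validate as the check is an assumption made for this example, not something proposed in the thread:

```d
import std.exception : assertThrown;
import std.utf : validate, UTFException;

// Hypothetical "unsafe and fast" variant: counts the bytes that are
// not UTF-8 continuation bytes, with no validity checking at all.
size_t fastCount(string text)
{
    size_t n = 0;
    foreach (char c; text) // iterates code units, no decoding
    {
        if ((c & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

// Hypothetical checked variant: validate first, then count.
size_t checkedCount(string text)
{
    validate(text); // throws UTFException on malformed input
    return fastCount(text);
}

void main()
{
    string ok = "d\u00E9j\u00E0 vu";
    assert(fastCount(ok) == 7);
    assert(checkedCount(ok) == 7);

    // A lone continuation byte is not well-formed UTF-8.
    immutable(ubyte)[] raw = [0x80];
    string bad = cast(string) raw;
    assert(fastCount(bad) == 0);                   // silently "counts" it
    assertThrown!UTFException(checkedCount(bad));  // the check rejects it
}
```

Validating in a separate pass costs a second walk over the string, so whether this beats walkLength would need measuring.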
Also check std.utf.stride to see if you can get it better, it's the
beast behind narrow string popFront.
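To show what stride does, here is a minimal sketch that counts code points by hopping from one lead byte to the next. The helper name strideCount is hypothetical, and the sample string is only an illustration:

```d
import std.utf : stride;

// Hypothetical helper: stride(text, i) gives the number of code units
// in the sequence starting at index i, so each step skips one code point.
size_t strideCount(string text)
{
    size_t n = 0;
    for (size_t i = 0; i < text.length; i += stride(text, i))
        ++n;
    return n;
}

void main()
{
    string s = "d\u00E9j\u00E0 vu";
    assert(stride(s, 0) == 1); // 'd' is a single code unit
    assert(stride(s, 1) == 2); // 'é' is two code units
    assert(strideCount(s) == 7);
}
```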