Re: Strange counter-performance in an alternative `decimalLength9` function

Basile B. via Digitalmars-d-learn Thu, 27 Feb 2020 01:46:18 -0800

On Thursday, 27 February 2020 at 09:33:28 UTC, Dennis Cote wrote:

On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:

So after reading the translation of RYU I was interested toosee if the decimalLength() function can be written to befaster, as it cascades up to 8 CMP.


Perhaps you could try something like this.

<code>
int decimalDigitLength(ulong n) {
        if (n < 10000)
                if (n < 100)
                        return n < 10 ? 1 : 2;
                else
                        return n < 1000 ? 3 : 4;
        else
                if (n < 100000000)
                        if (n < 1000000)
                                return n < 100000 ? 5 : 6;
                        else
                                return n < 10000000 ? 7 : 8;
                else
                        if (n < 1000000000000)
                                if (n < 10000000000)
                                        return n < 1000000000 ? 9 : 10;
                                else
                                        return n < 100000000000 ? 11 : 12;
                        else
                                if (n < 10000000000000000)
                                        if (n < 100000000000000)
                                                return n < 10000000000000 ? 13 
: 14;
                                        else
                                                return n < 1000000000000000 ? 
15 : 16;
                                else
                                        if (n < 1000000000000000000)
                                                return n < 100000000000000000 ? 
17 : 18;
                                        else
                                                return n < 10000000000000000000 
? 19 : 20;
}                                                                               
</code>

This uses at most 6 compares for any 64 bit number and only 3for the most common small numbers less than 10000.

I was glad to see that with ldc at run.dlang.io using the -O3optimization I could change the function signature to matchyours and the compiler eliminated all the unreachable dead codefor larger values. The compiler produced the following assembler


<code>

.section .text.ubyteonlineapp.decimalLength9(uint),"axG",@progbits,ubyteonlineapp.decimalLength9(uint),comdat

        .globl  ubyte onlineapp.decimalLength9(uint)
        .p2align        4, 0x90
        .type   ubyte onlineapp.decimalLength9(uint),@function
ubyte onlineapp.decimalLength9(uint):
        .cfi_startproc
        cmpl    $9999, %edi
        ja      .LBB1_5
        cmpl    $99, %edi
        ja      .LBB1_4
        cmpl    $10, %edi
        movb    $2, %al
        sbbb    $0, %al
        retq
.LBB1_5:
        cmpl    $99999999, %edi
        ja      .LBB1_9
        cmpl    $999999, %edi
        ja      .LBB1_8
        cmpl    $100000, %edi
        movb    $6, %al
        sbbb    $0, %al
        retq
.LBB1_4:
        cmpl    $1000, %edi
        movb    $4, %al
        sbbb    $0, %al
        retq
.LBB1_9:
        cmpl    $1000000000, %edi
        movb    $10, %al
        sbbb    $0, %al
        retq
.LBB1_8:
        cmpl    $10000000, %edi
        movb    $8, %al
        sbbb    $0, %al
        retq
.Lfunc_end1:

.size ubyte onlineapp.decimalLength9(uint), .Lfunc_end1-ubyteonlineapp.decimalLength9(uint)

        .cfi_endproc
</code>

for the same body with signature ubyte decimalLength9(uint n).

This may be faster than your sequential comparison functiondepending upon the distribution of numbers. In realapplications, small numbers are far more common so the reducednumber of compares for those values should be beneficial inmost cases.

Sorry but no. I think that you have missed how this has changedsince the first message.1. the way it was tested initially was wrong because LLVM wasoptimizing some stuff in some tests and not others, due toliterals constants.2. Apparently there would be a branchless version that's fastwhen testing with unbiased input (to be verified)


this version is:

---
ubyte decimalLength9_4(const uint v) pure nothrow
{
    return 1 +  (v >= 10) +
                (v >= 100) +
                (v >= 1000) +
                (v >= 10000) +
                (v >= 100000) +
                (v >= 1000000) +
                (v >= 10000000) +
                (v >= 100000000) ;
}
---

but i cannot see the improvment when use time on the test programand 100000000 calls feeded with a random number.

seehttps://forum.dlang.org/post/ctidwrnxvwwkouprj...@forum.dlang.orgfor the latest evolution of the discussion.

Re: Strange counter-performance in an alternative `decimalLength9` function

Reply via email to