it looks like you are comparing these two functions:

void
loopxinc(void)
{
        uint i, x;

        for(i = 0; i < N; i++){
                _xinc(&x);
                _xdec(&x);
        }
}

void
looplock(void)
{
        uint i;
        static Lock l;

        for(i = 0; i < N; i++){
                lock(&l);
                unlock(&l);
        }
}

but the former does two atomic operations per iteration (_xinc
and _xdec) and the latter only one lock/unlock pair.  your claim
was that _xinc is slower than incref (== lock(), x++, unlock()),
but you are timing _xinc+_xdec against what is essentially incref.
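a fair version of the benchmark would time one atomic increment
against one lock/increment/unlock.  here is a minimal sketch in
portable c11 rather than plan 9 c, under stated assumptions:
atomic_fetch_add stands in for _xinc and a pthread mutex stands in
for Lock/lock/unlock, and the names loopatomicinc, looplockinc,
nsecpercall and the value of N are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <pthread.h>
#include <time.h>

/* hypothetical iteration count, standing in for the benchmark's N */
enum { N = 1000000 };

static atomic_uint ax;          /* counter for the atomic version */
static unsigned int lx;         /* counter protected by the mutex */
static pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;

/* one atomic increment per iteration: a stand-in for _xinc alone */
unsigned int
loopatomicinc(void)
{
	unsigned int i;

	atomic_store(&ax, 0);
	for(i = 0; i < N; i++)
		atomic_fetch_add(&ax, 1);
	return atomic_load(&ax);
}

/* lock, increment, unlock per iteration: a stand-in for incref */
unsigned int
looplockinc(void)
{
	unsigned int i;

	lx = 0;
	for(i = 0; i < N; i++){
		pthread_mutex_lock(&l);
		lx++;
		pthread_mutex_unlock(&l);
	}
	return lx;
}

/* run f once (N iterations) and return nanoseconds per iteration */
double
nsecpercall(unsigned int (*f)(void))
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	f();
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((t1.tv_sec - t0.tv_sec)*1e9 + (t1.tv_nsec - t0.tv_nsec)) / N;
}
```

each loop now does exactly one increment per iteration, so
nsecpercall(loopatomicinc) and nsecpercall(looplockinc) can be
compared directly, with no halving needed.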

assuming _xinc and _xdec cost about the same (so i can just
halve the loopxinc numbers), the fair comparison comes out as:

intel core i7 2.4ghz
loop          0 nsec/call
loopxinc     10 nsec/call  // was 20
looplock     11 nsec/call

intel 5000 1.6ghz
loop          0 nsec/call
loopxinc     22 nsec/call  // was 44
looplock     25 nsec/call

intel atom 330 1.6ghz (exception!)
loop          2 nsec/call
loopxinc      7 nsec/call  // was 14
looplock     22 nsec/call

amd k10 2.0ghz
loop          2 nsec/call
loopxinc     15 nsec/call  // was 30
looplock     20 nsec/call

intel p4 xeon 3.0ghz
loop          1 nsec/call
loopxinc     38 nsec/call  // was 76
looplock     42 nsec/call

which looks like a much different story.
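the halving is just this arithmetic.  a small sketch in c that takes
the measured numbers from the tables, halves the loopxinc column
(again assuming _xinc and _xdec cost about the same), and counts the
machines where one _xinc comes out cheaper than one lock/unlock.  the
struct and function names are made up for illustration, and the loop
overhead is not subtracted, matching the tables.

```c
/* hypothetical record of the measured results, in nsec/call */
struct result {
	const char *cpu;
	int xincxdec;	/* loopxinc as measured: one _xinc plus one _xdec */
	int lock;	/* looplock as measured: one lock plus one unlock */
};

static const struct result results[] = {
	{"intel core i7 2.4ghz", 20, 11},
	{"intel 5000 1.6ghz", 44, 25},
	{"intel atom 330 1.6ghz", 14, 22},
	{"amd k10 2.0ghz", 30, 20},
	{"intel p4 xeon 3.0ghz", 76, 42},
};

enum { NRESULTS = sizeof results / sizeof results[0] };

/* estimated cost of a single _xinc, assuming _xdec costs the same */
int
xinccost(const struct result *r)
{
	return r->xincxdec / 2;
}

/* count machines where one _xinc is estimated cheaper than lock/unlock */
int
xinccheaper(void)
{
	int i, n;

	n = 0;
	for(i = 0; i < NRESULTS; i++)
		if(xinccost(&results[i]) < results[i].lock)
			n++;
	return n;
}
```

with these numbers xinccheaper() returns 5: on every machine listed,
the halved loopxinc figure is below the looplock figure.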

russ