And I have of course embarrassed myself publicly by getting the sign
wrong.  That's what I get for posting *before* booting the result.

You may now point and bray like a donkey. :-)


Anyway, the following actually works:

#if ADD_INTERRUPT_BENCH
static unsigned long avg_cycles, avg_deviation;

#define AVG_SHIFT 8     /* Exponential average factor k=1/256 */
#define FIXED_1_2 (1 << (AVG_SHIFT-1))

static void add_interrupt_bench(cycles_t start)
{
        long delta = random_get_entropy() - start;

        /*
         * Use a weighted moving average.  avg_cycles is kept scaled
         * by 2^AVG_SHIFT; FIXED_1_2 rounds the unscaled value.
         */
        delta = delta - ((avg_cycles + FIXED_1_2) >> AVG_SHIFT);
        avg_cycles += delta;
        /* And average deviation, scaled the same way */
        delta = abs(delta) - ((avg_deviation + FIXED_1_2) >> AVG_SHIFT);
        avg_deviation += delta;
}
#else
#define add_interrupt_bench(x)
#endif
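
For anyone who wants to play with the averaging outside the kernel,
here is a minimal user-space sketch of the same fixed-point update.
The struct and function names (bench_avg, bench_avg_add,
bench_avg_unscale) are made up for illustration and are not part of
the patch; it uses signed longs and labs() in place of the kernel's
abs(), and assumes samples are non-negative.

```c
#include <stdlib.h>

#define AVG_SHIFT 8                       /* k = 1/256, as in the code above */
#define FIXED_1_2 (1L << (AVG_SHIFT - 1)) /* 0.5 in fixed point, for rounding */

/* Hypothetical user-space container for the two running values. */
struct bench_avg {
        long avg;       /* mean, scaled by 2^AVG_SHIFT */
        long dev;       /* mean absolute deviation, scaled by 2^AVG_SHIFT */
};

/* Fold one non-negative sample into both averages. */
static void bench_avg_add(struct bench_avg *b, long sample)
{
        /* Distance of the sample from the current (rounded) mean */
        long delta = sample - ((b->avg + FIXED_1_2) >> AVG_SHIFT);

        b->avg += delta;
        /* Same update, applied to the absolute deviation */
        delta = labs(delta) - ((b->dev + FIXED_1_2) >> AVG_SHIFT);
        b->dev += delta;
}

/* Undo the 2^AVG_SHIFT scaling, rounding to nearest. */
static long bench_avg_unscale(long scaled)
{
        return (scaled + FIXED_1_2) >> AVG_SHIFT;
}
```

Feeding a constant sample makes the scaled mean settle at about 256
times that sample and the deviation decay toward zero, which is why
the raw numbers need dividing by 256 before being read as cycles.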

And here are some measurements (uncorrected for the *256 scaling) on my
primary (Ivy Bridge E) test machine.  I've included 10 samples of
each value, taken at 10s intervals.  In each column pair, avg_cycles
is first, followed by avg_deviation.  The three conditions are idle
(1.2 GHz), idle with the performance governor enabled (3.9 GHz), and
during a "make -j7" in the kernel tree (also all processors at maximum).

Rather against my intuition, a busy system greatly *reduces* the
time spent.  Just to see what interrupt rate did, on the last kernel
I also tested it while being ping flooded.

They're sorted in increasing order of speed.  Unrolling definitely
makes a difference, but it's not faster than the old code until I drop
to 2 iterations in the inner loop (which would be called 4 rounds by
most people).  The 64-bit mix is noticeably faster yet.

Idle            performance     make -j7

ORIG_FAST_MIX=0
74761 22228     78799 20305     46527 24966
71984 23619     78466 20599     50044 25202
71949 23760     77262 21363     48295 25460
72966 23859     76188 21921     47393 25130
73974 23543     76040 22135     42979 24341
74601 23407     75294 22602     50502 26715
75359 23169     71267 24990     45649 25338
75450 22855     71065 25022     48792 25731
76338 22711     71569 25016     48564 26040
76546 22567     71143 24972     48414 27882

ORIG_FAST_MIX=0, unrolled:
54830 20312     60343 21814     29577 16699
55510 20787     60655 22504     40344 24749
56994 21080     60691 22497     41095 27184
57674 21566     60261 22713     39578 26717
57560 22221     60690 22709     41361 26138
58220 22593     59978 22924     36334 24249
58646 22422     58391 23466     37125 25089
59485 21927     58000 23968     24091 11892
60444 21959     58633 24486     28816 15585
60637 22133     58576 24593     25125 13174

ORIG_FAST_MIX=1
50554 13117     54732 13010     24043 12804
51294 13623     53269 14066     35671 25957
51063 13682     52531 14214     34391 22749
49833 13612     51833 14272     24097 13802
49458 13624     49288 15046     31378 18110
50383 13936     48720 15540     25088 17320
51167 14210     49058 15637     26478 13247
51356 14157     48206 15787     30542 19717
51077 14155     48587 15742     27461 15865
52199 14361     48710 15933     27608 14826

ORIG_FAST_MIX=0, unrolled, 2 (double) rounds:
43011 10685     44846 10523     21469 10994
42568 10261     43968 10834     19694 8501
42012 10304     43657 10619     19174 8557
42166 10063     43064 10598     20221 9398
41496 10353     42125 10843     19034 6685
42176 10826     41547 10984     19462 8002
41671 10947     40756 11242     21654 12140
41691 10643     40309 11312     20526 9188
41091 10817     40135 11318     20159 9267
41151 10553     39877 11484     19653 8393

Idle            performance     make -j7        ping -f (from outside)

64-bit hash, 2 (double) rounds (which is excellent avalanche):
36117 11269     39285 11171     16953 5664      35107 14735
35391 11035     36564 11600     18143 7216      35322 14176
34728 11280     35278 12085     16815 6245      35479 14453
35552 11606     35627 11863     16876 5841      34717 14505
35553 11633     35145 11892     17825 6166      35241 14555
35468 11406     35773 11857     16834 5094      34814 14719
35301 11390     35357 11771     16750 4987      35248 14566
34841 10821     35785 11531     19170 8296      35627 14103
34818 10942     35045 11592     17004 6814      34948 14399
35113 11158     35469 11343     19344 7969      33859 14035

(Again, all numbers must be divided by 256 to get cycles.  You
can probably divide by 1000 and multiply by 4 in your head, which
is a pretty good approximation.)