Re: "objtrm" problem probably found

Peter Jeremy Mon, 12 Jul 1999 15:13:09 -0700
Doug Rabson <[EMAIL PROTECTED]> wrote:
>We don't need the lock prefix for the current SMP implementation. A lock
>prefix would be needed in a multithreaded implementation but should not be
>added unless the kernel is an SMP kernel otherwise UP performance would
>suffer.

Modulo the issue of UP vs SMP modules, the code would seem to be simply:

#ifdef SMP
#define ATOMIC_ASM(type,op)     \
    __asm __volatile ("lock; " op : "=m" (*(type *)p) : "ir" (v), "0" (*(type *)p))
#else
#define ATOMIC_ASM(type,op)     \
    __asm __volatile (op : "=m" (*(type *)p) : "ir" (v), "0" (*(type *)p))
#endif

Or (maybe more clearly):

#ifdef SMP
#define SMP_LOCK        "lock; "
#else
#define SMP_LOCK
#endif

#define ATOMIC_ASM(type,op)     \
    __asm __volatile (SMP_LOCK op : "=m" (*(type *)p) : "ir" (v), "0" (*(type *)p))
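
As an illustration only, here is a minimal sketch of how such a macro might be wrapped into an atomic_add_int() of the kind the patch at the end calls. The function body is an assumption, not the kernel's actual code. It spells the memory operand as "+m" rather than the "=m"/"0" pair, which current compilers tend to prefer, and the non-x86 branch is just a fallback so the sketch builds elsewhere.

```c
/* Assumption: SMP is defined only when building an SMP kernel. */
#ifdef SMP
#define SMP_LOCK "lock; "
#else
#define SMP_LOCK
#endif

#if defined(__i386__) || defined(__x86_64__)
/* "+m" tells the compiler that *p is both read and written by op. */
#define ATOMIC_ASM(type, op) \
    __asm__ __volatile__ (SMP_LOCK op : "+m" (*(volatile type *)p) : "ir" (v))
#else
/* Fallback for non-x86 builds (assumption): ignores op and always adds. */
#define ATOMIC_ASM(type, op) \
    ((void)__atomic_fetch_add((volatile type *)p, (type)v, __ATOMIC_SEQ_CST))
#endif

/* Add v to the int at p; the lock prefix is emitted only if SMP is defined. */
static inline void
atomic_add_int(void *p, unsigned int v)
{
    ATOMIC_ASM(int, "addl %1,%0");
}
```

Built without -DSMP this emits a bare addl; built with -DSMP the same source emits lock; addl, which is the whole point of the SMP_LOCK indirection.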


John-Mark Gurney <[EMAIL PROTECTED]> wrote:
>actually, I'm not so sure, it guarantees that NO other bus operation
>will succeed while this is happening... what happens if a pci bus
>mastering card makes a modification to this value?

This is a valid point, but I don't think it's directly related to this
thread.  I think it's up to the various PCI device driver writers to
ensure that objects shared between a PCI device and driver are
correctly locked.  The mechanism to do this is likely to be
device-specific: lock prefixes only protect objects no larger than 32
or 64 bits (depending on the processor), whereas cards may require
locked accesses to much larger structures.

I believe the API to the PCI-locking routines should be distinct from
the API for SMP locks - even though the underlying implementation may
be common.


Oliver Fromme <[EMAIL PROTECTED]> wrote:
>In my list of i386 clock cycles, the lock prefix is listed with
>0 (zero) cycles.
My i486 book states 1 cycle, although that cycle can be overlapped with
several other combinations that add a cycle to the basic instruction
execution time.  I don't know about the Pentium and beyond timings.  In
any case, we have real-world timings, which are more useful.


Mike Haertel <[EMAIL PROTECTED]> wrote:
>Although function calls are more expensive than inline code,
>they aren't necessarily a lot more so, and function calls to
>non-locked RMW operations are certainly much cheaper than
>inline locked RMW operations.
Based on my timings below, this is correct, though counter-intuitive.
Given the substantial cost of indirect function calls, I don't
think this would be acceptable, though.  I think compiling modules
separately for UP/SMP is a better choice.


In Message-id: <19990 [EMAIL PROTECTED]>,
Matthew Dillon <[EMAIL PROTECTED]> provided some hard figures
for a dual PIII-450.  Extending those figures to a range of machines
(all are UP except the PIII-450, whose column holds Matt's SMP figures)
and adding the cost of using indirect function calls gives the
following, in ns/loop (patches to Matt's code are at the end):

        i386SX-25   P-133   PII-266  PIII-450  nproc  locks
mode  0   1950.23    39.65    26.31     9.21     EMPTY
mode  1   3340.59    71.74    24.45    16.48     1      no  tight
mode  2   3237.57    71.18    25.27    23.65     2      no  tight
mode  3   3367.65   282.31   153.29    93.02     1     yes  tight
mode  4   3263.64   285.58   152.94   160.82     2     yes  tight
mode  5   9439.15   446.16    60.40    37.64     1      no  spread
mode  6  10231.96   467.39    60.16    89.28     2      no  spread
mode  7  10660.05   725.80   153.18    88.32     1     yes  spread
mode  8   9990.18   755.87   155.18   161.08     2     yes  spread

mode  9   5544.82   131.31    49.96        ?     EMPTY
mode 10   7234.97   174.20    64.81        ?     1      no  tight
mode 11   7212.14   178.72    64.87        ?     2      no  tight
mode 12   7355.46   304.74   182.75        ?     1     yes  tight
mode 13   6956.54   327.11   180.21        ?     2     yes  tight
mode 14  13603.72   582.02   100.10        ?     1      no  spread
mode 15  13443.54   543.97   101.13        ?     2      no  spread
mode 16  13731.94   717.31   207.12        ?     1     yes  spread
mode 17  13379.62   800.31   207.70        ?     2     yes  spread

Modes 9 through 17 are equivalent to modes 0-8 except that the
operation is performed via a call thru a pointer-to-function.
(Mode 9 is a pointer to a nop).

Apart from the noticeable improvement in overall speed from left to
right, this shows that the lock prefix is _very_ expensive on
Pentium and above, even in a UP configuration.  It makes no difference
on a 386.  (I can produce the 486 figures tonight after I get home).
It also suggests that moving to an indirect function call (to allow
run-time UP/SMP selection) will be quite painful.
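
To put numbers on that, the sketch below subtracts the plain inline figure (mode 1) from the locked inline figure (mode 3) and from the unlocked indirect-call figure (mode 10) for the three UP machines. The figures are copied from the table; the struct and function names are made up for this example. The PIII-450 is left out because its indirect-call figures were not measured.

```c
#include <stdio.h>

/* ns/loop figures copied from the table above (tight loop, nproc = 1). */
struct timing {
    const char *cpu;
    double plain;     /* mode 1:  inline, no lock prefix */
    double locked;    /* mode 3:  inline, lock prefix */
    double indirect;  /* mode 10: no lock prefix, via function pointer */
};

static const struct timing timings[] = {
    { "i386SX-25", 3340.59, 3367.65, 7234.97 },
    { "P-133",       71.74,  282.31,  174.20 },
    { "PII-266",     24.45,  153.29,   64.81 },
};

#define NTIMINGS (sizeof(timings) / sizeof(timings[0]))

/* Extra ns/loop attributable to the lock prefix. */
double
lock_overhead(const struct timing *t)
{
    return t->locked - t->plain;
}

/* Extra ns/loop attributable to calling through a function pointer. */
double
call_overhead(const struct timing *t)
{
    return t->indirect - t->plain;
}

/* Print both overheads for every machine in the table. */
void
print_overheads(void)
{
    size_t i;

    for (i = 0; i < NTIMINGS; ++i)
        printf("%-10s lock %+9.2f ns  indirect call %+9.2f ns\n",
            timings[i].cpu, lock_overhead(&timings[i]),
            call_overhead(&timings[i]));
}
```

Calling print_overheads() lists the two costs side by side, which makes it easy to see on which machines the lock prefix or the indirect call dominates.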

>    As you can see, the lock prefix creates a stall condition on the locked
>    memory, but does *NOT* stall other memory.
This is at least CPU dependent (and may also depend on the motherboard
chipset).  My i486 book states that `all memory is locked'.

>    Therefore I believe the impact will be unnoticeable.  On a duel 
>    450MHz P-III we are talking 37 ns vs 88 ns - an overhead of 50 ns
>    for the one processor case, and an overhead of 72 ns for the two
>    processor case.
Whilst that's true for a P-III, it's definitely not true for most
lesser machines (which are probably more common - and are likely to
remain so for a while).

Based on the impact above, I believe the lock prefixes should not
be inserted until they are necessary - even if it does mean we wind
up with /modules and /modules.smp.  I don't believe that moving
to indirect function pointers is a reasonable work-around.
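
For reference, the run-time alternative being rejected here would look roughly like the sketch below: a single module binary that picks the locked or unlocked routine once at start-up and then pays for an indirect call on every operation. All names are hypothetical, and the compiler's __atomic_fetch_add builtin stands in for the lock-prefixed instruction.

```c
/* Hypothetical names throughout; this is not FreeBSD's actual interface. */

/* UP variant: a plain read-modify-write, standing in for the unlocked addl. */
static void
add_int_up(volatile int *p, int v)
{
    *p += v;
}

/* SMP variant: the builtin stands in for the lock-prefixed addl. */
static void
add_int_smp(volatile int *p, int v)
{
    (void)__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* Every caller goes through this pointer, UP or SMP. */
static void (*add_int)(volatile int *, int) = add_int_up;

/* Called once at start-up with the number of CPUs found. */
void
select_atomic_ops(int ncpus)
{
    add_int = (ncpus > 1) ? add_int_smp : add_int_up;
}

/* Example caller: increments a counter via the selected routine. */
void
bump(volatile int *counter)
{
    (*add_int)(counter, 1);
}
```

With separately compiled /modules and /modules.smp the choice is made by the preprocessor instead, so the UP build gets the unlocked instruction inline with no call at all.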

Finally, my patches to Matt's last code:
--- lock2.c~    Tue Jul 13 07:43:10 1999
+++ lock2.c     Tue Jul 13 08:38:35 1999
@@ -32,6 +32,13 @@
         ATOMIC_ASM_NOLOCK(int, "addl %1,%0");
 }
 
+static void nop(void *p, u_int v) {}
+
+static void (*add_lockp)(void *p, u_int v) = atomic_add_int;
+static void (*add_nolockp)(void *p, u_int v) = atomic_add_int_nolock;
+static void (*nopp)(void *p, u_int v) = nop;
+
+
 volatile int GX[8];    /* note: not shared between processes */
 
 int
@@ -51,7 +58,7 @@
     ftruncate(fd, pgsize);
     ptr = mmap(NULL, pgsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 
-    for (m = 0; m <= 8; ++m) {
+    for (m = 0; m <= 17; ++m) {
        pid_t pid = -1;
        int nproc = 1;
        const char *lcks = "EMPTY";
@@ -119,6 +126,67 @@
                    ;
            }
            break;
+       case 8+9:
+           pid = fork();
+           nproc = 2;
+           /* fall through */
+       case 7+9:
+           for (i = 0; i < LOOPS; ++i) {
+               (*add_lockp)(ptr, 1);
+               GX[0] = 1;
+               GX[1] = 1;
+               GX[2] = 1;
+               GX[3] = 1;
+               GX[4] = 1;
+               GX[5] = 1;
+               GX[6] = 1;
+               GX[7] = 1;
+           }
+           lcks = "yes";
+           break;
+       case 6+9:
+           pid = fork();
+           nproc = 2;
+           /* fall through */
+       case 5+9:
+           for (i = 0; i < LOOPS; ++i) {
+               (*add_nolockp)(ptr, 1);
+               GX[0] = 1;
+               GX[1] = 1;
+               GX[2] = 1;
+               GX[3] = 1;
+               GX[4] = 1;
+               GX[5] = 1;
+               GX[6] = 1;
+               GX[7] = 1;
+           }
+           lcks = "no";
+           break;
+       case 4+9:
+           pid = fork();
+           nproc = 2;
+           /* fall through */
+       case 3+9:
+           for (i = 0; i < LOOPS; ++i) {
+               (*add_lockp)(ptr, 1);
+           }
+           lcks = "yes";
+           break;
+       case 2+9:
+           pid = fork();
+           nproc = 2;
+           /* fall through */
+       case 1+9:
+           for (i = 0; i < LOOPS; ++i) {
+               (*add_nolockp)(ptr, 1);
+           }
+           lcks = "no";
+           break;
+       case 0+9:
+           for (i = 0; i < LOOPS; ++i) {
+               (*nopp)(ptr, 1);
+           }
+           break;
        default:
            printf("huh?\n");
            exit(1);
@@ -131,7 +199,7 @@
 
        usec = tv2.tv_usec + 1000000 - tv1.tv_usec + (tv2.tv_sec - tv1.tv_sec - 1) * 1000000;
 
-       printf("mode %d\t%6.2f ns/loop nproc=%d lcks=%s\n", m, (double)usec * 1000.0 / (double)LOOPS / (double)nproc, nproc, lcks);
+       printf("mode %2d\t%6.2f ns/loop nproc=%d lcks=%s\n", m, (double)usec * 1000.0 / (double)LOOPS / (double)nproc, nproc, lcks);
     }
     return(0);
 }


Peter

