Doug Rabson <[EMAIL PROTECTED]> wrote:
>We don't need the lock prefix for the current SMP implementation. A lock
>prefix would be needed in a multithreaded implementation but should not be
>added unless the kernel is an SMP kernel otherwise UP performance would
>suffer.
Modulo the issue of UP vs SMP modules, the code would seem to be simply:
#ifdef SMP
#define ATOMIC_ASM(type,op) \
__asm __volatile ("lock; " op : "=m" (*(type *)p) : "ir" (v), "0" (*(type *)p))
#else
#define ATOMIC_ASM(type,op) \
__asm __volatile (op : "=m" (*(type *)p) : "ir" (v), "0" (*(type *)p))
#endif
Or (maybe more clearly):
#ifdef SMP
#define SMP_LOCK "lock; "
#else
#define SMP_LOCK
#endif
#define ATOMIC_ASM(type,op) \
__asm __volatile (SMP_LOCK op : "=m" (*(type *)p) : "ir" (v), "0" (*(type *)p))
John-Mark Gurney <[EMAIL PROTECTED]> wrote:
>actually, I'm not so sure, it guarantees that NO other bus operation
>will succeed while this is happening... what happens if a pci bus
>mastering card makes a modification to this value?
This is a valid point, but I don't think it's directly related to this
thread. I think it's up the the various PCI device driver writers to
ensure that objects shared between a PCI device and driver are
correctly locked. The mechanism to do this is likely to be device-
specific: Lock prefixes only protect objects no larger than 32 or 64
bits (depending on the processor), cards may require locked accesses
to much larger structures.
I believe the API to the PCI-locking routines should be distinct from
the API for SMP locks - even though the underlying implementation may
be common.
Oliver Fromme <[EMAIL PROTECTED]> wrote:
>In my list of i386 clock cycles, the lock prefix is listed with
>0 (zero) cycles.
My i486 book states 1 cycle, although that cycle can be overlapped with
several other combinations that add a cycle to the basic instruction
execution time. I don't know about the Pentium and beyond timings. In
any case, we have real-world timings, which are more useful.
Mike Haertel <[EMAIL PROTECTED]> wrote:
>Although function calls are more expensive than inline code,
>they aren't necessarily a lot more so, and function calls to
>non-locked RMW operations are certainly much cheaper than
>inline locked RMW operations.
Based on my timings below, this is correct, though counter-intuitive.
Given the substantial cost of indirect function calls, I don't
this this would be acceptable, though. I think compiling modules
separately for UP/SMP is a better choice.
In Message-id: <19990 [EMAIL PROTECTED]>,
Matthew Dillon <[EMAIL PROTECTED]> provided some hard figures
for a dual PIII-450. Expanding those figures for a range of machines
(all are UP except the PIII-450, which are Matt's SMP figures), and
adding the cost of using indirect function calls (patches to Matt's
code at the end):
i386SX-25 P-133 PII-266 PIII-450 nproc locks
mode 0 1950.23 39.65 26.31 9.21 EMPTY
mode 1 3340.59 71.74 24.45 16.48 1 no tight
mode 2 3237.57 71.18 25.27 23.65 2 no tight
mode 3 3367.65 282.31 153.29 93.02 1 yes tight
mode 4 3263.64 285.58 152.94 160.82 2 yes tight
mode 5 9439.15 446.16 60.40 37.64 1 no spread
mode 6 10231.96 467.39 60.16 89.28 2 no spread
mode 7 10660.05 725.80 153.18 88.32 1 yes spread
mode 8 9990.18 755.87 155.18 161.08 2 yes spread
mode 9 5544.82 131.31 49.96 ? EMPTY
mode 10 7234.97 174.20 64.81 ? 1 no tight
mode 11 7212.14 178.72 64.87 ? 2 no tight
mode 12 7355.46 304.74 182.75 ? 1 yes tight
mode 13 6956.54 327.11 180.21 ? 2 yes tight
mode 14 13603.72 582.02 100.10 ? 1 no spread
mode 15 13443.54 543.97 101.13 ? 2 no spread
mode 16 13731.94 717.31 207.12 ? 1 yes spread
mode 17 13379.62 800.31 207.70 ? 2 yes spread
Modes 9 through 17 are equivalent to modes 0-8 except that the
operation is performed via a call thru a pointer-to-function.
(Mode 9 is a pointer to a nop).
Apart from the noticable improvement in overall speed from left to
right, this shows that the lock prefix is _very_ expensive on
Pentium and above, even in a UP configuration. It makes no difference
on a 386. (I can produce the 486 figures tonight after I get home).
It also suggests that moving to an indirect function call (to allow
run-time UP/SMP selection) will be quite painful.
> As you can see, the lock prefix creates a stall condition on the locked
> memory, but does *NOT* stall other memory.
This is at least CPU dependent (and may also depend on the motherboard
chipset). The i486 states that `all memory is locked'.
> Therefore I believe the impact will be unnoticeable. On a duel
> 450MHz P-III we are talking 37 ns vs 88 ns - an overhead of 50 ns
> for the one processor case, and an overhead of 72 ns for the two
> processor case.
Whilst that's true for a P-III, it's definitely not true for most
lesser machines (which are probably more common - and are likely to
remain so for a while).
Based on the impact above, I believe the lock prefixes should not
be inserted until they are necessary - even if it does mean we wind
up with /modules and /modules.smp. I don't believe that moving
to indirect function pointers is a reasonable work-around.
Finally, my patches to Matt's last code:
--- lock2.c~ Tue Jul 13 07:43:10 1999
+++ lock2.c Tue Jul 13 08:38:35 1999
@@ -32,6 +32,13 @@
ATOMIC_ASM_NOLOCK(int, "addl %1,%0");
}
+static void nop(void *p, u_int v) {}
+
+static void (*add_lockp)(void *p, u_int v) = atomic_add_int;
+static void (*add_nolockp)(void *p, u_int v) = atomic_add_int_nolock;
+static void (*nopp)(void *p, u_int v) = nop;
+
+
volatile int GX[8]; /* note: not shared between processes */
int
@@ -51,7 +58,7 @@
ftruncate(fd, pgsize);
ptr = mmap(NULL, pgsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
- for (m = 0; m <= 8; ++m) {
+ for (m = 0; m <= 17; ++m) {
pid_t pid = -1;
int nproc = 1;
const char *lcks = "EMPTY";
@@ -119,6 +126,67 @@
;
}
break;
+ case 8+9:
+ pid = fork();
+ nproc = 2;
+ /* fall through */
+ case 7+9:
+ for (i = 0; i < LOOPS; ++i) {
+ (*add_lockp)(ptr, 1);
+ GX[0] = 1;
+ GX[1] = 1;
+ GX[2] = 1;
+ GX[3] = 1;
+ GX[4] = 1;
+ GX[5] = 1;
+ GX[6] = 1;
+ GX[7] = 1;
+ }
+ lcks = "yes";
+ break;
+ case 6+9:
+ pid = fork();
+ nproc = 2;
+ /* fall through */
+ case 5+9:
+ for (i = 0; i < LOOPS; ++i) {
+ (*add_nolockp)(ptr, 1);
+ GX[0] = 1;
+ GX[1] = 1;
+ GX[2] = 1;
+ GX[3] = 1;
+ GX[4] = 1;
+ GX[5] = 1;
+ GX[6] = 1;
+ GX[7] = 1;
+ }
+ lcks = "no";
+ break;
+ case 4+9:
+ pid = fork();
+ nproc = 2;
+ /* fall through */
+ case 3+9:
+ for (i = 0; i < LOOPS; ++i) {
+ (*add_lockp)(ptr, 1);
+ }
+ lcks = "yes";
+ break;
+ case 2+9:
+ pid = fork();
+ nproc = 2;
+ /* fall through */
+ case 1+9:
+ for (i = 0; i < LOOPS; ++i) {
+ (*add_nolockp)(ptr, 1);
+ }
+ lcks = "no";
+ break;
+ case 0+9:
+ for (i = 0; i < LOOPS; ++i) {
+ (*nopp)(ptr, 1);
+ }
+ break;
default:
printf("huh?\n");
exit(1);
@@ -131,7 +199,7 @@
usec = tv2.tv_usec + 1000000 - tv1.tv_usec + (tv2.tv_sec - tv1.tv_sec - 1) *
1000000;
- printf("mode %d\t%6.2f ns/loop nproc=%d lcks=%s\n", m, (double)usec * 1000.0 /
(double)LOOPS / (double)nproc, nproc, lcks);
+ printf("mode %2d\t%6.2f ns/loop nproc=%d lcks=%s\n", m, (double)usec * 1000.0
+/ (double)LOOPS / (double)nproc, nproc, lcks);
}
return(0);
}
Peter
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message