On Mon, 1 Dec 2008, Kostik Belousov wrote:

On Mon, Dec 01, 2008 at 02:07:06PM -0500, John Baldwin wrote:
On Sunday 23 November 2008 10:41:38 am Kostik Belousov wrote:

Below is the updated patch. It includes changes made after private comments
by bde@ and uses symbolic definitions for the bits in the features words.
I thought about accessing a per-CPU word for serialized instruction in the
slow path, but decided that it does not beneficial.\

Is the branch really better than just doing what the atomic operations for
mutexes, etc. do and just use 'lock addl $0,%esp' for a barrier in all cases
on i386 and only bother with using the fancier instructions on amd64?  Even
amd64 doesn't use *fence yet for the atomic ops actually.  I have had a patch
to use it for years, but during testing there was no discernable difference
between the existing 'lock addl' approach vs '*fence'.  I'd much rather just
use 486 code for all i386 machines than add a branch, esp. if
the "optimization" the branch is doing isn't an actual optimization.

Hmm.

I do not insist. The branch is done only for arches that has no fence
instructions. The section for the slow code is put after the main text,
that should, according to Intel documentation, be predicted as not taken
for the first execution.

For reference, below is the latest version of the patch. I postponed
the commit, in particular, because it touches ixgb, and Jack did
not responded.
...
diff --git a/sys/i386/include/atomic.h b/sys/i386/include/atomic.h
index f6bcf0c..dbdc945 100644
--- a/sys/i386/include/atomic.h
+++ b/sys/i386/include/atomic.h
@@ -32,21 +32,47 @@
#error this file needs sys/cdefs.h as a prerequisite
#endif

-
-#if defined(I686_CPU)
-#define mb()   __asm__ __volatile__ ("mfence;": : :"memory")
-#define wmb()  __asm__ __volatile__ ("sfence;": : :"memory")
-#define rmb()  __asm__ __volatile__ ("lfence;": : :"memory")
-#else
-/*
- * do we need a serializing instruction?
- */
-#define mb()
-#define wmb()
-#define rmb()
+#if defined(_KERNEL)

Still have this style bug (if defined() instead of ifdef).

+#include <machine/cpufunc.h>

Namespace pollution.  This should be unnecessary because it is an error
to include <machine/atomic.h> (or <machine/cpufunc.h>) directly, at
least in the kernel.  The correct include is <sys/systm.h> which
supplies pollution by <machine/atomic.h> and <machine/cpufunc.h> and
lots more as standard.  There <machine/atomic.h> is included before
<machine/cpufunc.h>.  This order would now be wrong if you have added
a dependency of <machine/atomic.h> on <machine/cpufunc.h>, but I can't
see such a dependency...

+#include <machine/specialreg.h>

... now I see it -- it is because <machine/specialreg.h> is broken and
has a dependency on <machine/cpufunc.h>.  <machine/specialreg.h> is
supposed to contain only #define's for special registers, but it has
2 inline functions that use functions from <machine/cpufunc.h>, without
even using #defines for the special registers that they use.  Normally
<machine/specialreg.h> can be included without worrying about this
dependency, thanks to the standard pollution, but now the standard
pollution is broken.

Namespace pollution.  This is almost necessary for CPUID_* (it would
be necessary if these macros were inlines, but with macros only
files that use the macros would need the include and there might be
few enough of them for this to be done for each, or <sys/systm.h>
could do it (ugh) and hthe order doesn't matter.

Another advantage of 'lock addl' is that CPUID_* are not needed.
Atomic ops would need the same treatment if they used *fence.

+#define        mb()    __asm __volatile(                       \
+       "testl     %0,cpu_feature                          \n\
+       je      2f                                      \n\
+       mfence                                          \n\
+1:                                                     \n\

Still have this style bug (no empty line before label that is not
fallen into).  I wasn't going to care about this, but then decided
that the non-fall-through here is especially non-obvious.

+       .section .text.offpath                          \n\
+2:     lock                                            \n\

Still have this style bug (label on same line as instruction).

+       addl    $0,cpu_feature                          \n\

I wonder if adding to (%esp) is faster.  It is at least smaller.

+       jmp     1b                                      \n\

Missing empty line as above.

+       .text                                           \n\
+"                                                 \
+       : : "i"(CPUID_SSE2) : "memory")

Missing space after "i" here and elsewhere.

Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to