2008/12/1, John Baldwin <[EMAIL PROTECTED]>:
> On Sunday 23 November 2008 10:41:38 am Kostik Belousov wrote:
> > On Sun, Nov 23, 2008 at 12:51:58AM +0000, Kip Macy wrote:
> > > On Sat, Nov 22, 2008 at 11:08 PM, Scott Long <[EMAIL PROTECTED]> wrote:
> > > > Kostik Belousov wrote:
> > > >>
> > > >> On Sat, Nov 22, 2008 at 03:05:22PM -0700, Scott Long wrote:
> > > >>>
> > > >>> A neat hack would be for the kernel linker to scan the text and do a
> > > >>> drop-in replacement of the opcode that is appropriate for the platform.
> > > >>> I can't see how a CPU_XXX definition would work because it's just a
> > > >>> compile time construct, one that can be included with any kernel
> > > >>> compile.
> > > >>
> > > >> Yes, it is possible to do that. Less drastic change is to directly
> > > >> check features. I moved slow code to separate section to eliminate
> > > >> unconditional jump in fast path.
> > > >> Only compile-tested.
> > > >>
> > > >
> > > > As long as it works, I think it's a step in the right direction; I'm
> > > > assuming that cpu_feature is a symbol filled in at runtime and not a
> > > > macro for the cpuid instruction, right?
> > > >
> > > > Scott
> > > >
> > >
> > > i386/include/md_var.h:
> > > <..>
> > > extern u_int cpu_exthigh;
> > > extern u_int cpu_feature;
> > > extern u_int cpu_feature2;
> > > extern u_int amd_feature;
> > > extern u_int amd_feature2;
> > > <...>
> > >
> > > I'm not thrilled with it, but we can revisit the issue if it makes a
> > > measurable difference on someone's workload.
> >
> > Below is the updated patch. It includes changes made after private comments
> > by bde@ and uses symbolic definitions for the bits in the features words.
> > I thought about accessing a per-CPU word for serialized instruction in the
> > slow path, but decided that it is not beneficial.
>
> Is the branch really better than just doing what the atomic operations for
> mutexes, etc. do and just use 'lock addl $0,%esp' for a barrier in all cases
> on i386 and only bother with using the fancier instructions on amd64? Even
> amd64 doesn't use *fence yet for the atomic ops actually. I have had a patch
> to use it for years, but during testing there was no discernable difference
> between the existing 'lock addl' approach vs '*fence'. I'd much rather just
> use 486 code for all i386 machines than add a branch, esp. if
> the "optimization" the branch is doing isn't an actual optimization.
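
For reference, the two barrier styles compared above can be sketched in userland C roughly as follows. This is only an illustration under stated assumptions, not the patch under discussion and not FreeBSD's actual atomic macros: the names barrier_lock_add, barrier_mfence, publish, shared_data and ready_flag are hypothetical. The locked add is written with a memory operand, (%esp) or (%rsp), because the lock prefix needs one. The non-x86 fallback exists only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper names for illustration only; these are not the
 * kernel's atomic(9) interfaces.
 */

/* Full barrier via a locked no-op add to the top of the stack (486 and later). */
void barrier_lock_add(void)
{
#if defined(__x86_64__)
	__asm__ __volatile__("lock; addl $0,(%%rsp)" : : : "memory", "cc");
#elif defined(__i386__)
	__asm__ __volatile__("lock; addl $0,(%%esp)" : : : "memory", "cc");
#else
	__sync_synchronize();	/* non-x86 fallback so the sketch still builds */
#endif
}

/* Full barrier via mfence (needs SSE2, which every amd64 CPU has). */
void barrier_mfence(void)
{
#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
	__asm__ __volatile__("mfence" : : : "memory");
#else
	__sync_synchronize();	/* no SSE2 known at compile time: generic barrier */
#endif
}

volatile uint32_t shared_data;
volatile uint32_t ready_flag;

/*
 * Store a value, issue the chosen barrier, then set a flag, so that the
 * data store is ordered before the flag store. Returns the stored value
 * as read back after the barrier.
 */
uint32_t publish(uint32_t value, void (*barrier)(void))
{
	shared_data = value;
	barrier();
	ready_flag = 1;
	return shared_data;
}
```

Either helper can be passed to publish(); the sketch shows only that the two forms are interchangeable as full barriers and makes no claim about their relative cost.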

This is exactly what I suggested in private, and I support it.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "[EMAIL PROTECTED]"