> On Mon, Jul 12, 1999 at 10:38:03PM -0700, Mike Smith wrote:
> > I said:
> > > than indirect function calls on some architectures: inline
> > > branched code. So you still have a global variable selecting
> > > locked/non-locked, but it's a boolean, rather than a pointer.
> > > Your atomic macro
Matthew Dillon <[EMAIL PROTECTED]> wrote:
>:I'm not sure there's any reason why you shouldn't. If you changed the
>:semantics of a stack segment so that memory addresses below the stack
>:pointer were irrelevant, you could implement a small, 0-cycle, on-chip
>:stack (that overflowed into memory).
Before this thread on "cache coherence" and "memory consistency" goes
any further, I'd like to suggest a time-out to read something like
http://www-ece.rice.edu/~sarita/Publications/models_tutorial.ps.
A lot of what I'm reading has a grain of truth but isn't quite
right. This paper appeared as a
Matthew Dillon <[EMAIL PROTECTED]> wrote:
>:[1] A locked instruction implies a synchronous RMW cycle. In order
>:to meet write-ordering guarantees (without which, a locked RMW
>:cycle would be useless as a semaphore primitive), it implies a
>:complete write serialization, and probably
>This is a fairly key statement in context, and an opinion here would
>count for a lot; are function calls likely to become more or less
>expensive in time?
Ambiguous question.
First answer: Assume we're hitting the cache, taking no branch
mispredicts, and everything is generally going at "the
>Second answer: in the real world, we're nearly always hitting the
>cache on stack operations associated with calls and argument passing,
>but not less often on operations in the procedure body. So, in
^^^ typo
Urk. I meant to say "less often", delete the "not".
To Unsubscribe: send mail
:...
I would also like to add a few more notes in regards to write pipelines.
Write pipelines are not used any more, at least not long ones. The
reason is simply the cache coherency issue again. Until the data is
actually written into the L1 cache, it is acoherent.
Acoher
On Mon, Jul 12, 1999 at 10:38:03PM -0700, Mike Smith wrote:
> I said:
> > than indirect function calls on some architectures: inline
> > branched code. So you still have a global variable selecting
> > locked/non-locked, but it's a boolean, rather than a pointer.
> > Your atomic macros are then {
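The "inline branched code" idea quoted above might look roughly like the sketch below. The quoted macros are cut off in the archive, so this is not the poster's code: the names mp_active_sketch and branched_add_int are hypothetical, and C11 <stdatomic.h> (which did not exist in 1999) stands in for the inline assembler the thread discusses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global flag: a boolean rather than a function pointer,
 * set once when more than one CPU is running. */
static bool mp_active_sketch = false;

/* Hypothetical helper: the branch is inlined at every call site. */
static inline void
branched_add_int(atomic_uint *p, unsigned int v)
{
	assert(p != NULL);
	if (mp_active_sketch) {
		/* Locked path: one indivisible read-modify-write. */
		atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
	} else {
		/* Non-locked path: separate relaxed load and store. This is
		 * only safe when nothing else can touch *p in between. */
		unsigned int old = atomic_load_explicit(p, memory_order_relaxed);
		atomic_store_explicit(p, old + v, memory_order_relaxed);
	}
}
```

The trade-off is a predictable conditional branch at each call site instead of an indirect call through a pointer.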
> On Mon, Jul 12, 1999 at 07:09:58PM -0700, Mike Smith wrote:
> > > Although function calls are more expensive than inline code,
> > > they aren't necessarily a lot more so, and function calls to
> > > non-locked RMW operations are certainly much cheaper than
> > > inline locked RMW operations.
>
On Mon, Jul 12, 1999 at 07:09:58PM -0700, Mike Smith wrote:
> > Although function calls are more expensive than inline code,
> > they aren't necessarily a lot more so, and function calls to
> > non-locked RMW operations are certainly much cheaper than
> > inline locked RMW operations.
>
> This is
:
:Based on general computer architecture principles, I'd say that a lock
:prefix is likely to become more expensive[1], whilst a function call
:will become cheaper[2] over time.
:...
:
:[1] A locked instruction implies a synchronous RMW cycle. In order
:to meet write-ordering guarantees (wit
:
:I'm not sure there's any reason why you shouldn't. If you changed the
:semantics of a stack segment so that memory addresses below the stack
:pointer were irrelevant, you could implement a small, 0-cycle, on-chip
:stack (that overflowed into memory). I don't know whether this
:semantic chang
Matthew Dillon <[EMAIL PROTECTED]> wrote:
>The change in code flow used to be the expensive piece, but not any
>more. You typically either see a branch prediction cache (Intel)
>offering a best-case of 0-cycle latency, or a single-cycle latency
>that is slot-fillable (MIPS).
In
Mike Smith <[EMAIL PROTECTED]> wrote:
>> Although function calls are more expensive than inline code,
>> they aren't necessarily a lot more so, and function calls to
>> non-locked RMW operations are certainly much cheaper than
>> inline locked RMW operations.
>
>This is a fairly key statement in c
:I assumed too much in asking the question; I was specifically
:interested in indirect function calls, since this has a direct impact
:on method-style implementations.
Branch prediction caches are typically PC-sensitive. An indirect method
call will never be as fast as a direct call,
>
> :
> :> Although function calls are more expensive than inline code,
> :> they aren't necessarily a lot more so, and function calls to
> :> non-locked RMW operations are certainly much cheaper than
> :> inline locked RMW operations.
> :
> :This is a fairly key statement in context, and an opin
:
:> Although function calls are more expensive than inline code,
:> they aren't necessarily a lot more so, and function calls to
:> non-locked RMW operations are certainly much cheaper than
:> inline locked RMW operations.
:
:This is a fairly key statement in context, and an opinion here would
> Although function calls are more expensive than inline code,
> they aren't necessarily a lot more so, and function calls to
> non-locked RMW operations are certainly much cheaper than
> inline locked RMW operations.
This is a fairly key statement in context, and an opinion here would
count for
In message <[EMAIL PROTECTED]>, John-Mark Gurney writes:
>Matthew Dillon scribbled this message on Jul 12:
>> p.s. I'm pretty sure that the lock prefix costs nothing on a UP system,
>> and probably wouldn't be noticed on an SMP system either because the
>> write-allocation overhead is
> do away with the lock prefix on non-SMP machines. I don't know if the
> SMP variable is accessible from within the i386/include/atomic.h header
> file, though.
>
SMP is globally defined (in opt_global.h).
-lq
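A minimal sketch of what "do away with the lock prefix on non-SMP machines" could look like, assuming a GCC-style compiler. The macro LOCK_PREFIX_SKETCH and the function atomic_add_int_sketch are hypothetical names, not the contents of i386/include/atomic.h; SMP is the kernel option mentioned above and is simply undefined here unless passed with -DSMP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: expands to the LOCK prefix only in SMP builds. */
#ifdef SMP
#define LOCK_PREFIX_SKETCH "lock; "
#else
#define LOCK_PREFIX_SKETCH ""
#endif

static inline void
atomic_add_int_sketch(volatile unsigned int *p, unsigned int v)
{
	assert(p != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	/* One RMW instruction; the prefix is assembled in only for SMP. */
	__asm__ __volatile__(LOCK_PREFIX_SKETCH "addl %1,%0"
	    : "+m" (*p)
	    : "ir" (v)
	    : "cc");
#else
	/* Non-x86 fallback so the sketch still builds: plain, non-atomic add. */
	*p += v;
#endif
}
```

Because the choice is made at compile time, a UP kernel pays nothing for the prefix, but a driver built this way is tied to one kind of kernel.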
Mike Haertel once wrote:
> Anyway, taking all that into account, I still agree with Dillon that
> it is a better software solution to allow the same loadable drivers to
> work for both UP and MP systems whenever possible.
What's wrong, again with /modules and /modules.smp? If some third party
Here we are:
Empty loop
mode 0  9.21 ns/loop nproc=1 lcks=EMPTY
Tight loop, 1 and 2 processes, with and without lock prefix
mode 1 16.48 ns/loop nproc=1 lcks=no
mode 2 23.65 ns/loop nproc=2 lcks=no
mode 3 93.02 ns/loop nproc=1 lcks=yes
mod
You might think that, due to MESI state bits in the cache and bus
coherency protocols, locks are "free".
Unfortunately, the lock prefix has a measurable cost on a UP system,
at least on P6 and later processors. The reason is that the locked
memory operation is an "at-retirement" operation,
:actually, I'm not so sure, it guarantees that NO other bus operation
:will succeed while this is happening... what happens if a pci bus
:mastering card makes a modification to this value? sure, it normally
:won't happen, but it can... and w/o the lock prefix, this CAN happen
:from what I unders
Matthew Dillon scribbled this message on Jul 12:
> p.s. I'm pretty sure that the lock prefix costs nothing on a UP system,
> and probably wouldn't be noticed on an SMP system either because the
> write-allocation overhead is already pretty bad. But I haven't tested
> it.
actuall
:>p.s. I'm pretty sure that the lock prefix costs nothing on a UP system,
:>and probably wouldn't be noticed on an SMP system either because the
:>write-allocation overhead is already pretty bad. But I haven't tested
:>it.
:
:it's actually quite expensive in terms of bus bandwidt
>p.s. I'm pretty sure that the lock prefix costs nothing on a UP system,
>and probably wouldn't be noticed on an SMP system either because the
>write-allocation overhead is already pretty bad. But I haven't tested
>it.
it's actually quite expensive in terms of bus bandwidth bec
:> We don't need the lock prefix for the current SMP implementation. A lock
:> prefix would be needed in a multithreaded implementation but should not be
:> added unless the kernel is an SMP kernel otherwise UP performance would
:> suffer.
:>
:> --
:> Doug Rabson Mail: [
>I was under the impression that a locked instruction was essentially free
>at runtime, with the sole exception of being one byte larger.
No, they are very expensive, at least when done in a minimal loop (8
cycles on my P5/133 UP and 16 cycles on my Celeron/450). ISTR Steve
Passe saying that the
> We don't need the lock prefix for the current SMP implementation. A lock
> prefix would be needed in a multithreaded implementation but should not be
> added unless the kernel is an SMP kernel otherwise UP performance would
> suffer.
>
> --
> Doug Rabson Mail: [EMAIL
Doug Rabson wrote:
> On Mon, 12 Jul 1999, Peter Jeremy wrote:
>
> > Mike Haertel <[EMAIL PROTECTED]> wrote:
> > >Um. FYI on x86, even if the compiler generates the RMW
> > >form "addl $1, foo", it's not atomic. If you want it to
> > >be atomic you have to precede the opcode with a LOCK
> > >pre
Doug Rabson wrote in list.freebsd-current:
> On Mon, 12 Jul 1999, Peter Jeremy wrote:
> > That said, it should be fairly simple to change Matt's new in-line
> > assembler versions to insert LOCK prefixes when building an SMP
> > kernel. (Although I don't know that this is necessary yet, given
On Mon, 12 Jul 1999, Peter Jeremy wrote:
> Mike Haertel <[EMAIL PROTECTED]> wrote:
> >Um. FYI on x86, even if the compiler generates the RMW
> >form "addl $1, foo", it's not atomic. If you want it to
> >be atomic you have to precede the opcode with a LOCK
> >prefix 0xF0.
>
> I'd noticed that p
On Sun, 11 Jul 1999, Mike Haertel wrote:
> >On Sat, 10 Jul 1999, Matthew Dillon wrote:
> >>
> >> The supposedly atomic functions in i386/include/atomic.h are not
> >> as atomic as was previously thought :-):
> >>
> >> #define atomic_add_short(P, V) (*(u_short*)(P) += (V))
> >[.
:
:That said, it should be fairly simple to change Matt's new in-line
:assembler versions to insert LOCK prefixes when building an SMP
:kernel. (Although I don't know that this is necessary yet, given
:the `Big Giant Lock').
:
:There remains the problem of locating all the operations in the kerne
For example, we have a performance anomaly with squid on 3.2 that could
be over-eager pagedaemon behaviour flooding the I/O system.
>Also, try the latest -CURRENT and see if you can still get it stuck in
>objtrm. I haven't had any luck so far in my simulation. If you still
>
Mike Haertel <[EMAIL PROTECTED]> wrote:
>Um. FYI on x86, even if the compiler generates the RMW
>form "addl $1, foo", it's not atomic. If you want it to
>be atomic you have to precede the opcode with a LOCK
>prefix 0xF0.
I'd noticed that point as well. The top of sys/i386/include/atomic.h
_doe
>On Sat, 10 Jul 1999, Matthew Dillon wrote:
>>
>> The supposedly atomic functions in i386/include/atomic.h are not
>> as atomic as was previously thought :-):
>>
>> #define atomic_add_short(P, V) (*(u_short*)(P) += (V))
>[...]
>
>Before I fixed this stuff for the alpha, the += e
On Sun, 11 Jul 1999, Alan Cox wrote:
> On Sun, Jul 11, 1999 at 08:12:52AM +0100, Doug Rabson wrote:
> >
> > What a nightmare. This must be due to egcs compiling things differently
> > from gcc 2.7.1. ...
>
> Yes, at least for the one case in vm_pageout_flush. (I checked
> the analogous code on
Actually, I should have said swap_pager_getpages and not
vm_pageout_flush.
Alan
On Sun, Jul 11, 1999 at 08:12:52AM +0100, Doug Rabson wrote:
>
> What a nightmare. This must be due to egcs compiling things differently
> from gcc 2.7.1. ...
Yes, at least for the one case in vm_pageout_flush. (I checked
the analogous code on a 3.x-STABLE system and it appears to be fine
for t
On Sat, 10 Jul 1999, Matthew Dillon wrote:
>
> The supposedly atomic functions in i386/include/atomic.h are not
> as atomic as was previously thought :-):
>
> #define atomic_add_short(P, V) (*(u_short*)(P) += (V))
>
> I looked at that kinda funny. But C doesn't guarantee
The supposedly atomic functions in i386/include/atomic.h are not
as atomic as was previously thought :-):
#define atomic_add_short(P, V) (*(u_short*)(P) += (V))
I looked at that kinda funny. But C doesn't guarantee a RMW opcode
for a "+=" !!!. Alan found an example s
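To make the point about "+=" concrete, here is a small sketch contrasting the plain form with a guaranteed read-modify-write. It is not the fix that went into atomic.h (the thread talks about inline assembler); it swaps in C11 <stdatomic.h>, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical name: same shape as the macro quoted above. The compiler is
 * free to emit a separate load, add, and store for this statement. */
static inline void
plain_add_short(unsigned short *p, unsigned short v)
{
	assert(p != NULL);
	*p += v;
}

/* Hypothetical name: atomic_fetch_add is one indivisible read-modify-write,
 * whatever instructions the compiler picks to implement it. */
static inline void
c11_atomic_add_short(_Atomic unsigned short *p, unsigned short v)
{
	assert(p != NULL);
	atomic_fetch_add(p, v);
}
```

Both give the same result when nothing else touches the variable; they differ only when another CPU or an interrupt handler can modify it between the load and the store.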
Yahhh. I was finally able to reproduce the problem - running Stephen's
16MB make -j5 buildworld test overnight.
I haven't found the exact cause yet, and I suspect that Alan's patch will
not fix it (but I'll try it if I exhaust other possibilities). There is
plenty of free me
special kernel config options?
Also, try the latest -CURRENT and see if you can still get it stuck in
objtrm. I haven't had any luck so far in my simulation. If you still
get stuck in objtrm then try Alan's patch and see if that has an effect.
Please try the attached patch.
Alan
Index: vm/vm_object.c
===
RCS file: /home/ncvs/src/sys/vm/vm_object.c,v
retrieving revision 1.158
diff -c -r1.158 vm_object.c
*** vm_object.c 1999/07/01 19:53:42 1.158
--- vm_object.c 1999/07
On Thursday, 8th July 1999, Matthew Dillon wrote:
>There is a way we can find out for sure. For any of you with processes
> stuck in objtrm, see if you can gdb the kernel and get a backtrace
>of that process to see if it might be in a state where a previous
>cal
ng an
interlock situation.
There is a way we can find out for sure. For any of you with processes
stuck in objtrm, see if you can gdb the kernel and get a backtrace
of that process to see if it might be in a state where a previous
call context is holding a PIP count on the object.
g
I can very reliably reproduce a process getting stuck in this state. The
box it is running on is a k6-2 400 with 128MB of ram and 500MB of swap.
When compiling mysql322-server with the compiler option of '-O2' or
'-O3', the build gets up to sql_yacc.cc. It churns on this file and
goes into the o
On Tuesday, 6th July 1999, Andrew Gallatin wrote:
>Yes. say 'proc pidhashtbl[PID & pidhash]->lh_first' in kgdb.
>I suspect that it will be in exit() also..
Magic!
It looks like a plain old exit() to me.
(kgdb) proc pidhashtbl[27157&pidhash]->lh_first
(kgdb) bt
#0 mi_switch () at ../../kern/k
Andrew Gallatin writes:
>Stephen McKay writes:
> > PS I haven't worked out yet how to find the stack of the errant process.
> > Any hints? The stack trace should be helpful.
>
>Yes. say 'proc pidhashtbl[PID & pidhash]->lh_first' in kgdb.
>
it should also work to do ``ps -M -N '' and pick out
On Tue, 6 Jul 1999, Andrew Gallatin wrote:
>
>
> I've occasionally seen systems wedged in a similar state. I reported
> my sighting of this on May 24th. Haven't seen it since.
>
> The one bit of useful info I've learned since my report was that from
> a talk with the program's author, I susp
I've occasionally seen systems wedged in a similar state. I reported
my sighting of this on May 24th. Haven't seen it since.
The one bit of useful info I've learned since my report was that from
a talk with the program's author, I suspect the object in question may
have been created with mmap
You'll want to look primarily in the swap_pager code since it messes with
that (at least it used to - I don't recall what Matt's new code does with it).
There should be various calls to vm_object_pip_* that manipulate the
paging_in_progress number.
-DG
David Greenman
Co-founder/Principal Arch
On Tuesday, 6th July 1999, Stephen McKay wrote:
>the make world hangs with cc1 in "objtrm"...
I'm having a fun old conversation with myself here! ;-)
Here's some concrete info:
(kgdb) p/x *(struct vm_object*) 0xc32ea21c
$13 = {object_list = {tqe_next = 0xc3389e58, tqe_prev = 0xc323fdec},
sh
. Almost everything
>is fine, but one cc1 process is stuck in "objtrm". Oh, and I hung a "cat
>/proc/31624/map", too, trying to get some details (now stuck in "thrd_sleep").
>
>So, am I just tripping over some old long-fixed bug? Or is this a new one
>
I have an old 486 here that I thrash to death occasionally. Well, at least
I try to get it to page to death. I started a make world last week and
forgot about it.
Today I noticed that it's been stuck for most of the week. Almost everything
is fine, but one cc1 process is stuck in "
On recent -currents, one of our users has managed to wedge a job in
objtrm when it's exiting. Anybody know what's causing this?
I've appended a stack trace of the offending process, as well as a
printout of the offending object. The machine seems otherwise
healthy.
I don't really understand