Re: x86 ptep_get_and_clear question

2001-02-16 Thread Ben LaHaise
On Fri, 16 Feb 2001, Linus Torvalds wrote: > This is, actually, a problem that I suspect ends up being _very_ similar > to the zap_page_range() case. zap_page_range() needs to make sure that > everything has been updated by the time the page is actually free'd. While > filemap_sync() needs to mak

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Linus Torvalds
On Fri, 16 Feb 2001, Ben LaHaise wrote: > > Actually, in the filemap_sync case, the flush_tlb_page is redundant -- > there's already a call to flush_tlb_range in filemap_sync after the dirty > bits are cleared. This is not enough. If another CPU has started write-out of one of the dirty page

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Ben LaHaise
On Fri, 16 Feb 2001, Manfred Spraul wrote: > That leaves msync() - it currently does a flush_tlb_page() for every > single dirty page. > Is it possible to integrate that into the mmu gather code? > > tlb_transfer_dirty() in addition to tlb_clear_page()? Actually, in the filemap_sync case, the fl

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Linus Torvalds
On Fri, 16 Feb 2001, Manfred Spraul wrote: > > That leaves msync() - it currently does a flush_tlb_page() for every > single dirty page. > Is it possible to integrate that into the mmu gather code? Not even necessary. The D bit does not have to be coherent. We need to make sure that we flush

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Manfred Spraul
Linus wrote: > > > > > That second pass is what I had in mind. > > > > > * munmap(file): No. Second pass required for correct msync behaviour. > > > > It is? > > Not now it isn't. We just do a msync() + fsync() for msync(MS_SYNC). Which > is admittedly not optimal, but it works. > Ok, munmap()

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Hugh Dickins
On Fri, 16 Feb 2001, Jamie Lokier wrote: > > > And check the Pentium III erratas. There is one with the tlb > > that's only triggered if 4 instruction lie in a certain window and all > > access memory in the same way of the tlb (EFLAGS incorrect if 'andl > > mask,' causes page fault)). > > Nasty

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Jamie Lokier
Manfred Spraul wrote: > A very simple test might be > > cpu 1: > cpu 2: Ben's test uses only one CPU. > Now start with variants: > change to read only instead of not present > a and b in the same way of the tlb, in a different way. > change pte with write, change with lock; > . > . > . > > But

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Manfred Spraul
Jamie Lokier wrote: > > > > Ben, fancy writing a boot-time test? > > > > > I'd never rely on such a test - what if the cpu checks in 99% of the > > cases, but doesn't handle some cases ('rep movd, everything unaligned, > > ...'. > > A good point. The test results are inconclusive. > > > And ch

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Linus Torvalds
On Fri, 16 Feb 2001, Manfred Spraul wrote: > Jamie Lokier wrote: > > > > Linus Torvalds wrote: > > > So the only case that ends up being fairly heavy may be a case that is > > > very uncommon in practice (only for unmapping shared mappings in > > > threaded programs or the lazy TLB case). > >

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Ben LaHaise
On Fri, 16 Feb 2001, Linus Torvalds wrote: > How do you expect to ever see this in practice? Sounds basically > impossible to test for this hardware race. The obvious "try to dirty as > fast as possible on one CPU while doing an atomic get-and-clear on the > other" thing is not valid - it's in fa

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Linus Torvalds
On Fri, 16 Feb 2001, Ben LaHaise wrote: > On Fri, 16 Feb 2001, Jamie Lokier wrote: > > > It should be fast on known CPUs, correct on unknown ones, and much > > simpler than "gather" code which may be completely unnecessary and > > rather difficult to test. > > > > If anyone reports the message

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Jamie Lokier
> > Ben, fancy writing a boot-time test? > > > I'd never rely on such a test - what if the cpu checks in 99% of the > cases, but doesn't handle some cases ('rep movd, everything unaligned, > ...'. A good point. The test results are inconclusive. > And check the Pentium III erratas. There is on

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Linus Torvalds
On Fri, 16 Feb 2001, Jamie Lokier wrote: > Manfred Spraul wrote: > > Ok, Is there one case were your pragmatic solutions is vastly faster? > > > * mprotect: No. The difference is at most one additional locked > > instruction for each pte. > > Oh, what instruction is that? The "set_pte()" thi

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Ben LaHaise
On Fri, 16 Feb 2001, Jamie Lokier wrote: > It should be fast on known CPUs, correct on unknown ones, and much > simpler than "gather" code which may be completely unnecessary and > rather difficult to test. > > If anyone reports the message, _then_ we think about the problem some more. > > Ben, f

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Jamie Lokier
Manfred Spraul wrote: > Ok, Is there one case were your pragmatic solutions is vastly faster? > * mprotect: No. The difference is at most one additional locked > instruction for each pte. Oh, what instruction is that? > * munmap(anon): No. We must handle delayed accessed anyway (don't call > fr

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Manfred Spraul
Jamie Lokier wrote:
>
> Manfred Spraul wrote:
> > The other cpu writes the dirty bit - we just overwrite it ;-)
> > After the ptep_get_and_clear(), before the set_pte().
>
> Ah, I see. The other CPU does an atomic *pte |= _PAGE_DIRTY, without
> checking the present bit. ('scuse me for temporar

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Jamie Lokier
Manfred Spraul wrote:
> The other cpu writes the dirty bit - we just overwrite it ;-)
> After the ptep_get_and_clear(), before the set_pte().

Ah, I see. The other CPU does an atomic *pte |= _PAGE_DIRTY, without checking the present bit. ('scuse me for temporary brain failure).

How about a prag
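The interleaving described in this message can be replayed deterministically in user space. The following is only a sketch using C11 atomics: the bit values, the `model_*` helpers and `replay_lost_dirty()` are illustrative assumptions, not kernel code, and the "unchecked" hardware behaviour is the hypothetical CPU under discussion, not a documented one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit layout (assumption; chosen to resemble x86). */
#define PAGE_PRESENT 0x001u
#define PAGE_RW      0x002u
#define PAGE_DIRTY   0x040u

/* Model of ptep_get_and_clear(): atomically fetch the old pte and zero it. */
static uint32_t model_get_and_clear(_Atomic uint32_t *pte)
{
    return atomic_exchange(pte, 0);
}

/* Hypothetical CPU that sets D with a locked OR and no present check. */
static void model_hw_mark_dirty_unchecked(_Atomic uint32_t *pte)
{
    atomic_fetch_or(pte, PAGE_DIRTY);
}

/* Replays the three steps in order and returns the final pte value. */
static uint32_t replay_lost_dirty(void)
{
    _Atomic uint32_t pte = PAGE_PRESENT | PAGE_RW;  /* clean, writable */

    /* CPU 1: ptep_get_and_clear() */
    uint32_t entry = model_get_and_clear(&pte);

    /* CPU 2: writes through a stale TLB entry, marks the pte dirty */
    model_hw_mark_dirty_unchecked(&pte);
    assert(atomic_load(&pte) == PAGE_DIRTY);  /* D landed in a cleared pte */

    /* CPU 1: set_pte() with the value it read earlier; D is overwritten */
    atomic_store(&pte, entry & ~PAGE_RW);

    return atomic_load(&pte);
}
```

In this model `replay_lost_dirty()` returns `PAGE_PRESENT` with the dirty bit gone, which is the "we just overwrite it" case. It says nothing about whether any real x86 CPU behaves this way.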

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Manfred Spraul
Jamie Lokier wrote: > > And how does that lose a dirty bit? > > For the other processor to not write a dirty bit, it must have a dirty ^^^ > TLB entry already which, along with the locked cycle in > ptep_get_and_clear, means that `entry' will have _PAGE_DIRTY

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Jamie Lokier
Manfred Spraul wrote:
> > entry = ptep_get_and_clear(pte);
> > set_pte(pte, pte_modify(entry, newprot));
> >
> > I.e. the only code with the race condition is code which explicitly
> > clears the dirty bit, in vmscan.c.
> >
> > Do you see any possibility of losing a dirty bit her
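For reference, the quoted mprotect sequence can be modelled in user space to show why a dirty bit that is already in the pte survives the protection change. This is a simplified sketch under stated assumptions: the bit values, `FRAME_MASK`, `CHG_MASK` and the `model_*` functions are hypothetical stand-ins, and the real `pte_modify()` may preserve a different set of bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit layout (assumption; chosen to resemble x86). */
#define PAGE_PRESENT  0x001u
#define PAGE_RW       0x002u
#define PAGE_ACCESSED 0x020u
#define PAGE_DIRTY    0x040u
#define FRAME_MASK    0xfffff000u
/* Bits carried over unchanged by a protection change (assumed mask). */
#define CHG_MASK      (FRAME_MASK | PAGE_ACCESSED | PAGE_DIRTY)

/* Model of pte_modify(): keep frame, A and D; replace protection bits. */
static uint32_t model_pte_modify(uint32_t entry, uint32_t newprot)
{
    return (entry & CHG_MASK) | newprot;
}

/* Model of the two-step sequence quoted from mprotect.c. */
static uint32_t model_change_prot(_Atomic uint32_t *pte, uint32_t newprot)
{
    /* entry = ptep_get_and_clear(pte); */
    uint32_t entry = atomic_exchange(pte, 0);

    /* set_pte(pte, pte_modify(entry, newprot)); */
    uint32_t updated = model_pte_modify(entry, newprot);
    /* A dirty bit seen by the atomic exchange is never dropped. */
    assert((entry & PAGE_DIRTY) == 0 || (updated & PAGE_DIRTY) != 0);
    atomic_store(pte, updated);

    return updated;
}
```

The atomic exchange guarantees that any dirty bit written before the clear ends up in `entry`. The open question in the thread is only what happens to a dirty bit written between the two steps.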

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Manfred Spraul
Jamie Lokier wrote:
>
> /* mprotect.c */
> entry = ptep_get_and_clear(pte);
> set_pte(pte, pte_modify(entry, newprot));
>
> I.e. the only code with the race condition is code which explicitly
> clears the dirty bit, in vmscan.c.
>
> Do you see any possibility of losing a dirty b

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Jamie Lokier
Manfred Spraul wrote: > > I can think of one case where performance is considered quite important: > > mprotect() is used by several garbage collectors, including threaded > > ones. Maybe mprotect() isn't the best primitive for those anyway, but > > it's what they have to work with atm. > > Does

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Manfred Spraul
Jamie Lokier wrote: > > Linus Torvalds wrote: > > So the only case that ends up being fairly heavy may be a case that is > > very uncommon in practice (only for unmapping shared mappings in > > threaded programs or the lazy TLB case). > The lazy tlb case is quite fast: lazy tlb thread never write

Re: x86 ptep_get_and_clear question

2001-02-16 Thread Jamie Lokier
Linus Torvalds wrote: > So the only case that ends up being fairly heavy may be a case that is > very uncommon in practice (only for unmapping shared mappings in > threaded programs or the lazy TLB case). I can think of one case where performance is considered quite important: mprotect() is used

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Linus Torvalds
On Thu, 15 Feb 2001, Manfred Spraul wrote: > > > Now, I will agree that I suspect most x86 _implementations_ will not do > > this. TLB's are too timing-critical, and nobody tends to want to make > > them bigger than necessary - so saving off the source address is > > unlikely. Also, setting the

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Linus Torvalds
On Fri, 16 Feb 2001, Jamie Lokier wrote:
>
> If you want to take it really far, it _could_ be that the TLB data
> contains both the pointer and the original pte contents. Then "mark
> dirty" becomes
>
>    val |= D
>    write *ptr

No. This is forbidden by the intel documentation. Fir

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Jamie Lokier
Linus Torvalds wrote:
> It _could_ be that the TLB data actually also contains the pointer to
> the place where it was fetched, and a "mark dirty" becomes
>
>    read *ptr locked
>    val |= D
>    write *ptr unlock

If you want to take it really far, it _could_ be that the TLB data cont

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Manfred Spraul
Manfred Spraul wrote:
>
> I just benchmarked a single flush_tlb_page().
>
> Pentium II 350: ~ 2000 cpu ticks.
> Pentium III 850: ~ 3000 cpu ticks.
>

I forgot the important part: SMP, including a smp_call_function() IPI.
IIRC Ingo wrote that a local 'invlpg' is around 100 ticks.

-- Manf

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Manfred Spraul
Linus Torvalds wrote:
>
> In article <[EMAIL PROTECTED]>,
> Jamie Lokier <[EMAIL PROTECTED]> wrote:
> >> > << lock;
> >> > read pte
> >> > if (!present(pte))
> >> >    do_page_fault();
> >> > pte |= dirty
> >> > write pte.
> >> > >> end lock;
> >>
> >> No, it is a little more complicated. You al

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>,
Jamie Lokier <[EMAIL PROTECTED]> wrote:
>> > << lock;
>> > read pte
>> > if (!present(pte))
>> >    do_page_fault();
>> > pte |= dirty
>> > write pte.
>> > >> end lock;
>>
>> No, it is a little more complicated. You also have to include in the
>> tlb state into th

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>, Kanoj Sarcar <[EMAIL PROTECTED]> wrote: >> >> Will you please go off and prove that this "problem" exists on some x86 >> processor before continuing this rant? None of the PII, PIII, Athlon, > >And will you please stop behaving like this is not an issue? This i

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Kanoj Sarcar
> > On Thu, 15 Feb 2001, Kanoj Sarcar wrote: > > > No. All architectures do not have this problem. For example, if the > > Linux "dirty" (not the pte dirty) bit is managed by software, a fault > > will actually be taken when processor 2 tries to do the write. The fault > > is solely to make sure

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Jamie Lokier
Kanoj Sarcar wrote:
> > Is the sequence
> > << lock;
> > read pte
> > pte |= dirty
> > write pte
> > >> end lock;
> > or
> > << lock;
> > read pte
> > if (!present(pte))
> >    do_page_fault();
> > pte |= dirty
> > write pte.
> > >> end lock;
>
> No, it is a little more complicated. You also hav

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Kanoj Sarcar
> > Kanoj Sarcar wrote: > > > > Okay, I will quote from Intel Architecture Software Developer's Manual > > Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27: > > > > "Bus cycles to the page directory and page tables in memory are performed > > only when the TLBs do not con

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Jamie Lokier
Manfred Spraul wrote:
> Is the sequence
> << lock;
> read pte
> pte |= dirty
> write pte
> >> end lock;
> or
> << lock;
> read pte
> if (!present(pte))
>    do_page_fault();
> pte |= dirty
> write pte.
> >> end lock;

or more generally

<< lock;
read pte
if (!present(pte) || !writable(pte))
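The "more generally" variant can be sketched in user space as a compare-and-swap loop. This is a model only: a CAS loop stands in for the bus lock in the pseudocode, the bit values and `model_mark_dirty_checked()` are hypothetical, and the `false` return assumes the branch cut off above raises a page fault as in the earlier variant.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout (assumption; chosen to resemble x86). */
#define PAGE_PRESENT 0x001u
#define PAGE_RW      0x002u
#define PAGE_DIRTY   0x040u

/*
 * Model of a "checked" hardware dirty-bit update: re-read the pte,
 * re-check present and writable, and only then set D atomically.
 * Returns false where the modelled CPU would take a page fault.
 */
static bool model_mark_dirty_checked(_Atomic uint32_t *pte)
{
    uint32_t old = atomic_load(pte);

    do {
        if (!(old & PAGE_PRESENT) || !(old & PAGE_RW))
            return false;  /* assumed: do_page_fault() */
        /* On failure, 'old' is reloaded and the checks run again. */
    } while (!atomic_compare_exchange_weak(pte, &old, old | PAGE_DIRTY));

    return true;
}
```

With this behaviour a pte that has been cleared by ptep_get_and_clear() is never modified by the other CPU, so no dirty bit can be written into it and later overwritten.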

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Ben LaHaise
On Thu, 15 Feb 2001, Kanoj Sarcar wrote: > No. All architectures do not have this problem. For example, if the > Linux "dirty" (not the pte dirty) bit is managed by software, a fault > will actually be taken when processor 2 tries to do the write. The fault > is solely to make sure that the Linux

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Kanoj Sarcar
> > Kanoj Sarcar wrote: > > > Here's the important part: when processor 2 wants to set the pte's dirty > > > bit, it *rereads* the pte and *rechecks* the permission bits again. > > > Even though it has a non-dirty TLB entry for that pte. > > > > > > That is how I read Ben LaHaise's description,

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Manfred Spraul
Kanoj Sarcar wrote: > > Okay, I will quote from Intel Architecture Software Developer's Manual > Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27: > > "Bus cycles to the page directory and page tables in memory are performed > only when the TLBs do not contain the translat

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Jamie Lokier
Kanoj Sarcar wrote: > > Here's the important part: when processor 2 wants to set the pte's dirty > > bit, it *rereads* the pte and *rechecks* the permission bits again. > > Even though it has a non-dirty TLB entry for that pte. > > > > That is how I read Ben LaHaise's description, and his test pr

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Kanoj Sarcar
> > [Added Linus and linux-kernel as I think it's of general interest] > > Kanoj Sarcar wrote: > > Whether Jamie was trying to illustrate a different problem, I am not > > sure. > > Yes, I was talking about pte_test_and_clear_dirty in the earlier post. > > > Look in mm/mprotect.c. Look at the

Re: x86 ptep_get_and_clear question

2001-02-15 Thread Jamie Lokier
[Added Linus and linux-kernel as I think it's of general interest] Kanoj Sarcar wrote: > Whether Jamie was trying to illustrate a different problem, I am not > sure. Yes, I was talking about pte_test_and_clear_dirty in the earlier post. > Look in mm/mprotect.c. Look at the call sequence change_