On Fri, 16 Feb 2001, Linus Torvalds wrote:
> This is, actually, a problem that I suspect ends up being _very_ similar
> to the zap_page_range() case. zap_page_range() needs to make sure that
> everything has been updated by the time the page is actually free'd. While
> filemap_sync() needs to mak
On Fri, 16 Feb 2001, Ben LaHaise wrote:
>
> Actually, in the filemap_sync case, the flush_tlb_page is redundant --
> there's already a call to flush_tlb_range in filemap_sync after the dirty
> bits are cleared.
This is not enough.
If another CPU has started write-out of one of the dirty page
On Fri, 16 Feb 2001, Manfred Spraul wrote:
> That leaves msync() - it currently does a flush_tlb_page() for every
> single dirty page.
> Is it possible to integrate that into the mmu gather code?
>
> tlb_transfer_dirty() in addition to tlb_clear_page()?
Actually, in the filemap_sync case, the fl
On Fri, 16 Feb 2001, Manfred Spraul wrote:
>
> That leaves msync() - it currently does a flush_tlb_page() for every
> single dirty page.
> Is it possible to integrate that into the mmu gather code?
Not even necessary.
The D bit does not have to be coherent. We need to make sure that we flush
Linus wrote:
>
> >
> > That second pass is what I had in mind.
> >
> > > * munmap(file): No. Second pass required for correct msync behaviour.
> >
> > It is?
>
> Not now it isn't. We just do a msync() + fsync() for msync(MS_SYNC). Which
> is admittedly not optimal, but it works.
>
Ok, munmap()
On Fri, 16 Feb 2001, Jamie Lokier wrote:
>
> > And check the Pentium III errata. There is one with the tlb
> > that's only triggered if 4 instructions lie in a certain window and all
> > access memory in the same way of the tlb (EFLAGS incorrect if 'andl
> > mask,' causes page fault).
>
> Nasty
Manfred Spraul wrote:
> A very simple test might be
>
> cpu 1:
> cpu 2:
Ben's test uses only one CPU.
> Now start with variants:
> change to read only instead of not present
> a and b in the same way of the tlb, in a different way.
> change pte with write, change with lock;
> .
> .
> .
>
> But
Jamie Lokier wrote:
>
> > > Ben, fancy writing a boot-time test?
> > >
> > I'd never rely on such a test - what if the cpu checks in 99% of the
> > cases, but doesn't handle some cases ('rep movd, everything unaligned,
> > ...'.
>
> A good point. The test results are inconclusive.
>
> > And ch
On Fri, 16 Feb 2001, Manfred Spraul wrote:
> Jamie Lokier wrote:
> >
> > Linus Torvalds wrote:
> > > So the only case that ends up being fairly heavy may be a case that is
> > > very uncommon in practice (only for unmapping shared mappings in
> > > threaded programs or the lazy TLB case).
> >
On Fri, 16 Feb 2001, Linus Torvalds wrote:
> How do you expect to ever see this in practice? Sounds basically
> impossible to test for this hardware race. The obvious "try to dirty as
> fast as possible on one CPU while doing an atomic get-and-clear on the
> other" thing is not valid - it's in fa
On Fri, 16 Feb 2001, Ben LaHaise wrote:
> On Fri, 16 Feb 2001, Jamie Lokier wrote:
>
> > It should be fast on known CPUs, correct on unknown ones, and much
> > simpler than "gather" code which may be completely unnecessary and
> > rather difficult to test.
> >
> > If anyone reports the message
> > Ben, fancy writing a boot-time test?
> >
> I'd never rely on such a test - what if the cpu checks in 99% of the
> cases, but doesn't handle some cases ('rep movd, everything unaligned,
> ...'.
A good point. The test results are inconclusive.
> And check the Pentium III errata. There is on
On Fri, 16 Feb 2001, Jamie Lokier wrote:
> Manfred Spraul wrote:
> > Ok, is there one case where your pragmatic solution is vastly faster?
>
> > * mprotect: No. The difference is at most one additional locked
> > instruction for each pte.
>
> Oh, what instruction is that?
The "set_pte()" thi
On Fri, 16 Feb 2001, Jamie Lokier wrote:
> It should be fast on known CPUs, correct on unknown ones, and much
> simpler than "gather" code which may be completely unnecessary and
> rather difficult to test.
>
> If anyone reports the message, _then_ we think about the problem some more.
>
> Ben, f
Manfred Spraul wrote:
> Ok, is there one case where your pragmatic solution is vastly faster?
> * mprotect: No. The difference is at most one additional locked
> instruction for each pte.
Oh, what instruction is that?
> * munmap(anon): No. We must handle delayed accessed anyway (don't call
> fr
Jamie Lokier wrote:
>
> Manfred Spraul wrote:
> > The other cpu writes the dirty bit - we just overwrite it ;-)
> > After the ptep_get_and_clear(), before the set_pte().
>
> Ah, I see. The other CPU does an atomic *pte |= _PAGE_DIRTY, without
> checking the present bit. ('scuse me for temporar
Manfred Spraul wrote:
> The other cpu writes the dirty bit - we just overwrite it ;-)
> After the ptep_get_and_clear(), before the set_pte().
Ah, I see. The other CPU does an atomic *pte |= _PAGE_DIRTY, without
checking the present bit. ('scuse me for temporary brain failure).
How about a prag
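The window Manfred describes can be replayed in user space. The sketch below is a model, not kernel code: the pte is an ordinary C11 atomic word, model_get_and_clear() and model_set_pte() are hypothetical stand-ins for ptep_get_and_clear() and set_pte(), clearing the RW bit stands in for pte_modify(), and model_hw_blind_dirty() assumes a CPU that ORs in the dirty bit without rechecking the present bit. Whether any real x86 part behaves that way is the open question in this thread.

```c
#include <stdatomic.h>
#include <stdint.h>

/* i386 pte flag bits; the pte itself is modelled as a plain atomic word. */
#define PAGE_PRESENT 0x001u
#define PAGE_RW      0x002u
#define PAGE_DIRTY   0x040u

typedef _Atomic uint32_t model_pte_t;

/* Stand-in for ptep_get_and_clear(): atomically fetch the pte and zero it. */
static uint32_t model_get_and_clear(model_pte_t *pte)
{
    return atomic_exchange(pte, 0u);
}

/* Stand-in for set_pte(): a store that overwrites whatever is there. */
static void model_set_pte(model_pte_t *pte, uint32_t val)
{
    atomic_store(pte, val);
}

/* ASSUMPTION: a CPU that sets D with a locked OR and never rechecks P. */
static void model_hw_blind_dirty(model_pte_t *pte)
{
    atomic_fetch_or(pte, PAGE_DIRTY);
}

/* Replays the interleaving in program order and returns the final pte. */
uint32_t replay_lost_dirty(void)
{
    model_pte_t pte;
    atomic_init(&pte, PAGE_PRESENT | PAGE_RW);   /* clean, writable page */

    uint32_t entry = model_get_and_clear(&pte);  /* CPU 1: entry has no D */
    model_hw_blind_dirty(&pte);                  /* CPU 2: pte now holds only D */
    model_set_pte(&pte, entry & ~PAGE_RW);       /* CPU 1: overwrites that D */

    return atomic_load(&pte);
}
```

Under these assumptions replay_lost_dirty() returns 0x001: the pte is present and write-protected, and the dirty bit written by the second CPU has been overwritten.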
Jamie Lokier wrote:
>
> And how does that lose a dirty bit?
>
> For the other processor to not write a dirty bit, it must have a dirty
^^^
> TLB entry already which, along with the locked cycle in
> ptep_get_and_clear, means that `entry' will have _PAGE_DIRTY
Manfred Spraul wrote:
> > entry = ptep_get_and_clear(pte);
> > set_pte(pte, pte_modify(entry, newprot));
> >
> > I.e. the only code with the race condition is code which explicitly
> > clears the dirty bit, in vmscan.c.
> >
> > Do you see any possibility of losing a dirty bit her
Jamie Lokier wrote:
>
> /* mprotect.c */
> entry = ptep_get_and_clear(pte);
> set_pte(pte, pte_modify(entry, newprot));
>
> I.e. the only code with the race condition is code which explicitly
> clears the dirty bit, in vmscan.c.
>
> Do you see any possibility of losing a dirty b
Manfred Spraul wrote:
> > I can think of one case where performance is considered quite important:
> > mprotect() is used by several garbage collectors, including threaded
> > ones. Maybe mprotect() isn't the best primitive for those anyway, but
> > it's what they have to work with atm.
>
> Does
Jamie Lokier wrote:
>
> Linus Torvalds wrote:
> > So the only case that ends up being fairly heavy may be a case that is
> > very uncommon in practice (only for unmapping shared mappings in
> > threaded programs or the lazy TLB case).
>
The lazy tlb case is quite fast: lazy tlb thread never write
Linus Torvalds wrote:
> So the only case that ends up being fairly heavy may be a case that is
> very uncommon in practice (only for unmapping shared mappings in
> threaded programs or the lazy TLB case).
I can think of one case where performance is considered quite important:
mprotect() is used
On Thu, 15 Feb 2001, Manfred Spraul wrote:
>
> > Now, I will agree that I suspect most x86 _implementations_ will not do
> > this. TLB's are too timing-critical, and nobody tends to want to make
> > them bigger than necessary - so saving off the source address is
> > unlikely. Also, setting the
On Fri, 16 Feb 2001, Jamie Lokier wrote:
>
> If you want to take it really far, it _could_ be that the TLB data
> contains both the pointer and the original pte contents. Then "mark
> dirty" becomes
>
> val |= D
> write *ptr
No. This is forbidden by the Intel documentation. Fir
Linus Torvalds wrote:
> It _could_ be that the TLB data actually also contains the pointer to
> the place where it was fetched, and a "mark dirty" becomes
>
> read *ptr locked
> val |= D
> write *ptr unlock
If you want to take it really far, it _could_ be that the TLB data
cont
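The two hypothetical "mark dirty" sequences quoted above differ in what they do to a pte that the OS has changed since the TLB fill. The following is only a user-space sketch of that difference: struct model_tlb_entry and both functions are invented for illustration, and neither is claimed to describe real hardware (a reply in this thread says the second variant is forbidden by the Intel documentation).

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_PRESENT 0x001u
#define PAGE_RW      0x002u
#define PAGE_DIRTY   0x040u

/* Hypothetical TLB entry: remembers where the pte lives and what it held. */
struct model_tlb_entry {
    _Atomic uint32_t *ptr;
    uint32_t val;
};

/* "read *ptr locked; val |= D; write *ptr unlock" as one atomic OR. */
void mark_dirty_locked(struct model_tlb_entry *t)
{
    t->val = atomic_fetch_or(t->ptr, PAGE_DIRTY) | PAGE_DIRTY;
}

/* "val |= D; write *ptr": writes back the cached copy without rereading. */
void mark_dirty_cached(struct model_tlb_entry *t)
{
    t->val |= PAGE_DIRTY;
    atomic_store(t->ptr, t->val);
}

/* Final pte when the OS clears it between the TLB fill and the mark-dirty. */
uint32_t pte_after_clear_then_dirty(int use_cached)
{
    _Atomic uint32_t pte;
    atomic_init(&pte, PAGE_PRESENT | PAGE_RW);

    struct model_tlb_entry t = { &pte, PAGE_PRESENT | PAGE_RW }; /* TLB fill */
    atomic_store(&pte, 0u);                                      /* OS unmaps */

    if (use_cached)
        mark_dirty_cached(&t);
    else
        mark_dirty_locked(&t);

    return atomic_load(&pte);
}
```

In this model the cached variant leaves 0x043 in memory, silently restoring the cleared mapping, while the locked variant leaves 0x040, a lone dirty bit in a non-present pte.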
Manfred Spraul wrote:
>
> I just benchmarked a single flush_tlb_page().
>
> Pentium II 350: ~ 2000 cpu ticks.
> Pentium III 850: ~ 3000 cpu ticks.
>
I forgot the important part:
SMP, including a smp_call_function() IPI.
IIRC Ingo wrote that a local 'invlpg' is around 100 ticks.
--
Manf
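To put the figures above in proportion, here is a small arithmetic sketch. The per-flush costs are the ones quoted in the message; the function names and the count of 1000 dirty pages are assumptions chosen only for illustration.

```c
#include <stdint.h>

/* Total cost of issuing one flush per dirty page. */
uint64_t per_page_flush_ticks(uint64_t dirty_pages, uint64_t ticks_per_flush)
{
    return dirty_pages * ticks_per_flush;
}

/* Converts cpu ticks to whole microseconds for a clock given in MHz. */
uint64_t ticks_to_usec(uint64_t ticks, uint64_t cpu_mhz)
{
    return ticks / cpu_mhz;
}
```

For an assumed 1000 dirty pages on the Pentium III 850, per_page_flush_ticks(1000, 3000) is 3,000,000 ticks, which ticks_to_usec(3000000, 850) puts at about 3.5 ms; at the quoted local cost of roughly 100 ticks the same pages would take 100,000 ticks.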
Linus Torvalds wrote:
>
> In article <[EMAIL PROTECTED]>,
> Jamie Lokier <[EMAIL PROTECTED]> wrote:
> >> > << lock;
> >> > read pte
> >> > if (!present(pte))
> >> > do_page_fault();
> >> > pte |= dirty
> >> > write pte.
> >> > >> end lock;
> >>
> >> No, it is a little more complicated. You al
In article <[EMAIL PROTECTED]>,
Jamie Lokier <[EMAIL PROTECTED]> wrote:
>> > << lock;
>> > read pte
>> > if (!present(pte))
>> > do_page_fault();
>> > pte |= dirty
>> > write pte.
>> > >> end lock;
>>
>> No, it is a little more complicated. You also have to include in the
>> tlb state into th
In article <[EMAIL PROTECTED]>,
Kanoj Sarcar <[EMAIL PROTECTED]> wrote:
>>
>> Will you please go off and prove that this "problem" exists on some x86
>> processor before continuing this rant? None of the PII, PIII, Athlon,
>
>And will you please stop behaving like this is not an issue?
This i
>
> On Thu, 15 Feb 2001, Kanoj Sarcar wrote:
>
> > No. All architectures do not have this problem. For example, if the
> > Linux "dirty" (not the pte dirty) bit is managed by software, a fault
> > will actually be taken when processor 2 tries to do the write. The fault
> > is solely to make sure
Kanoj Sarcar wrote:
> > Is the sequence
> > << lock;
> > read pte
> > pte |= dirty
> > write pte
> > >> end lock;
> > or
> > << lock;
> > read pte
> > if (!present(pte))
> > do_page_fault();
> > pte |= dirty
> > write pte.
> > >> end lock;
>
> No, it is a little more complicated. You also hav
>
> Kanoj Sarcar wrote:
> >
> > Okay, I will quote from Intel Architecture Software Developer's Manual
> > Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27:
> >
> > "Bus cycles to the page directory and page tables in memory are performed
> > only when the TLBs do not con
Manfred Spraul wrote:
> Is the sequence
> << lock;
> read pte
> pte |= dirty
> write pte
> >> end lock;
> or
> << lock;
> read pte
> if (!present(pte))
> do_page_fault();
> pte |= dirty
> write pte.
> >> end lock;
or more generally
<< lock;
read pte
if (!present(pte) || !writable(pte))
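The "more general" locked sequence is cut off above. The sketch below models one reading of it in user space, assuming that a pte which is not present or not writable leads to a fault, as in the earlier variant. model_locked_mark_dirty() is a hypothetical name, the compare-and-swap loop stands in for the "<< lock ... >> end lock" region, and returning false stands in for do_page_fault().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_PRESENT 0x001u
#define PAGE_RW      0x002u
#define PAGE_DIRTY   0x040u

/* Returns true if D was set, false if the access would have to fault. */
bool model_locked_mark_dirty(_Atomic uint32_t *pte)
{
    uint32_t old = atomic_load(pte);

    do {
        /* Recheck the permission bits against the value actually in memory. */
        if (!(old & PAGE_PRESENT) || !(old & PAGE_RW))
            return false;   /* ASSUMPTION: this is where do_page_fault() runs */
        /* On CAS failure 'old' is reloaded and the check is repeated. */
    } while (!atomic_compare_exchange_weak(pte, &old, old | PAGE_DIRTY));

    return true;
}
```

With this behaviour a cleared or write-protected pte is never modified by the marking CPU, so a get-and-clear followed by a later store cannot overwrite a dirty bit that arrived in between.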
On Thu, 15 Feb 2001, Kanoj Sarcar wrote:
> No. All architectures do not have this problem. For example, if the
> Linux "dirty" (not the pte dirty) bit is managed by software, a fault
> will actually be taken when processor 2 tries to do the write. The fault
> is solely to make sure that the Linux
>
> Kanoj Sarcar wrote:
> > > Here's the important part: when processor 2 wants to set the pte's dirty
> > > bit, it *rereads* the pte and *rechecks* the permission bits again.
> > > Even though it has a non-dirty TLB entry for that pte.
> > >
> > > That is how I read Ben LaHaise's description,
Kanoj Sarcar wrote:
>
> Okay, I will quote from Intel Architecture Software Developer's Manual
> Volume 3: System Programming Guide (1997 print), section 3.7, page 3-27:
>
> "Bus cycles to the page directory and page tables in memory are performed
> only when the TLBs do not contain the translat
Kanoj Sarcar wrote:
> > Here's the important part: when processor 2 wants to set the pte's dirty
> > bit, it *rereads* the pte and *rechecks* the permission bits again.
> > Even though it has a non-dirty TLB entry for that pte.
> >
> > That is how I read Ben LaHaise's description, and his test pr
>
> [Added Linus and linux-kernel as I think it's of general interest]
>
> Kanoj Sarcar wrote:
> > Whether Jamie was trying to illustrate a different problem, I am not
> > sure.
>
> Yes, I was talking about pte_test_and_clear_dirty in the earlier post.
>
> > Look in mm/mprotect.c. Look at the
[Added Linus and linux-kernel as I think it's of general interest]
Kanoj Sarcar wrote:
> Whether Jamie was trying to illustrate a different problem, I am not
> sure.
Yes, I was talking about pte_test_and_clear_dirty in the earlier post.
> Look in mm/mprotect.c. Look at the call sequence change_