On Mon, 1 Aug 2005, Hugh Dickins wrote: > > > Aside, that brings up an interesting question - why should readonly > > mappings of writeable files (with VM_MAYWRITE set) disallow ptrace > > write access while readonly mappings of readonly files not? Or am I > > horribly confused? > > Either you or I. You'll have to spell that out to me in more detail, > I don't see it that way.
We have always just done a COW if it's read-only - even if it's shared. The point being that if a process mapped did a read-only mapping, and a tracer wants to modify memory, the tracer is always allowed to do so, but it's _not_ going to write anything back to the filesystem. Writing something back to an executable just because the user happened to mmap it with MAP_SHARED (but read-only) _and_ the user had the right to write to that fd is _not_ ok. So VM_MAYWRITE is totally immaterial. We _will_not_write_ (and must not do so) to the backing store through ptrace unless it was literally a writable mapping (in which case VM_WRITE will be set, and the page table should be marked writable in the first case). So we have two choices: - not allow the write at all in ptrace (which I think we did at some point) This ends up being really inconvenient, and people seem to really expect to be able to write to readonly areas in debuggers. And doign "MAP_SHARED, PROT_READ" seems to be a common thing (Linux has supported that pretty much since day #1 when mmap was supported - long before writable shared mappings were supported, Linux accepted MAP_SHARED + PROT_READ not just because we could, but because Unix apps do use it). or - turn a shared read-only page into a private page on ptrace write This is what we've been doing. It's strange, and it _does_ change semantics (it's not shared any more, so the debugger writing to it means that now you don't see changes to that file by others), so it's clearly not "correct" either, but it's certainly a million times better than writing out breakpoints to shared files.. At some point (for the longest time), when a debugger was used to modify a read-only page, we also made it writable to the user, which was much easier from a VM standpoint. Now we have this "maybe_mkwrite()" thing, which is part of the reason for this particular problem. Using the dirty flag for a "page is _really_ writable" is admittedly kind of hacky, but it does have the advantage of working even when the -real- write bit isn't set due to "maybe_mkwrite()". If it forces the s390 people to add some more hacks for their strange VM, so be it.. [ Btw, on a totally unrelated note: anybody who is a git user and looks for when this maybe_mkwrite() thing happened, just doing git-whatchanged -p -Smaybe_mkwrite mm/memory.c in the bkcvs conversion pinpoints it immediately. Very useful git trick in case you ever have that kind of question. ] I added Martin Schwidefsky to the Cc: explicitly, so that he can ping whoever in the s390 team needs to figure out what the right thing is for s390 and the dirty bit semantic change. Thanks for pointing it out. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/