(5/10/12 11:01 PM), Jerome Glisse wrote:
> On Thu, May 10, 2012 at 10:51 PM, KOSAKI Motohiro
> <kosaki.motohiro at gmail.com> wrote:
>> (5/10/12 8:50 PM), Minchan Kim wrote:
>>>
>>> Hi KOSAKI,
>>>
>>> On 05/11/2012 02:53 AM, KOSAKI Motohiro wrote:
>>>
>>>>>>> Let's assume that an application wants to allocate a user space
>>>>>>> memory region using malloc() and then write something to the
>>>>>>> region. As you may know, a user space buffer doesn't have real
>>>>>>> physical pages right after the malloc() call, so if the user
>>>>>>> tries to access the region, the page fault handler is triggered
>>>>>>
>>>>>> Understood.
>>>>>>
>>>>>>> and then in turn the next step, such as swap-in, fills the
>>>>>>> physical frame number into the entry of the faulting page.
>>>>>>
>>>>>> Sorry, I can't understand your point due to my poor English.
>>>>>> Could you rewrite it more simply? :)
>>>>>
>>>>> Simply put, handle_mm_fault would be called to update the pte after
>>>>> finding the vma and checking access rights. And as you know, there
>>>>> are many cases to handle in a page fault, such as COW or demand
>>>>> paging.
>>>>
>>>> Hmm. If I understand correctly, you guys misunderstand mlock. It
>>>> doesn't pin pages, nor does it prevent pfn changes. It only
>>>> guarantees that the pages are not swapped out. E.g.
>>>
>>> From a semantic point of view you're right, but the implementation
>>> does ensure page pinning.
>>>
>>>> the memory compaction feature may automatically change a page's
>>>> physical address.
>>>
>>> I tried it last year but decided to drop it because of a realtime
>>> issue.
>>> https://lkml.org/lkml/2011/8/29/295
>>>
>>> So I think mlock is a kind of page pinning. If some other place that
>>> I'm not aware of is doing so, that place should be fixed.
>>> Or my above patch should go ahead.
>>
>> Thanks for pointing that out. I didn't realize your patch wasn't
>> merged. I think it should go ahead. Think of the autonuma case:
>> if mlock disables autonuma migration, that's a bug.
>> I don't think we can promise that mlock doesn't change the physical
>> page. I wonder whether any realtime guys think page migration is a
>> free lunch. They should disable both auto migration and compaction.
>>
>> And think about an application that explicitly uses migrate_pages(2),
>> or admins who use cpusets. Driver code can't assume such scenarios
>> don't occur, yes?
>
> I am ok with the patch being merged as is if you add a restriction
> for the ioctl to be root only and a big comment stating that the user
> ptr thing is just abusing the kernel API and that it should not be
> replicated by other drivers unless they fully understand that all
> hell might break loose with it.

Oh, my apologies. I didn't intend to support merging it as is. Basically
I agree with Minchan: it should be replaced with get_user_pages(). I only
intended to clarify the pros and cons and what the original author's
intention was. If I understand correctly, MADV_DONTFORK is the best
solution for this case.

> If you know it's only the ddx that will use it and that there won't
> be a fork, then better not to worry about it, but again state it in
> the comment about the ioctl.
>
> I really wish there was some magical VM_DRIVER_MAPPED flag that would
> add the proper restrictions to other memory code while keeping fork
> behavior consistent (ie cow). But such a thing would need massive
> surgery of the linux mm code.
>
> Cheers,
> Jerome
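
For reference, the user space side of the MADV_DONTFORK suggestion might
look roughly like the sketch below. It is only an illustration under
stated assumptions, not code from the patch under discussion: the helper
names alloc_dontfork_buffer() and free_dontfork_buffer() are hypothetical,
and the buffer is taken from mmap() rather than malloc() so that it is
page aligned. Note that mlock() here is best effort and, as discussed in
this thread, only keeps the pages from being swapped out; it does not by
itself promise a stable physical address.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch under discussion): map an
 * anonymous, page-aligned buffer that could later be handed to a
 * driver as a user pointer.  Returns NULL on failure.  If "locked" is
 * non-NULL it is set to 1 when mlock() succeeded and 0 otherwise.
 */
void *alloc_dontfork_buffer(size_t len, int *locked)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /*
     * Keep the range out of fork() children, so a later fork() cannot
     * set up copy-on-write on these pages in the parent.
     */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }

    /*
     * Best effort only: mlock() prevents swap-out and may fail when
     * RLIMIT_MEMLOCK is too small.  It is not treated as fatal here.
     */
    if (locked)
        *locked = (mlock(buf, len) == 0);

    /* Touch the pages so demand paging populates them now. */
    memset(buf, 0, len);
    return buf;
}

/* Release a buffer obtained from alloc_dontfork_buffer(). */
void free_dontfork_buffer(void *buf, size_t len)
{
    if (buf)
        munmap(buf, len);
}
```

A caller would allocate the buffer once, pass the pointer and length to
the driver, and call free_dontfork_buffer() only after the driver has
dropped its references. This does nothing about compaction or
migrate_pages(2); on the kernel side that still needs get_user_pages().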