On Mon, Mar 05, 2018 at 10:27:32PM +0300, Ilya Smith wrote:
> > On 5 Mar 2018, at 19:23, Matthew Wilcox <wi...@infradead.org> wrote:
> > On Mon, Mar 05, 2018 at 04:09:31PM +0300, Ilya Smith wrote:
> >> I’m analysing that approach and see much more problems:
> >> - each time you call mmap like this, you still increase count of vmas as
> >> my patch did
> >
> > Umm ... yes, each time you call mmap, you get a VMA. I'm not sure why
> > that's a problem with my patch. I was trying to solve the problem Daniel
> > pointed out, that mapping a guard region after each mmap cost twice as
> > many VMAs, and it solves that problem.
> >
> The issue was in VMAs count as Daniel mentioned.
> The more count, the harder walk tree. I think this is fine

The performance problem Daniel was mentioning with your patch was not
with the number of VMAs but with the scattering of addresses across the
page table tree.

> >> - the entropy you provide is like 16 bit, that is really not so hard to
> >> brute
> >
> > It's 16 bits per mapping. I think that'll make enough attacks harder
> > to be worthwhile.
>
> Well yes, its ok, sorry. I just would like to have 32 bit entropy maximum
> some day :)

We could put 32 bits of padding into the prot argument on 64-bit systems
(and obviously you need a 64-bit address space to use that many bits). The
thing is that you can't then put anything else into those pages (without
using MAP_FIXED).

> >> - if you unmap/remap one page inside region, field vma_guard will show
> >> head or tail pages for vma, not both; kernel don’t know how to handle it
> >
> > There are no head pages. The guard pages are only placed after the real
> > end.
>
> Ok, we have MG where G = vm_guard, right? so when you do vm_split,
> you may come to situation - m1g1m2G, how to handle it? I mean when M is
> split with only one page inside this region. How to handle it?

I thought I covered that in my earlier email. Using one letter per page,
and a five-page mapping with two guard pages: MMMMMGG. Now unmap the
fourth page, and the VMA gets split into two. You get: MMMGMGG.

> > I can't agree with that. The user has plenty of opportunities to get
> > randomness; from /dev/random is the easiest, but you could also do timing
> > attacks on your own cachelines, for example.
>
> I think the usual case to use randomization for any mmap or not use it at all
> for whole process. So here I think would be nice to have some variable
> changeable with sysctl (root only) and ioctl (for greedy processes).

I think this functionality can just as well live inside libc as in the kernel.

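For anyone who wants to play with the one-letter-per-page notation from the
split example above, here is a rough userspace toy. It is not kernel code and
not taken from either patch: the names toy_unmap_page() and toy_count_vmas()
are made up for illustration, and the model simply treats every non-'M' page
as a guard page or hole, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: one character per page.
 * 'M' = mapped page, 'G' = guard page or hole (assumption of this sketch). */

/* Mark page `idx` as no longer mapped.
 * Returns 0 on success, -1 if idx is out of range or the page is not mapped. */
int toy_unmap_page(char *layout, size_t idx)
{
    assert(layout != NULL);
    if (idx >= strlen(layout) || layout[idx] != 'M')
        return -1;
    layout[idx] = 'G';
    return 0;
}

/* Count runs of consecutive 'M' pages; each run stands for one VMA. */
size_t toy_count_vmas(const char *layout)
{
    size_t count = 0;
    int in_run = 0;

    assert(layout != NULL);
    for (; *layout != '\0'; layout++) {
        if (*layout == 'M') {
            if (!in_run)
                count++;
            in_run = 1;
        } else {
            in_run = 0;
        }
    }
    return count;
}
```

Starting from "MMMMMGG" and calling toy_unmap_page() on index 3 (the fourth
page) leaves "MMMGMGG", and toy_count_vmas() goes from one VMA to two, which
is the split described above.
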
> Well, let me summary:
> My approach chose random gap inside gap range with following strings:
>
> +   addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
> +   addr = low + (addr << PAGE_SHIFT);
>
> Could be improved limiting maximum possible entropy in this shift.
> To prevent situation when attacker may massage allocations and
> predict chosen address, I randomly choose memory region. I’m still
> like my idea, but not going to push it anymore, since you have yours now.
>
> Your idea just provide random non-mappable and non-accessable offset
> from best-fit region. This consumes memory (1GB gap if random value
> is 0xffff). But it works and should work faster and should resolve the issue.

umm ... 64k * 4k is a 256MB gap, not 1GB. And it consumes address space,
not memory.

> My point was that current implementation need to be changed and you
> have your own approach for that. :)
> Lets keep mine in the mind till better times (or worse?) ;)
> Will you finish your approach and upstream it?

I'm just putting it out there for discussion. If people think this is the
right approach, then I'm happy to finish it off. If the consensus is that
we should randomly pick addresses instead, I'm happy if your approach gets
merged.
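
As a footnote to the arithmetic above, a small userspace sketch of both
calculations, in case it helps the discussion. It assumes 4k pages
(TOY_PAGE_SHIFT of 12), the function names are hypothetical, and the random
value is passed in by the caller rather than taken from get_random_long(), so
this is only an approximation of the two quoted lines, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Largest gap in bytes when the gap size in pages is a `bits`-bit value. */
uint64_t toy_max_gap_bytes(unsigned bits)
{
    assert(bits < 52); /* keep the shifted result inside 64 bits */
    return ((UINT64_C(1) << bits) - 1) << TOY_PAGE_SHIFT;
}

/* Page-aligned address in [low, high), mirroring the two quoted lines.
 * `rnd` stands in for get_random_long() and is supplied by the caller. */
uint64_t toy_pick_addr(uint64_t low, uint64_t high, uint64_t rnd)
{
    uint64_t pages;

    assert(high >= low);
    pages = (high - low) >> TOY_PAGE_SHIFT;
    if (pages == 0)
        return low; /* range smaller than a page: nothing to randomise */
    return low + ((rnd % pages) << TOY_PAGE_SHIFT);
}
```

With 16 bits, toy_max_gap_bytes() gives 0xffff pages of 4k each, which is one
page short of 256MB. That is address space reserved, not memory consumed.
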