On Fri, Feb 26, 2010 at 10:58:09AM +0500, Anton Maksimenkov wrote:

> 2010/2/20 Anton Maksimenkov <[email protected]>:
> > a discussion about uvm_map.c improvements was started. I prepared
> > the diff and tried to explain it.
> 
> In addition to that diff I want to discuss two problems.
> 
> 1.
> AFAIK, the userspace malloc() uses mmap (sys_mmap) to do its job. As
> we can see, sys_mmap() uses uvm_map_hint() to obtain a virtual
> address. uvm_map_hint() may add a random "offset" of up to 256Mb.
> 
> The problem is that an allocation may sometimes fail even when there
> is enough free space.
> Let's imagine that most of the virtual space is allocated and used,
> but there is free space from MAXDSIZ (or so) up to, say,
> MAXDSIZ + 200Mb. Let it be a 200Mb gap.
> Then a malloc() of 100Mb occurs, and uvm_map_hint() generates an
> address of, say, MAXDSIZ + 201Mb.
> Then mmap (sys_mmap, uvm_map_p) uses this address as *startp and
> supplies it as a "hint" to uvm_map_findspace().
> Of course, uvm_map_findspace() will find that there is no space after
> MAXDSIZ + 201Mb. But there are still 200Mb free in reality!
> 
> I think that if uvm_map_findspace() can't find space starting from
> the "hint", it should retry the search _without_ the hint (say,
> setting it to map->min_offset). In that case it may find the last
> free holes and use them.

Yes, uvm hackers know about this potential problem, but no one has
come up with a diff yet.

> 2.
> So malloc() can generate random "holes" in the free space and
> fragment it. shmat (sys_shmat) also uses malloc.
> I'm scared that this leads to bad results when programs do their
> allocations in big chunks of memory, like PostgreSQL does.
> It allocates one rather big piece of shared memory (we want something
> about 1.5-1.7G). And it wants to malloc() some other big chunks in
> its normal lifecycle. For example, maintenance_work_mem may be set
> to, say, 256Mb. There is work_mem, which may be big and may be
> allocated twice or even more.
> So when the free space is fragmented by not-so-small holes, we can
> "develop" a strange situation: even though we have a lot of free
> space in total, there is no contiguous region big enough to allocate,
> say, 256M or 128M.
> 
> I think we could introduce some variable - a sysctl variable or a
> malloc.conf flag - which would prevent sys_mmap() or uvm_map_hint()
> from generating such big random offsets and limit them to a couple
> of pages. So in the worst case we would get not-so-big gaps - a
> couple of pages, not many megabytes.
> 
> There are cases when we need memory more than randomization. A
> dedicated database server is such a case. Postgres can give more
> performance if it can use as much memory as userspace can allocate.
> Many big random holes quickly lead to allocation failures, so we
> must dramatically decrease postgres's memory limits to prevent it
> from failing to allocate memory.
> Let it be the sysadmin's choice to "shrink" randomization - I think
> it would be very useful with postgres and similar cases.
> 
> What do you think?
> -- 
> antonvm

If fragmentation really is a problem for you, my first reaction is:
use a machine with a bigger address space.

IMO, a flag for malloc(3) is not a good idea. Memory layout policy is a
kernel task.

As for reducing the gaps: if gaps are a few pages max, in effect you
are reducing randomness and increasing the predictability of the
memory layout. I would hate to lose that.

        -Otto
