2010/2/20 Anton Maksimenkov <[email protected]>:
> it was started discussion about uvm_map.c improvements. I prepared the
> diff and trying to explain it.

In addition to that diff I want to discuss two problems.

1.
AFAIK, the userspace malloc() uses mmap (sys_mmap) to do its job. As
we can see, sys_mmap() uses uvm_map_hint() to obtain a virtual
address, and uvm_map_hint() may add a random offset of up to 256MB to
it.

The problem is that an allocation may sometimes fail even when there
is enough free space.
Imagine that most of the virtual space is allocated and in use, but
there is free space from MAXDSIZ (or so) up to, say, MAXDSIZ + 200MB.
Let it be a 200MB gap.
Now a malloc() of 100MB occurs, and uvm_map_hint() generates an
address of, say, MAXDSIZ + 201MB.
Then mmap (sys_mmap, uvm_map_p) uses this address as *startp and
supplies it as the "hint" to uvm_map_findspace().
Of course, uvm_map_findspace() will find that there is no space after
MAXDSIZ + 201MB. But in reality 200MB are still free!

I think that if uvm_map_findspace() can't find space starting from the
hint, it should retry the search _without_ the hint (say, with the
hint set to map->min_offset). That way it can still find and use the
remaining free holes.

2.
So malloc() can punch random holes in the free space and fragment it.
shmat (sys_shmat) also uses malloc().
I'm afraid this leads to bad results when programs allocate memory in
big chunks, like PostgreSQL does.
It allocates one rather big piece of shared memory (we want something
like 1.5-1.7GB), and in its normal lifecycle it also wants to malloc()
other big chunks. For example, maintenance_work_mem may be set to,
say, 256MB, and work_mem may be big and may be allocated twice or even
more.
So when the free space is fragmented by holes that are not so small,
we can end up in a strange situation: even though the total free space
is big, there is no contiguous region large enough to allocate, say,
256MB or 128MB.

I think we could introduce a variable - a sysctl or a malloc.conf
flag - that prevents sys_mmap() or uvm_map_hint() from generating such
big random offsets and limits them to a couple of pages instead.
Then in the worst case we get small gaps - a couple of pages, not many
megabytes.

There are cases where we need memory more than randomization. A
dedicated database server is such a case: Postgres performs better the
more memory userspace can allocate. Many big random holes quickly lead
to allocation failures, and then we must drastically decrease
Postgres's memory limits to keep it from failing to allocate memory.
Let shrinking the randomization be the sysadmin's choice - I think it
would be very useful for Postgres and similar cases.

What do you think?
-- 
antonvm
