Well,
It would be really nice if someone on this list has a RHEL (or CentOS) 5
machine with >=4GB of RAM at hand to test it. (Hetz?)

- Noam

On 5/14/07, Shachar Shemesh <[EMAIL PROTECTED]> wrote:

Noam Meltzer wrote:
> So, is it possible that PAE technology, in a way, replaces the hugemem?
Seems extremely unlikely to me.

A few words about the technologies, since you brought up the distinction
(though I'm surprised it is relevant).

On a 32 bit platform each process can address, at most, 4GB of linear
memory. Ever since the move to 32 bit, the "segment registers" have no
longer been used for addressing, and are thus irrelevant for address
extension. Let's call the 4GB of addressable memory the "virtual memory
space". This memory is, of course, mapped to physical memory by means of
the MMU, generating page faults whenever an illegal page (whether
because it is unmapped or because it is mapped with invalid permissions) is
accessed. This allows the kernel to swap out some physical memory area,
and replace it with a new area from disk. This is how virtual memory
works.
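
You can actually watch this lazy mapping happen from user space. Here is
a minimal sketch (the 16-page count is arbitrary, and I'm assuming 4KB
pages): it maps some anonymous memory, touches it, and shows the minor
page fault counter climbing as the pages get populated.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
    const size_t npages = 16;          /* arbitrary, just for the demo */
    const size_t pagesz = 4096;        /* assuming 4KB pages (i386 default) */
    struct rusage before, after;

    /* Anonymous mapping: virtual addresses exist, no physical pages yet. */
    char *p = mmap(NULL, npages * pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    getrusage(RUSAGE_SELF, &before);
    memset(p, 0xAA, npages * pagesz);  /* first touch => page faults */
    getrusage(RUSAGE_SELF, &after);

    printf("minor page faults while touching %zu pages: %ld\n",
           npages, after.ru_minflt - before.ru_minflt);
    return 0;
}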

PAE is but an extension of the virtual memory technique, using otherwise
unaddressable memory instead of the disk. The machine may have 64GB of
physical memory, but can only actually address 4GB at a time. Pages of
physical memory are swapped in and out of the addressable PHYSICAL range
by means of PAE, and then, using the MMU, into virtual space. So each
part of the 64GB of physical memory is given an address within the 4GB
physical range (not concurrently, of course), and then a virtual address
within the 4GB virtual range for the sake of the actual running processes.
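
By the way, whether the CPU supports PAE at all is easy to check: the
"pae" flag shows up in /proc/cpuinfo. A rough sketch (the parsing is
naive, and I'm assuming the usual "flags : ..." line format):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        perror("/proc/cpuinfo");
        return 1;
    }
    /* Look for the "flags" line and see whether "pae" is among the flags. */
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0 && strstr(line, " pae")) {
            printf("CPU reports PAE support\n");
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    printf("no PAE flag found\n");
    return 0;
}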

Except we have a problem. Each time we need to switch between user space
and kernel space, we need to have the kernel ready and available to us.
This must be the case so we can actually handle whatever it is that
triggered the move (hardware interrupt, software interrupt or trap). The
way we do that is by keeping the entire memory allocated to the kernel
(code + data) mapped to the top area of the virtual memory addresses, no
matter where we are in the system. Whether we are in kernel space or in
any of the running user space processes, we always keep the kernel at
the same addresses. Of course, while we are in user space those
addresses are marked non-readable and non-writable, but that's OK,
because we can tell the MMU that a certain page is only readable and
writable if the CPU is in ring 0, and the CPU automatically enters ring
0 in case of an interrupt (of any kind). Problem solved.
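
You can see the protection half of this from user space with a little
sketch like the one below. It tries to read from 0xC0000000, which is
where a stock 3/1 i386 kernel starts its own mapping (that address is an
assumption on my part; on a 4/4 or 64 bit kernel the test is
meaningless). Ring 3 just gets a SIGSEGV:

#include <stdio.h>
#include <signal.h>
#include <setjmp.h>

static sigjmp_buf env;

static void segv_handler(int sig)
{
    (void)sig;
    siglongjmp(env, 1);
}

int main(void)
{
    /* 0xC0000000 is the usual start of the kernel mapping on a 3/1 i386
       kernel -- an assumption, not something the program detects. */
    volatile unsigned char *kernel_addr = (unsigned char *)0xC0000000UL;

    signal(SIGSEGV, segv_handler);

    if (sigsetjmp(env, 1) == 0) {
        unsigned char c = *kernel_addr;   /* ring 3 touching a ring 0 page */
        printf("read 0x%02x -- no protection here?\n", c);
    } else {
        printf("SIGSEGV: the kernel's pages are mapped, but not ours to touch\n");
    }
    return 0;
}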

Except there is one problem with this scheme: whatever memory we
reserve for the kernel is subtracted from the
*virtual* address space available to user space. Once we decide that the
kernel reserves 1GB of addresses for its own use, no matter what program
is running, these addresses can never be used for anything else,
regardless of how much physical memory is available in the machine.

So, what have we got so far? We have 4GB of address space, which
represents the absolute maximum any user space program can hope to
address simultaneously. No matter how much the actual machine has, a
single process cannot hope to address (directly) more than 4GB of
memory. We further reduce this number by allocating some of the
addresses to the kernel, creating a split between user space and kernel
space *address space*.

How much do we split? Windows splits at the 2GB boundary by default. This
means that each user space program has a maximum of 2GB of memory
available, and the kernel has 2GB of memory as well. We call this a 2/2
split. Linux, by default, allocates 1GB to the kernel, which leaves 3GB
to each user space program. We call this a 3/1 split.
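
A quick way to get a feel for the split is to print a few addresses from
a running process: on a 32 bit kernel with the default 3/1 split the
stack sits just below the 3GB mark (0xC0000000), give or take stack
randomization. That interpretation is an assumption about the usual i386
layout, not something the sketch detects:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int on_the_stack;
    void *on_the_heap = malloc(16);

    /* On a 32-bit kernel with the default 3/1 split, the stack lives just
       below the 3GB mark; with a 4/4 split or a 64-bit kernel it can sit
       much higher. */
    printf("stack variable at %p\n", (void *)&on_the_stack);
    printf("heap allocation at %p\n", on_the_heap);

    free(on_the_heap);
    return 0;
}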

Now here's the problem. Sometimes, when there is a lot of memory in the
machine (using PAE), it may turn out that 1GB is not enough for the
kernel to keep track of which virtual address, in which process, belongs
to which physical address. Merely managing the physical memory carries a
per-page overhead, and with enough physical pages that overhead alone
eats a large chunk of the kernel's 1GB.
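
Just to put numbers on it (the 32 bytes of per-page bookkeeping is an
assumption, roughly the size of a 2.6-era struct page on i386):

#include <stdio.h>

int main(void)
{
    const unsigned long long ram        = 64ULL << 30; /* 64GB of physical RAM */
    const unsigned long long page_size  = 4096;        /* 4KB pages on i386 */
    const unsigned long long descriptor = 32;          /* assumed bytes of
                                                           per-page bookkeeping */

    unsigned long long pages    = ram / page_size;
    unsigned long long overhead = pages * descriptor;

    printf("%llu pages, ~%llu MB of per-page bookkeeping\n",
           pages, overhead >> 20);
    /* prints: 16777216 pages, ~512 MB of per-page bookkeeping */
    return 0;
}

That's half a gigabyte of bookkeeping out of a 1GB kernel window, before
the kernel has mapped anything else.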

There are two possible solutions to this problem. The first is to
increase the amount of memory allocated to the kernel. We could, for
example, switch from allocating 1GB to the kernel in a 3/1 split to
allocating 2GB to the kernel in a 2/2 split (like Windows). This,
however, leads to the following absurdity: the more physical memory you
have, the less memory each user space program can use!

To avoid this problem, the 4/4 split was invented. What it does,
basically, is to not keep the kernel's memory mapped during user space
execution. In other words, each time an interrupt arrives, the kernel
switches the MMU tables (I haven't looked at the actual code, but I'm
assuming it does so through a tiny piece of code that is constantly
mapped), as if a context switch occurred. This, of course, results in
higher costs for
calling kernel code, but allows us to allocate the entire 4GB address
space to the kernel, while allocating the entire 4GB address space to
each user space program. According to Noam, this is what "hugemem" means
(I don't know whether that is the case). This is called the 4/4 split.

Luckily, it is fairly simple to test whether a 4/4 split patch is
installed in your kernel. All you have to do is try to allocate memory
from the area usually reserved for the kernel image. Attached is a small
program to tell you just that. If you compile and run it on 4/4
machines, or on machines with 64 bit kernels, it will tell you it
managed to allocate the memory. If you run it on 3/1 machines, it will
give you an address that is lower than the address you requested.
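
The idea boils down to something like the following sketch (the real
attachment may well differ; 0xD0000000 is just an arbitrary pick inside
the 1GB a stock 3/1 kernel keeps for itself, and the interpretation of
the result is an assumption on my part):

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* 0xD0000000 sits inside the 1GB that a stock 3/1 i386 kernel keeps
       for itself; the exact address is an illustrative pick. */
    void *wanted = (void *)0xD0000000UL;
    size_t len   = (size_t)sysconf(_SC_PAGESIZE);

    /* No MAP_FIXED: the kernel treats 'wanted' as a hint and is free to
       return a different (lower) address if the hint is out of bounds. */
    void *got = mmap(wanted, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (got == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    if (got == wanted)
        printf("got %p: looks like a 4/4 split or a 64-bit kernel\n", got);
    else
        printf("asked for %p, got %p: a 3/1 (or similar) split\n", wanted, got);

    return 0;
}
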
> - Noam
Shachar

