Well, it would be really nice if someone on this list has a RHEL(/CentOS) 5 machine at hand with >=4GB RAM to test it. (Hetz?)
- Noam

On 5/14/07, Shachar Shemesh <[EMAIL PROTECTED]> wrote:
Noam Meltzer wrote:
> So, is it possible that PAE technology, in a way, replaces the hugemem?

Seems extremely unlikely to me. A few words about the technologies (since you brought up the distinction, I'm surprised it is relevant).

On a 32 bit platform each process can address, at most, 4GB of linear memory. Ever since the move to 32 bit, the "segment registers" are no longer used for addressing, and are thus irrelevant for address extension. Let's call the 4GB of addressable memory the "virtual memory space". This memory is, of course, mapped to physical memory by means of the MMU, which generates page faults whenever an illegal page (whether because it is unmapped or because it has invalid permissions) is accessed. This allows the kernel to swap out some physical memory area and replace it with a new area from disk. This is how virtual memory works.

PAE is but an extension to the virtual memory technique, only using otherwise unaddressable memory instead of the disk. The machine has 64GB of physical memory, but can only actually address 4GB at a time. Pages of physical memory are swapped in and out of the addressable PHYSICAL range by means of PAE, and then, using the MMU, into virtual space. So each page of the 64GB of physical memory is given a physical address within the 4GB range (not all concurrently, of course), and is then given a virtual address for the sake of the actual running processes.

Except we have a problem. Each time we need to switch between user space and kernel space, we need to have the kernel ready and available to us. This must be the case so we can actually handle whatever it is that triggered the move (hardware interrupt, software interrupt or trap). The way we do that is by keeping the entire memory allocated to the kernel (code + data) mapped to the top area of the virtual memory addresses, no matter where we are in the system. Whether we are in kernel space or in any of the running user space processes, we always keep the kernel at the same addresses. Of course, if we are in user space we mark those addresses as non-readable and non-writeable, but that's OK, because we can tell the MMU that a certain page is only readable/writeable if the CPU is in ring 0, and the CPU automatically enters ring 0 in case of an interrupt (of any kind). Problem solved.

Except there is one problem with this scheme: whatever memory we reserve for the kernel is subtracted from the *virtual* address space available to user space. Once we decide that the kernel reserves 1GB of addresses for its own use, then no matter what program is running, those addresses can never be used for anything else, regardless of how much physical memory is available in the machine.

So, what have we got so far? We have 4GB of address space, which represents the absolute maximum any user space program can hope to address simultaneously. No matter how much memory the actual machine has, a single process cannot hope to address (directly) more than 4GB of memory. We further reduce this number by allocating some of the addresses to the kernel, creating a split of the *address space* between user space and kernel space.

How much do we split? Windows splits at the 2GB boundary by default. This means that each user space program has a maximum of 2GB of address space available, and the kernel has 2GB as well. We call this a 2/2 split. Linux, by default, allocates 1GB to the kernel, which leaves 3GB to each user space program. We call this a 3/1 split.
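[As an aside, this sketch is mine and was not part of Shachar's original mail; the 64MB chunk size is an arbitrary choice.] The 3GB ceiling is easy to observe from user space: the program below keeps mmap()ing anonymous chunks without ever touching them until the kernel runs out of user addresses. On a 32-bit kernel with the default 3/1 split and default overcommit settings it should report a bit under 3GB, no matter how much RAM is installed; on a 64-bit kernel it will report a far larger number.

/* Sketch (not from the original mail): measure how much address space
 * a single process can map. */
#include <stdio.h>
#include <sys/mman.h>

#define CHUNK (64UL * 1024 * 1024)   /* 64MB per mmap() call */

int main(void)
{
    unsigned long total = 0;

    /* Keep mapping anonymous memory, never touching it, until the
     * kernel has no user space addresses left to hand out. */
    while (mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) != MAP_FAILED)
        total += CHUNK;

    printf("Managed to map %lu MB of address space\n",
           total / (1024 * 1024));
    return 0;
}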
Now here's the problem. Sometimes, when there is too much memory in the machine (using PAE), it may turn out that 1GB is not enough for the kernel to keep track of which virtual address of which process belongs to which physical address. Merely managing the physical memory carries an overhead, and with too much physical memory to manage, 1GB is simply not enough.

There are two possible solutions to this problem. The first is to increase the amount of address space allocated to the kernel. We could, for example, switch from allocating 1GB to the kernel in a 3/1 split to allocating 2GB to the kernel in a 2/2 split (like Windows). This, however, leads to the following absurdity: the more physical memory you have, the less memory each user space program can use!

To avoid this problem, the 4/4 split was invented. What it does, basically, is not keep the kernel's memory mapped during user space execution. In other words, each time an interrupt arrives, the kernel switches the MMU tables (I haven't looked at the actual code, but I'm assuming this is done through a tiny piece of code that is constantly mapped), as if a context switch had occurred. This, of course, results in a higher cost for calling kernel code, but it allows us to allocate the entire 4GB address space to the kernel while also allocating the entire 4GB address space to each user space program. According to Noam, this is what "hugemem" means (I don't know whether that is the case). This is called the 4/4 split.

Luckily, it is fairly simple to test whether a 4/4 split patch is installed in your kernel. All you have to do is try to allocate memory from the area usually reserved for the kernel image. Attached is a small program to tell you just that (a sketch of what such a program might look like follows below). If you compile and run it on 4/4 machines, or on machines with 64 bit kernels, it will tell you it managed to allocate the memory. If you run it on 3/1 machines, it will give you an address that is lower than the address you requested.

> - Noam

Shachar
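[The original attachment is not reproduced in this archive; the following is my own sketch of the kind of test described above. The probe address 0xD0000000 and the exact messages are assumptions, not Shachar's code.] The idea is to pass mmap() a hint address above the 3GB mark and check whether the kernel honours it.

/* Sketch of a 4/4-split probe (not the original attachment).
 * 0xD0000000 lies inside the kernel's 1GB on a 3/1 split, but is
 * ordinary user space on a 4/4 split or a 64-bit kernel. */
#include <stdio.h>
#include <sys/mman.h>

#define PROBE_ADDR ((void *)0xD0000000UL)

int main(void)
{
    /* Ask for one page at the probe address. Without MAP_FIXED the
     * kernel treats the address as a hint, so a 3/1 kernel will fall
     * back to some lower address instead of failing outright. */
    void *p = mmap(PROBE_ADDR, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    if (p == PROBE_ADDR)
        printf("Got %p as requested - looks like a 4/4 split or a 64-bit kernel\n", p);
    else
        printf("Asked for %p but got %p - looks like a 3/1 split\n", PROBE_ADDR, p);

    return 0;
}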