On 05/18/11 20:52, Lars Heidieker wrote:
> On 04/20/11 03:22, YAMAMOTO Takashi wrote:
>> hi,
>
>> Hi,
>
>> On 04/14/11 09:05, YAMAMOTO Takashi wrote:
>>>>> why do you want to make subr_kmem use uvm_km directly?
>>>>> to simplify the code?
>>>>> i don't want to see that change, unless there's a clear benefit.
>>>>>
>> The reason was to simplify the code, yes, and to reduce redundancy:
>> in the current implementation vmem allocates PAGE_SIZE memory from
>> the uvm_km backend for requests <= PAGE_SIZE without using the
>> vacache, and, more importantly, vmem essentially just mirrors the
>> address allocations made by uvm_map.
>> With the changes I see about 15% fewer kernel map entries.
>>>>> let me explain some background. currently there are a number of
>>>>> kernel_map related problems:
>>>>>
>>>>> A-1. vm_map_entry is unnecessarily large for KVA allocation purposes.
>>>>>
>>>>> A-2. kernel-map-entry merging is there to solve A-1, but it
>>>>> introduced the allocate-to-free problem: to free memory, you might
>>>>> need to split map entries and thus allocate some memory.
>>>>>
>>>>> A-3. to solve A-2, there is the map-entry-reservation mechanism.
>>>>> it's complicated and broken.
>>>>>
>>>>> B. kernel fault handling is complicated because it needs memory
>>>>> allocation (eg. vm_anon), which needs some trick to avoid deadlock.
>>>>>
>>>>> C. KVA allocation is complicated because it needs memory allocation
>>>>> (eg. vm_map_entry), which needs some trick to avoid deadlock.
>>>>>
>>>>> most of the above can be solved by separating KVA allocation from
>>>>> kernel fault handling. (except C, which would merely move to a
>>>>> different place.)
>>>>>
>> A-1: with vmem_btag being slightly less than half the size of
>> vm_map_entry...
>> A-2 solves A-1, but A-3 "solves" A-2 with the pitfall of reintroducing
>> part of A-1: we still have fewer map entries in the map, but we don't
>> save memory, as all the entries not in the map are cached aside for
>> potential merging.
>> In this sense it seems broken to me, and it is complicated.
>> Reducing the overall number of allocated map entries will help here,
>> as the vacaches do.
>
>> C seems to be inevitable; it's only a question of where it happens...
>
>> B is a result of having pageable memory, which can fault, and
>> non-pageable memory in the same map, together with the need to
>> allocate non-pageable memory in the event of a page fault.
>
>>>>> i implemented subr_vmem so that eventually it can be used as the
>>>>> primary KVA allocator. ie. when allocating from kernel_map,
>>>>> allocate KVA from kernel_va_arena first and then, if and only if
>>>>> necessary, register it with kernel_map for fault handling. it
>>>>> probably allows us to remove the VACACHE stuff, too. kmem_alloc
>>>>> will be backed by a vmem arena which is backed by kernel_va_arena.
>>>>>
>> Originally I thought about two options, with option one being what my
>> patch does, and two:
>
>> If vmem is made the primary kva allocator, we should carve out a
>> kernel heap entirely controlled by vmem, probably one special
>> vm_map_entry in the kernel_map that spans the heap, or a submap that
>> never has any map entries.
>> Essentially this separates pageable and non-pageable memory
>> allocations, which would allow removing the vacaches in the kernel
>> maps as well as the map-entry-reservation mechanism.
>
>> Questions that follow:
>> - how to size it properly.....
>
>>> is this about limiting the total size for a particular allocation?
>
>> - this might be the kmem_map? or two heaps, an interrupt-safe one and
>> a non-interrupt-safe one?
>
>>> because kernel_va_arena would be quantum-cache disabled,
>>> most users would use another arena stacked on it.
>>> (like what we currently have as kmem_arena.)
>>> interrupt-safe allocations can either use kernel_va_arena directly
>>> or have another arena, eg. kmem_arena_intrsafe.
>
>
>> I think having two "allocators" (vmem and the vm_map entries
>> themselves) controlling the kernel_map isn't a good idea, as both
>> have to be kept in sync: at least, every allocation made via
>> vm_map_entries needs to be made in vmem as well. There is no clear
>> responsibility for either.
>
>>> i agree that having two allocators for KVA is bad.
>>> my idea is having just one. (kernel_va_arena)
>>> no allocation would be made via vm_map_entries for kernel_map.
>>> kernel_map is kept merely for fault handling.
>
>>> essentially kva allocation would be:
>
>>> va = vmem_alloc(kernel_va_arena, ...);
>>> if (pageable)
>>>         create kernel_map entry for the va
>>> else
>>>         ...
>>> return va;
>
>
>> Option two is more challenging and will solve problems B and the As,
>> while option one solves most of the As, leaving B untouched.
>
>>> sure, it's more challenging and involves more work.
>>> (so it hasn't finished yet. :-)
>
>>> YAMAMOTO Takashi
>
>
>> Lars
>
>
> Hi,
>
> I've made some progress in exploring both options further.
> Two patches implementing either option:
> a) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-uvm-extent.patch
> b) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-vmem-uvm-extent.patch
>
> Option a has extended kva caches for both kernel_map and kmem_map,
> with interfaces to them used by kmem(9), malloc(9) and pool(9), with
> the exception that pool_allocator_meta goes directly to the kmem_map.
> (This means malloc(9) and kmem(9) use kva caches, resulting in a lower
> vm_map_entry count.)
>
> Option b has one vm_map_entry in the kernel_map spanning the
> kernel_heap, which in turn is controlled by vmem(9).
> There is the heap_arena, from which the heap_va_arena (with quantum
> caches) imports, as well as an internal arena for vmem's metadata.
> On top of the heap_va_arena are interfaces used by kmem(9), malloc(9)
> and pool(9), with the pool metadata allocator going to vmem's meta
> arena.
> Originally I had another arena on top of the heap_va_arena, which
> backed the virtual memory with physical pages on import and from which
> malloc(9), kmem(9) and pool(9) allocated; let's call this option c.
> I replaced this arena with interface functions for efficiency reasons.
>
> Findings after having run the system for a while, with about 1.1 GiB
> in the pool(9)s:
> Option a: about 30000 allocated kernel map entries (allocated, not
> necessarily in the map)
> Option b: about 100000 allocated boundary tags.
> Option c: about 400000 allocated boundary tags.
>
> With boundary tags being about half the size of vm_map_entries, the
> vmem version uses somewhat more memory, but not much more.
>
> Both versions use a modified kmem(9) that interfaces either with vmem
> or with the extended kva caches, and which returns page-aligned memory
> for allocations of page size and larger, and cache-line-aligned memory
> for allocations between cache-line size and page size.
> This should resolve some problems Xen kernels have.
>
> The vmem version isn't quite finished: the vmem_size function required
> by zfs still needs to be adapted, etc. (And malloc(9) is just replaced
> by some arena and no longer gathers statistics...)
>
> So far the status report.
>
> Greetings,
> Lars
Hi,

I suggest using option a for the time being, and once option b is
ready, replacing the uvm_km* code and its kva caches with the vmem
implementation.
This will give us the benefits of fewer vm_map_entries and a kmem(9)
that does page-aligned allocations.

Lars

-- 
------------------------------------
Mystical explanations: Mystical explanations are considered deep; the
truth is that they are not even superficial.
-- Friedrich Nietzsche [ Die Fröhliche Wissenschaft, Book 3, 126 ]
