On 04/20/11 03:22, YAMAMOTO Takashi wrote:
> hi,
>
> Hi,
>
> On 04/14/11 09:05, YAMAMOTO Takashi wrote:
>>>> why do you want to make subr_kmem use uvm_km directly?
>>>> to simplify the code?
>>>> i don't want to see that change, unless there's a clear benefit.
>>>>
> The reason was to simplify the code, yes, and to reduce redundancy,
> because in the current implementation vmem allocates PAGE_SIZE
> memory from the uvm_km backend for requests <= PAGE_SIZE, not utilizing
> the vacache, and, more importantly, vmem is essentially just taking the
> address allocations made by uvm_map.
> With the changes I see about 15% fewer kernel map entries.
>>>> let me explain some background. currently there are a number of
>>>> kernel_map related problems:
>>>>
>>>> A-1. vm_map_entry is unnecessarily large for KVA allocation purposes.
>>>>
>>>> A-2. kernel-map-entry merging is there to solve A-1, but it introduced
>>>> the allocate-to-free problem, ie. to free memory, you might need to
>>>> split map entries and thus allocate some memory.
>>>>
>>>> A-3. to solve A-2, there is the map-entry-reservation mechanism.
>>>> it's complicated and broken.
>>>>
>>>> B. kernel fault handling is complicated because it needs memory allocation
>>>> (eg. vm_anon), which needs some trick to avoid deadlock.
>>>>
>>>> C. KVA allocation is complicated because it needs memory allocation
>>>> (eg. vm_map_entry), which needs some trick to avoid deadlock.
>>>>
>>>> most of the above can be solved by separating KVA allocation and
>>>> kernel fault handling. (except C, which will merely be moved to a
>>>> different place.)
>>>>
> A-1: with vmem_btag being slightly less than half the size of
> vm_map_entry...
> A-2 solves A-1, but A-3 solves A-2 with the pitfall of reintroducing
> part of A-1: we still have fewer map entries in the map, but we don't
> save memory, as all the entries not in the map are cached aside for
> potential merging.
> In this sense it seems broken to me, and it is complicated.
> Reducing the overall number of allocated map entries will help here,
> as the vacaches do.
>
> C seems to be inevitable; it's only a question of where it happens...
>
> B is a result of having pageable memory, which can fault, and
> non-pageable memory in the same map, with the need to allocate
> non-pageable memory in the event of a page fault.
>
>>>> i implemented subr_vmem so that eventually it can be used as the primary
>>>> KVA allocator. ie. when allocating from kernel_map, allocate KVA from
>>>> kernel_va_arena first and then, if and only if necessary, register it to
>>>> kernel_map for fault handling. it probably allows us to remove the VACACHE
>>>> stuff, too. kmem_alloc will be backed by a vmem arena which is backed by
>>>> kernel_va_arena.
>>>>
> Originally I thought about two options, with option one being what my
> patch does, and two:
>
> If vmem is made the primary kva allocator, we should carve out a
> kernel heap entirely controlled by vmem, probably one special
> vm_map_entry in the kernel_map that spans the heap, or a submap that
> never has any map entries.
> Essentially separating pageable and non-pageable memory allocations,
> this would allow removing the vacaches in the kernel maps as well
> as the map-entry-reservation mechanism.
>
> Questions that follow:
> - how to size it properly...
>
>> is this about limiting total size for a particular allocation?
>
> - this might be the kmem_map? or two heaps, an interrupt-safe one and
> a non-interrupt-safe one?
>
>> because kernel_va_arena would have its quantum cache disabled,
>> most users would use another arena stacked on it.
>> (like what we currently have as kmem_arena.)
>> interrupt-safe allocations can either use kernel_va_arena directly or
>> have another arena, eg. kmem_arena_intrsafe.
>
>
> I think having two "allocators" (vmem and the vm_map entries themselves)
> controlling the kernel_map isn't a good idea, as both have to be in
> sync; at least, every allocation that is made via vm_map_entries needs
> to be made in vmem as well. There is no clear responsibility for either.
>
>> i agree that having two allocators for KVA is bad.
>> my idea is having just one. (kernel_va_arena)
>> no allocation would be made by vm_map_entries for kernel_map.
>> kernel_map is kept merely for fault handling.
>
>> essentially kva allocation would be:
>
>> va = vmem_alloc(kernel_va_arena, ...);
>> if (pageable)
>>         create kernel_map entry for the va
>> else
>>         ...
>> return va;
>
> Option two is more challenging and will solve problems B and the As,
> while option one solves most of the As, leaving B untouched.
>
>> sure, it's more challenging and involves more work.
>> (so it isn't finished yet. :-)
>
>> YAMAMOTO Takashi
>
> Lars
>
Hi,

I've made some progress in exploring both options further.

Two patches, one implementing each option:

a) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-uvm-extent.patch
b) http://ftp.netbsd.org/pub/NetBSD/misc/para/kmem-pool-vmem-uvm-extent.patch

Option a has extended kva caches for both kernel_map and kmem_map, with
interfaces to them that are used by kmem(9), malloc(9) and pool(9), with
the exception that the pool_allocator_meta goes directly to the kmem_map.
(This means malloc(9) and kmem(9) use kva caches, resulting in a lower
vm_map_entry count.)

Option b has one vm_map_entry in the kernel_map spanning the kernel heap,
which in turn is controlled by vmem(9). There is the heap_arena, from
which the heap_va_arena (with quantum caches) imports, as well as an
internal arena for vmem's metadata. On top of the heap_va_arena are
interfaces used by kmem(9), malloc(9) and pool(9), with the pool metadata
allocator going to vmem's meta arena. (Rough sketches of the allocation
path and the arena stacking follow at the end of this mail.)

Originally I had another arena on top of the heap_va_arena, which backed
the virtual memory with physical pages on import and from which
malloc(9), kmem(9) and pool(9) allocated; let's call this option c. I
replaced this arena with interface functions for efficiency reasons.

Findings after having run the system for a while with about 1.1 gig in
the pool(9)s:

Option a: about 30000 allocated kernel map entries (not in the map, but
          allocated)
Option b: about 100000 allocated boundary tags
Option c: about 400000 allocated boundary tags

With boundary tags being about half the size of vm_map_entries, the vmem
version uses slightly more memory, but not much more.

Both versions use a modified kmem(9) that interfaces either with vmem or
with the extended kva caches, and that returns page-aligned memory for
allocations of page size and larger, and cache-line-aligned memory for
allocations between cache line size and page size (sketched at the end of
this mail as well). This should resolve some problems Xen kernels have.

The vmem version isn't quite finished: the vmem_size function required by
zfs still needs to be adapted, etc. (And malloc(9) is just replaced by
some arena and isn't gathering statistics anymore...)

So far the status report.

Greetings,
Lars
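
P.S. To spell out the allocation path from the quoted thread a bit more,
here is a rough sketch. kernel_va_arena and uvm_register_fault_range()
are placeholders, and the vmem(9) signatures are written from memory, so
take it as an illustration only:

#include <sys/param.h>
#include <sys/vmem.h>

extern vmem_t *kernel_va_arena;	/* placeholder, per the thread above */

/*
 * KVA allocation with vmem(9) as the single allocator: kernel_map is
 * only involved when the range must be pageable, i.e. when uvm_fault()
 * has to be able to find it.
 */
vaddr_t
kva_alloc(vsize_t size, bool pageable)
{
	vmem_addr_t va;

	va = vmem_alloc(kernel_va_arena, size, VM_SLEEP | VM_INSTANTFIT);
	if (va == VMEM_ADDR_NULL)
		return 0;

	if (pageable) {
		/* register the range with kernel_map for fault
		 * handling; placeholder for the real mechanism */
		uvm_register_fault_range(va, size);
	}
	return (vaddr_t)va;
}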
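The arena stacking of option b would then look roughly like this; again
only a sketch, the vmem_create() argument order and the import callback
signatures are approximated from memory:

#include <sys/param.h>
#include <sys/vmem.h>

vmem_t *heap_arena;	/* spans the single heap vm_map_entry */
vmem_t *heap_va_arena;	/* imports from heap_arena, quantum caches on */

/* import/release callbacks stacking heap_va_arena on heap_arena */
static vmem_addr_t
heap_import(vmem_t *source, vmem_size_t size, vm_flag_t flags)
{
	return vmem_alloc(source, size, flags | VM_INSTANTFIT);
}

static void
heap_release(vmem_t *source, vmem_addr_t addr, vmem_size_t size)
{
	vmem_free(source, addr, size);
}

void
heap_arena_init(vmem_addr_t heap_start, vmem_size_t heap_size)
{
	/* root arena: owns the heap va, no quantum caches */
	heap_arena = vmem_create("heap", heap_start, heap_size,
	    PAGE_SIZE,
	    NULL, NULL, NULL,	/* nothing to import from */
	    0,			/* qcache_max: caches disabled */
	    VM_SLEEP, IPL_NONE);

	/* allocation arena: imports va from heap_arena on demand,
	 * caches small, frequent allocation sizes */
	heap_va_arena = vmem_create("heapva", 0, 0,
	    PAGE_SIZE,
	    heap_import, heap_release, heap_arena,
	    8 * PAGE_SIZE,	/* qcache_max */
	    VM_SLEEP, IPL_NONE);
}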
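And the alignment rule the modified kmem(9) applies, in helper form (the
helper name is made up for this sketch; COHERENCY_UNIT stands in for the
cache line size):

#include <sys/param.h>

/*
 * Alignment guaranteed for a kmem(9) request of the given size:
 * page alignment for page size and larger, cache line alignment
 * between cache line size and page size.
 */
static size_t
kmem_alignment(size_t size)
{
	if (size >= PAGE_SIZE)
		return PAGE_SIZE;
	if (size >= COHERENCY_UNIT)
		return COHERENCY_UNIT;
	return sizeof(void *);	/* small allocations: default alignment */
}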
