On 25 Mar 2003, at 19:28, Terry Lambert wrote:

> Basically, you don't really care about pv_entry_t's, you care
> about KVA space, and running out of it.
>
> In a previous posting, you suggested increasing KVA_PAGES fixed
> the problem, but caused a pthreads problem.
Will running out of KVA space indirectly cause PV Entries to hit their
limit as shown in sysctl vm.zone?  To my knowledge, I've never seen a
panic on this system directly resulting from running out of KVA space.
They've all been traced back to running out of available PV Entries.
I'm invariably hitting the panic in pmap_insert_entry(), and I only get
the panic when I run out of available PV Entries.  I've seen nothing to
indicate that running out of KVA space is causing the panics, though
I'm still learning the ropes of the BSD memory management code and
recognize that there are many interactions between its different parts
that could have unforeseen results.

Regarding the other thread you mentioned, increasing KVA_PAGES was just
a way to squeeze a higher PV Entry limit out of the system, because it
would allow a higher value for PMAP_SHPGPERPROC while still allowing
the system to boot.  I never determined whether it "fixed the problem"
because I had to revert to an old kernel when MySQL wigged out on boot,
apparently due to the threading issue in 4.7 that shows up with
increased KVA_PAGES.  I never got a chance to increase PMAP_SHPGPERPROC
after increasing KVA_PAGES because MySQL is an important service on
this system and I had to get it back up and running.

> What you meant to say is that it caused a Linux threads kernel
> module mailbox location problem for the user space Linux threads
> library.  In other words, it's because you are using the Linux
> threads implementation, that you have this problem, not
> FreeBSD's pthreads.

I may have misspoken in the previous thread about pthreads having a
problem when KVA_PAGES was increased.  I was referencing a previous
thread in which the author stated that pthreads had a problem when
KVA_PAGES was increased, and I had assumed that the author knew what he
was talking about.  At any rate, this was apparently patched and
committed to the RELENG_4 tree after 4.7-RELEASE.  I plan on grabbing
RELENG_4_8 once it's officially released.  That should give me room to
play with KVA_PAGES, if necessary, without breaking MySQL.

Also worth reiterating is that resource usage by Apache is the source
of the panics.  The version I'm using is 1.3.27, so it doesn't even
make use of threading, at least not the way Apache 2.0 does.  I would
just switch to Apache 2.0, but it doesn't yet support all the modules
we need.  Threads were only an issue with MySQL when KVA_PAGES > 256,
which doesn't appear to be related to the panics happening while
KVA_PAGES = 256.

> In any case, the problem you are having is because the uma_zalloc()
> (UMA) allocator is feeling KVA space pressure.
>
> One way to move this pressure somewhere else, rather than dealing with
> it in an area which results in a panic on you because the code was not
> properly retrofit for the limitations of UMA, is to decide to
> preallocate the UMA region used for the "PV ENTRY" zone.

I haven't read up on that section of the source, but I'll go do so now
and determine whether the changes you suggested would help in this
case.  I know from some other posts that you're a strong advocate of
mapping all physical RAM into KVA right up front rather than messing
around with mapping some subset of physical RAM into KVA.  That
approach seems to make sense, at least for large memory systems, if I
understand all the dynamics of the situation correctly.
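Before I do, and for anyone following along, here is roughly where the
knobs discussed above live in the kernel config.  The numbers are
purely illustrative and untested on this box (which is still running
the default KVA_PAGES of 256):

        # i386 kernel config fragment (example values only)
        options         KVA_PAGES=512           # double the default of 256
        options         PMAP_SHPGPERPROC=400    # double the default of 200

Raising PMAP_SHPGPERPROC raises the PV Entry limit computed at boot,
and raising KVA_PAGES is what makes room for that larger limit while
still letting the system boot, at the cost of shrinking the user
portion of the address space.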
> The way to do this is to modify /usr/src/sys/i386/i386/pmap.c
> at about line 122, where it says:
>
> #define MINPV 2048

If I read the code in pmap.c correctly, MINPV just guarantees that the
system will have at least *some* PV Entries available, by preallocating
the KVA (28 bytes each on my system) for the number of PV Entries
specified by MINPV.  See the section of /usr/src/sys/i386/i386/pmap.c
labelled "init the pv free list".  I'm not certain it makes a lot of
sense to preallocate KVA space for 11,113,502 PV Entries (at 28 bytes
apiece that's roughly 300MB of KVA pinned up front, if my arithmetic is
right) when we don't appear to be completely KVA starved.  As I
understand it (and as you seem to have suggested), increasing MINPV
would only be useful if we were running out of KVA to other KVA
consumers (buffers, cache, mbuf clusters, etc.) before we could get
enough PV Entries on the "free" list.  I don't believe that is what's
happening here.

Here are the pertinent sysctls:

vm.zone_kmem_kvaspace: 350126080
vm.kvm_size: 1065353216
vm.kvm_free: 58720256

vm.zone_kmem_kvaspace indicates (if I understand it correctly) that
kmem_alloc() allocated about 334MB of KVA at boot.  vm.kvm_free
indicates that KVM is only pressured after the system has been running
awhile.  The sysctls above were read about 90 minutes after a reboot,
during non-peak usage hours.  At that time, there were 199MB allocated
to buffers, 49MB allocated to cache, and 353MB wired.  During peak
usage, we will typically have 199MB allocated to buffers, ~150MB
allocated to cache, and 500MB to 700MB wired.

If I understand things correctly, that means we're peaking around the
1GB KVM mark, and there's probably some recycling of memory used by
cache to free up KVM for other uses when necessary.  However, I don't
believe we're putting so much pressure on KVA/KVM as to run out of
28-byte chunks from which to make PV Entries.  Assuming, once again,
that I understand things correctly, if we were putting that much
pressure on KVA/KVM, cache would go much nearer to zero while the
system attempted to make room for those 28-byte PV Entries.  Even
during peak usage, and just prior to a panic, the system still shows
over 100MB of cache.  I have a 'systat -vm' from a few seconds prior to
one of the panics that showed over 200MB of KVM free.  So, I don't
think the memory allocation in KVA/KVM associated with PV Entries is
the culprit of our panics.

Here's a copy of one of the panics and the trace I did on it.

Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
fault virtual address   = 0x4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc02292bd
stack pointer           = 0x10:0xed008e0c
frame pointer           = 0x10:0xed008e1c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 61903 (httpd)
interrupt mask          = net tty bio cam <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1

Instruction pointer trace:

# nm -n /kernel | grep c02292bd
# nm -n /kernel | grep c02292b
# nm -n /kernel | grep c02292
c022929c t pmap_insert_entry

exact line number of instruction:
----------------------------------
(kgdb) l *pmap_insert_entry+0x21
0xc02292bd is in pmap_insert_entry (/usr/src/sys/i386/i386/pmap.c:1636).
1631            int s;
1632            pv_entry_t pv;
1633
1634            s = splvm();
1635            pv = get_pv_entry();
1636            pv->pv_va = va;
1637            pv->pv_pmap = pmap;
1638            pv->pv_ptem = mpte;
1639
1640            TAILQ_INSERT_TAIL(&pmap->pm_pvlist, pv, pv_plist);

The instruction pointer is always the same on these panics, and it is
almost invariably an httpd process that is current when the panic hits.
My interpretation is that it is actually failing on line 1635 of
pmap.c, in get_pv_entry().  Here's the code for get_pv_entry():

static pv_entry_t
get_pv_entry(void)
{
        pv_entry_count++;
        if (pv_entry_high_water &&
            (pv_entry_count > pv_entry_high_water) &&
            (pmap_pagedaemon_waken == 0)) {
                pmap_pagedaemon_waken = 1;
                wakeup (&vm_pages_needed);
        }
        return zalloci(pvzone);
}

Now, unless it happens somewhere else, there is no bounds checking on
pv_entry_count in that function.  So, when pv_entry_count exceeds the
limit on PV Entries (pv_entry_max, as defined in pmap_init2() in
pmap.c), it just panics with "page not present" when it goes to process
line 1636, because a page is never going to be present for a PV Entry
allocated after pv_entry_count has climbed past pv_entry_max.

I suppose that if nobody is worried about this issue, then a quick and
dirty way to handle it would be to add bounds checking on
pv_entry_count in get_pv_entry() and, if pv_entry_count is out of
bounds, produce a panic with a more informative message (a rough sketch
of what I mean is in the P.S. below).  At least with a useful panic
message, the problem would be readily identified on other systems, and
you guys would have a better opportunity to see how many other people
run into this issue.

Now, that's my synopsis of the problem, though I'm still a newb with
regard to my understanding of the BSD memory management system.  Based
on the information I've given you, do you still think this panic was
caused by running out of KVA/KVM?  If I'm wrong, I'd love to know it so
I can revise my understanding of what is causing the panic.

For now, I've solved the problem by limiting the number of Apache
processes that are allowed to run, based on my calculations of how many
PV Entries are required by each child process.  But it's painful to
have all that RAM and not be able to put it to use because of an issue
in the memory management code that shows up on large memory systems
(>2GB).  IMHO, Apache shouldn't be able to crash an OS before it ever
starts using swap.  The only reason the problem doesn't show up on
systems with typical amounts of RAM (2GB or less) is that if those
systems ran Apache like we do, they'd spiral to a crash as swap usage
increased and swap eventually filled completely.

Sincerely,
Andrew Kinney
President and
Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net
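P.S.  For what it's worth, here is roughly what I had in mind for the
quick and dirty bounds check in get_pv_entry().  It's an untested
sketch based on my reading of 4.7's pmap.c, and the exact wording of
the panic message is just an example; I'm not claiming it's the right
long-term fix, only that a panic like this would be far easier to
diagnose than a bare "page fault while in kernel mode":

static pv_entry_t
get_pv_entry(void)
{
        pv_entry_count++;
        if (pv_entry_high_water &&
            (pv_entry_count > pv_entry_high_water) &&
            (pmap_pagedaemon_waken == 0)) {
                pmap_pagedaemon_waken = 1;
                wakeup (&vm_pages_needed);
        }
        /*
         * Untested sketch of the bounds check proposed above: if we have
         * blown past the limit computed in pmap_init2(), panic with a
         * message that points at the real problem instead of letting the
         * caller fault on whatever the exhausted zone hands back.
         */
        if (pv_entry_max && pv_entry_count > pv_entry_max)
                panic("get_pv_entry: pv_entry_count (%d) exceeds "
                    "pv_entry_max (%d); consider raising PMAP_SHPGPERPROC",
                    pv_entry_count, pv_entry_max);
        return zalloci(pvzone);
}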