Re: NFS Client error
On Monday 08 March 2010 5:59:29 pm vol...@vwsoft.com wrote:
> On 03/08/10 12:16, Giulio Ferro wrote:
> > FreeBSD 8-stable amd64.
> >
> > It mounts different file systems by NFS (with locking) from a
> > data server directly connected (gigabit) to the server.
> >
> > Apache is running in several jails on those NFS folders.
> >
> > Now and then I get a huge slow-down. When I look in the logs
> > I see thousands of lines like these:
> > Mar 5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)
> > Mar 5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on
> > signal 11
> >
> > What should I do?
>
> Giulio,
>
> it seems this is not related to network (NFS) operations after all. It
> looks like a problem in the VM. I think it makes sense to have a look
> at the httpd.core file if the binary has been linked with debugging
> symbols turned on. It also may not hurt to look at vmstat -m output
> first.
>
> You may want to change ${subject} and post to stable@ to draw more
> attention to your problem.

That's not quite true. If you take a page fault on an mmap'd file backed by
NFS (e.g. an executable or shared library) and the NFS READ RPC issued to
satisfy the page fault fails, then you can get exactly this error.

--
John Baldwin
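A minimal userland sketch of the failure mode John describes (the NFS path
and file name below are made up; any executable or shared library on an NFS
mount behaves the same way). The mmap'd file is paged in on demand, and if
the NFS READ issued to service a page fault fails, the kernel logs
"vm_fault: pager read error" and kills the process with a signal, which is
exactly the pattern in Giulio's httpd logs:

#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	const char *path = "/nfs/data/libexample.so";	/* hypothetical NFS path */
	struct stat st;
	char *p;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1 || fstat(fd, &st) == -1)
		err(1, "%s", path);

	p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");

	/*
	 * Touching the mapping faults the page in, which turns into an NFS
	 * READ RPC; if that RPC fails, this access is where the process dies.
	 */
	printf("first byte: %d\n", p[0]);
	return (0);
}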
Re: NFS Client error
On 03/09/10 13:44, John Baldwin wrote:
> On Monday 08 March 2010 5:59:29 pm vol...@vwsoft.com wrote:
>> On 03/08/10 12:16, Giulio Ferro wrote:
>>> FreeBSD 8-stable amd64.
>>>
>>> It mounts different file systems by NFS (with locking) from a
>>> data server directly connected (gigabit) to the server.
>>>
>>> Apache is running in several jails on those NFS folders.
>>>
>>> Now and then I get a huge slow-down. When I look in the logs
>>> I see thousands of lines like these:
>>> Mar 5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)
>>> Mar 5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on
>>> signal 11
>>>
>>> What should I do?
>>
>> Giulio,
>>
>> it seems this is not related to network (NFS) operations after all. It
>> looks like a problem in the VM. I think it makes sense to have a look
>> at the httpd.core file if the binary has been linked with debugging
>> symbols turned on. It also may not hurt to look at vmstat -m output
>> first.
>>
>> You may want to change ${subject} and post to stable@ to draw more
>> attention to your problem.
>
> That's not quite true. If you take a page fault on an mmap'd file backed by
> NFS (e.g. an executable or shared library) and the NFS READ RPC issued to
> satisfy the page fault fails, then you can get exactly this error.

John,

thank you for pointing that out. I had forgotten that mmap'ing files over
NFS is a possible source of that problem.

With 8-stable I'm also seeing mbufs leaking during NFS operation. That may
or may not be related to Giulio's problem.

Volker
Extremely slow boot on VMWare with Opteron 2352 (acpi?)
I'm troubleshooting a pretty weird problem with running FreeBSD 8.0 (amd64)
inside VMware ESX/ESXi servers. We've got a wide range of physical servers
running identical copies of VMware and identical FreeBSD virtual machines.
Everything works fine on all of our servers for Windows and Linux VMs, but
FreeBSD running on Opteron 2352 physical servers takes an average of about
20 minutes to boot. Normally I would chalk this up to being a VMware bug,
but what is actually happening is somewhat interesting.

If I boot up on an Opteron 2218 system, it boots normally. If I boot the
exact same VM moved to a 2352, I get:

acpi0: on motherboard
PCIe: Memory Mapped configuration base @ 0xe000
   (very long pause)
ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
acpi0: [MPSAFE]
acpi0: [ITHREAD]

and then it boots normally. The pause is between 1 and 60 minutes (somewhat
variable, and I'm not sure what it depends on). After it eventually boots,
everything seems fine.

Doing a diff between the 2218 (-) and 2352 (+) servers' verbose boots, I see:

-ACPI timer: 0/90 0/23 1/1 1/1 1/1 1/1 0/12 1/1 1/1 1/1 -> 7
+ACPI timer: 0/162 0/1546 0/1119 0/150 0/165 0/778 0/123 0/203 0/83 0/93 -> 0

This looks more interesting. If I'm reading acpi_timer_test() correctly, the
ACPI timer isn't particularly good on the 2218, but is flat-out unusable on
the 2352. I don't know if this is a symptom or the actual cause of the
problem, though.

Disabling ACPI results in an instant boot, but then the SCSI PCI device
isn't found, which makes it kind of pointless.

Before I delve too deeply into this, does this ring any bells for anyone?
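For readers unfamiliar with the probe, the idea behind a timer quality test
like acpi_timer_test() is roughly the sketch below. This is only an
illustration of the concept, not the FreeBSD implementation; the read
function and thresholds are placeholders. Read the timer in a tight loop and
count samples whose deltas look sane; a timer where most samples fail, as in
the 2352 output above, cannot be trusted as a timecounter:

#include <stdint.h>

/* Placeholder for the real hardware register read. */
extern uint32_t read_acpi_timer(void);

static int
timer_looks_sane(int samples)
{
	uint32_t cur, prev;
	int good, i;

	good = 0;
	prev = read_acpi_timer();
	for (i = 0; i < samples; i++) {
		cur = read_acpi_timer();
		/* Expect a small forward step between back-to-back reads. */
		if (cur - prev < 1000)
			good++;
		prev = cur;
	}
	return (good > samples * 9 / 10);	/* arbitrary 90% pass mark */
}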
southbridge recognition
Hi!

For some driver enhancements, I need to decide (in code) which southbridge
(Intel or AMD is all that matters) the driver is facing. What's the most
portable way to distinguish the chipset?

Intel supports two pages of 128 bytes of CMOS RAM, AMD supports one page of
256 bytes (the addressing is different). Is there any way to query the CMOS
RAM size? I failed to find one while reading the Intel and AMD chipset
documentation. Older chipsets supported only 64 (very old) or 128 bytes;
recent chipsets (as of the last 15 years or so) support more.

As our current nvram(4) driver only works with a 128-byte RAM size, is
anybody interested in seeing the nvram(4) driver enhanced for the extended
memory areas? I do have working code, but it assumes an Intel ICH or 440LX
chipset (and fails on SB[67]xx for some reason :).

Thank you for any pointers!

Volker
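For context, the classic access pattern is sketched below (FreeBSD kernel
inb()/outb() from <machine/cpufunc.h>). Ports 0x70/0x71 address the
traditional first 128 bytes; the second index/data pair at 0x72/0x73 is the
upper 128-byte bank that many Intel ICH parts provide. Whether that second
bank, or AMD's 256-byte addressing, exists on a given southbridge is exactly
the detection problem raised here, so treat the port assignments as
chipset-dependent assumptions:

#include <sys/types.h>
#include <machine/cpufunc.h>

#define RTC_ADDR	0x70
#define RTC_DATA	0x71
#define RTC_EXT_ADDR	0x72	/* upper bank, on chipsets that have one */
#define RTC_EXT_DATA	0x73

static uint8_t
cmos_read(int offset)
{
	if (offset < 128) {
		outb(RTC_ADDR, offset & 0x7f);	/* keep the NMI-disable bit clear */
		return (inb(RTC_DATA));
	}
	outb(RTC_EXT_ADDR, offset & 0x7f);
	return (inb(RTC_EXT_DATA));
}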
Re: southbridge recognition
Can you look at the device/vendor IDs of PCI device 0:31:0? That's always
worked for me, but I've only ever done it on Intel platforms, so I'm not
sure if it works on AMD chipsets.
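A minimal sketch of that approach in a FreeBSD driver follows. pci_find_bsf()
and pci_get_vendor() are the stock kernel PCI accessors; the fallback lookup
at slot 20 for AMD southbridges is an assumption to verify against the
chipset datasheet, not something taken from this thread:

#include <sys/param.h>
#include <sys/bus.h>
#include <dev/pci/pcivar.h>

#define	PCI_VENDOR_INTEL	0x8086
#define	PCI_VENDOR_AMD		0x1022

static uint16_t
southbridge_vendor(void)
{
	device_t sb;

	/* The Intel ICH/PCH LPC bridge conventionally lives at 0:31:0. */
	sb = pci_find_bsf(0, 31, 0);
	if (sb == NULL) {
		/* Many AMD SB[67]xx LPC/SMBus functions sit at 0:20:x (assumption). */
		sb = pci_find_bsf(0, 20, 0);
	}
	if (sb == NULL)
		return (0);
	return (pci_get_vendor(sb));	/* compare against the defines above */
}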
Re: Extremely slow boot on VMWare with Opteron 2352 (acpi?)
On Tuesday 09 March 2010 3:40:26 pm Kevin Day wrote:
>
> I'm troubleshooting a pretty weird problem with running FreeBSD 8.0 (amd64)
> inside VMware ESX/ESXi servers. We've got a wide range of physical servers
> running identical copies of VMware and identical FreeBSD virtual machines.
> Everything works fine on all of our servers for Windows and Linux VMs, but
> FreeBSD running on Opteron 2352 physical servers takes an average of about
> 20 minutes to boot. Normally I would chalk this up to being a VMware bug,
> but what is actually happening is somewhat interesting.
>
> If I boot up on an Opteron 2218 system, it boots normally. If I boot the
> exact same VM moved to a 2352, I get:
>
> acpi0: on motherboard
> PCIe: Memory Mapped configuration base @ 0xe000
>    (very long pause)
> ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
> acpi0: [MPSAFE]
> acpi0: [ITHREAD]
>
> and then it boots normally.

It's probably worth adding some printfs to narrow down where the pause is
happening. This looks to be all during the acpi_attach() routine, so maybe
you can start there.

--
John Baldwin
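One way to do that (illustrative only; the macro name and message format are
made up) is to bracket the suspect calls in acpi_attach() with timestamped
printfs, using getnanouptime() as a cheap kernel clock read:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/time.h>

/*
 * Drop TRACE_STEP("...") before and after each suspect call in acpi_attach()
 * and compare the timestamps on the console.
 */
#define TRACE_STEP(what) do {						\
	struct timespec _ts;						\
	getnanouptime(&_ts);						\
	printf("acpi_attach: %s at %jd.%09ld\n", (what),		\
	    (intmax_t)_ts.tv_sec, _ts.tv_nsec);				\
} while (0)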
Re: Extremely slow boot on VMWare with Opteron 2352 (acpi?)
On Mar 9, 2010, at 4:27 PM, John Baldwin wrote:
> On Tuesday 09 March 2010 3:40:26 pm Kevin Day wrote:
>>
>> If I boot up on an Opteron 2218 system, it boots normally. If I boot the
>> exact same VM moved to a 2352, I get:
>>
>> acpi0: on motherboard
>> PCIe: Memory Mapped configuration base @ 0xe000
>>    (very long pause)
>> ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
>> acpi0: [MPSAFE]
>> acpi0: [ITHREAD]
>>
>> then booting normally.
>
> It's probably worth adding some printfs to narrow down where the pause is
> happening. This looks to be all during the acpi_attach() routine, so maybe
> you can start there.

Okay, good pointer. This is what I've narrowed down:

acpi_enable_pcie() calls pcie_cfgregopen(), in this case as
pcie_cfgregopen(0xe000, 0, 255). Inside pcie_cfgregopen(), the pause starts
here:

	/* XXX: We should make sure this really fits into the direct map. */
	pcie_base = (vm_offset_t)pmap_mapdev(base, (maxbus + 1) << 20);

pmap_mapdev() calls pmap_mapdev_attr(), and in there this evaluates to true:

	/*
	 * If the specified range of physical addresses fits within the direct
	 * map window, use the direct map.
	 */
	if (pa < dmaplimit && pa + size < dmaplimit) {

so we call pmap_change_attr(), which in turn calls pmap_change_attr_locked().
It's changing 0x1000 bytes starting at 0xff00e000. The very last line before
returning from pmap_change_attr_locked() is:

	pmap_invalidate_cache_range(base, tmpva);

And this is where the delay is: it is calling MFENCE/CLFLUSH in a loop
8 million times.

We actually had a problem with CLFLUSH causing panics on these same CPUs
under Xen, which is partially why we're looking at VMware now (see
kern/138863). I'm wondering if VMware didn't encounter the same problem and
replace CLFLUSH with a software-emulated version that is far slower; based
on the speed, it's probably invalidating the entire cache.

A quick change to pmap_invalidate_cache_range() to just flush the entire
cache if the area being invalidated is over 8MB seems to have fixed it,
i.e. changing:

	else if (cpu_feature & CPUID_CLFSH) {

to

	else if ((cpu_feature & CPUID_CLFSH) && ((eva - sva) < (2 << 22))) {

However, I'm a little blurry on whether everything leading to this point is
correct. It's ending up with 256MB of address space for the PCI area, which
seems really excessive. Is the problem just that it wants room for 256
buses, or...? Does anyone know this code path well enough to tell if this is
deviating from the norm?

--
Kevin
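To make the workaround concrete, here is a sketch of the shape the
invalidation takes with that guard in place. This is a simplified
illustration, not the stock pmap_invalidate_cache_range(); the threshold
constant and structure just restate what is described above. Ranges large
enough that the per-line CLFLUSH loop would run millions of iterations fall
back to wbinvd(), which flushes the entire cache in one instruction:

#include <sys/param.h>
#include <machine/cpufunc.h>
#include <machine/md_var.h>
#include <machine/specialreg.h>

#define	CLFLUSH_MAX_RANGE	(2 << 22)	/* 8MB cutoff, as above */

static void
invalidate_cache_range_sketch(vm_offset_t sva, vm_offset_t eva)
{
	vm_offset_t va;

	if ((cpu_feature & CPUID_CLFSH) != 0 &&
	    eva - sva < CLFLUSH_MAX_RANGE) {
		/* Flush only the affected lines, fenced on both sides. */
		mfence();
		for (va = sva; va < eva; va += cpu_clflush_line_size)
			clflush(va);
		mfence();
	} else {
		/* Large range (or no CLFLUSH): dump the whole cache. */
		wbinvd();
	}
}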
Re: Extremely slow boot on VMWare with Opteron 2352 (acpi?)
256MB is correct. The PCI standard allows for up to 256 buses, each with up
to 32 slots, and each slot can have up to 8 functions. PCIe devices have a
full 4096 bytes worth of configuration registers. Multiply all that and you
get 256MB.

Also, keep in mind that it's not allocating 256MB of memory; it's allocating
256MB of address space and memory-mapping the configuration registers in
that space.
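Spelled out, that multiplication is:

256 buses x 32 slots x 8 functions x 4096 bytes = 2^8 * 2^5 * 2^3 * 2^12 bytes
= 2^28 bytes = 256MB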
VMDirectPath with FreeBSD 8 VM under ESXi 4.0
I wasn't 100% sure where this should go, but -hackers seemed like a
reasonable starting point.

My overall objective is to set up a ZFS fileserver VM. For my first attempt
I am trying to use VMDirectPath (i.e. PCI pass-through) with a FreeBSD 8.0
VM under ESXi 4 to pass through the motherboard chipset SATA controller
(and, when I expand in the future, whatever SAS controller I get).
Unfortunately, whenever I add the mapped PCI device to the VM, it powers
itself off about halfway through the boot sequence.

I have confirmed it's not a fundamental problem by trying Linux and
OpenSolaris VMs; both can see the PCI device (an Intel 3420 chipset SATA
controller) and the drives attached to it. The problem only occurs with the
FreeBSD 8.0 (and 7.3-RC2) VMs.

I've also tried booting the FreeBSD installer DVD on the bare hardware, to
make sure it's not a problem with that particular controller.

The relevant part of the vmware.log that is generated is:

Sep 19 05:19:26.676: vcpu-0| PCIPassthru: 000:31.2 : barSize: 2048 is not pgsize multiple
Sep 19 05:19:26.677: vcpu-0| PCIPassthru: 000:31.2 : barSize: 2048 is not pgsize multiple
Sep 19 05:19:26.677: vcpu-0| ASSERT bora/vmcore/vmx/main/physMem.c:2148 bugNr=254266
Sep 19 05:19:30.295: vcpu-0| Backtrace:
Sep 19 05:19:30.295: vcpu-0| Backtrace[0] 0x5e521d88 eip 0xbbf58ed
Sep 19 05:19:30.295: vcpu-0| Backtrace[1] 0x5e5221c8 eip 0xb7f405c
Sep 19 05:19:30.295: vcpu-0| Backtrace[2] 0x5e522218 eip 0xb9cafca
Sep 19 05:19:30.295: vcpu-0| Backtrace[3] 0x5e522248 eip 0xb9b929e
Sep 19 05:19:30.295: vcpu-0| Backtrace[4] 0x5e5222a8 eip 0xb9e92fd
Sep 19 05:19:30.295: vcpu-0| Backtrace[5] 0x5e5222d8 eip 0xb9e9442
Sep 19 05:19:30.295: vcpu-0| Backtrace[6] 0x5e5222e8 eip 0xb9b8c5d
Sep 19 05:19:30.295: vcpu-0| Backtrace[7] 0x5e5223c8 eip 0xb8efea1
Sep 19 05:19:30.295: vcpu-0| Backtrace[8] 0x5e5224b8 eip 0x173a24fb
Sep 19 05:19:30.295: vcpu-0| Backtrace[9] eip 0x17489e3e
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[0] 0x5e521d88 eip 0xbbf58ed in function (null) in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[1] 0x5e5221c8 eip 0xb7f405c in function Panic in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[2] 0x5e522218 eip 0xb9cafca in function (null) in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[3] 0x5e522248 eip 0xb9b929e in function (null) in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[4] 0x5e5222a8 eip 0xb9e92fd in function (null) in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[5] 0x5e5222d8 eip 0xb9e9442 in function (null) in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[6] 0x5e5222e8 eip 0xb9b8c5d in function (null) in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[7] 0x5e5223c8 eip 0xb8efea1 in function (null) in object /bin/vmx loaded at 0xb795000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[8] 0x5e5224b8 eip 0x173a24fb in function (null) in object /lib/libpthread.so.0 loaded at 0x1739d000
Sep 19 05:19:30.295: vcpu-0| SymBacktrace[9] eip 0x17489e3e in function clone in object /lib/libc.so.6 loaded at 0x173b8000
Sep 19 05:19:30.295: vcpu-0| Msg_Post: Error
Sep 19 05:19:30.295: vcpu-0| [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)
Sep 19 05:19:30.295: vcpu-0| ASSERT bora/vmcore/vmx/main/physMem.c:2148 bugNr=254266
Sep 19 05:19:30.295: vcpu-0| [msg.panic.haveLog] A log file is available in "/vmfs/volumes/4aaf3595-47d35fcc-a053-0030489f04bf/FreeBSD 8.0/vmware.log". [msg.panic.haveCore] A core file is available in "/vmfs/volumes/4aaf3595-47d35fcc-a053-0030489f04bf/FreeBSD 8.0/vmx-zdump.003". [msg.panic.requestSupport.withLogAndCore] Please request support and include the contents of the log file and core file. [msg.panic.requestSupport.vmSupport.vmx86]
Sep 19 05:19:30.296: vcpu-0| To collect data to submit to VMware support, run "vm-support".
Sep 19 05:19:30.296: vcpu-0| [msg.panic.response] We will respond on the basis of your support entitlement.
Sep 19 05:19:30.296: vcpu-0|
Sep 19 05:19:30.396: vmx| VTHREAD watched thread 4 "vcpu-0" died
Sep 19 05:19:30.498: mks| VTHREAD watched thread 0 "vmx" died
Sep 19 05:19:30.804: vcpu-1| VTHREAD watched thread 0 "vmx" died

The troubleshooting steps I have already tried are:

* Using only a single vCPU
* Choosing "ACPI Disabled" from the boot menu
* Choosing "Safe Mode" from the boot menu

There seems to be at least one other person having this problem
(http://communities.vmware.com/message/1490705), and given that it is a very
different PCI device, it seems to me this is probably a generic issue with
PCI pass-through and FreeBSD.

Does anyone out there have any ideas?

Cheers,
CS
Re: VMDirectPath with FreeBSD 8 VM under ESXi 4.0
In the last episode (Mar 09), Christopher Smith said:
> I wasn't 100% sure where this should go, but -hackers seemed like a
> reasonable starting point.
>
> My overall objective is to set up a ZFS fileserver VM. For my first
> attempt I am trying to use VMDirectPath (i.e. PCI pass-through) with a
> FreeBSD 8.0 VM under ESXi 4 to pass through the motherboard chipset SATA
> controller (and, when I expand in the future, whatever SAS controller I
> get). Unfortunately, whenever I add the mapped PCI device to the VM, it
> powers itself off about halfway through the boot sequence.
>
> I have confirmed it's not a fundamental problem by trying Linux and
> OpenSolaris VMs; both can see the PCI device (an Intel 3420 chipset SATA
> controller) and the drives attached to it. The problem only occurs with
> the FreeBSD 8.0 (and 7.3-RC2) VMs.
>
> I've also tried booting the FreeBSD installer DVD on the bare hardware,
> to make sure it's not a problem with that particular controller.
>
> The relevant part of the vmware.log that is generated is:
>
> Sep 19 05:19:26.676: vcpu-0| PCIPassthru: 000:31.2 : barSize: 2048 is not pgsize multiple
> Sep 19 05:19:26.677: vcpu-0| PCIPassthru: 000:31.2 : barSize: 2048 is not pgsize multiple
> Sep 19 05:19:26.677: vcpu-0| ASSERT bora/vmcore/vmx/main/physMem.c:2148 bugNr=254266
> Sep 19 05:19:30.295: vcpu-0| Backtrace:
> Sep 19 05:19:30.295: vcpu-0| Backtrace[0] 0x5e521d88 eip 0xbbf58ed
> Sep 19 05:19:30.295: vcpu-0| Backtrace[1] 0x5e5221c8 eip 0xb7f405c
> Sep 19 05:19:30.295: vcpu-0| Backtrace[2] 0x5e522218 eip 0xb9cafca
> Sep 19 05:19:30.295: vcpu-0| Backtrace[3] 0x5e522248 eip 0xb9b929e
> Sep 19 05:19:30.295: vcpu-0| Backtrace[4] 0x5e5222a8 eip 0xb9e92fd
> Sep 19 05:19:30.295: vcpu-0| Backtrace[5] 0x5e5222d8 eip 0xb9e9442
> Sep 19 05:19:30.295: vcpu-0| Backtrace[6] 0x5e5222e8 eip 0xb9b8c5d
> Sep 19 05:19:30.295: vcpu-0| Backtrace[7] 0x5e5223c8 eip 0xb8efea1
> Sep 19 05:19:30.295: vcpu-0| Backtrace[8] 0x5e5224b8 eip 0x173a24fb
> Sep 19 05:19:30.295: vcpu-0| Backtrace[9] eip 0x17489e3e
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[0] 0x5e521d88 eip 0xbbf58ed in function (null) in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[1] 0x5e5221c8 eip 0xb7f405c in function Panic in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[2] 0x5e522218 eip 0xb9cafca in function (null) in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[3] 0x5e522248 eip 0xb9b929e in function (null) in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[4] 0x5e5222a8 eip 0xb9e92fd in function (null) in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[5] 0x5e5222d8 eip 0xb9e9442 in function (null) in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[6] 0x5e5222e8 eip 0xb9b8c5d in function (null) in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[7] 0x5e5223c8 eip 0xb8efea1 in function (null) in object /bin/vmx loaded at 0xb795000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[8] 0x5e5224b8 eip 0x173a24fb in function (null) in object /lib/libpthread.so.0 loaded at 0x1739d000
> Sep 19 05:19:30.295: vcpu-0| SymBacktrace[9] eip 0x17489e3e in function clone in object /lib/libc.so.6 loaded at 0x173b8000
> Sep 19 05:19:30.295: vcpu-0| Msg_Post: Error
> Sep 19 05:19:30.295: vcpu-0| [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)
> Sep 19 05:19:30.295: vcpu-0| ASSERT bora/vmcore/vmx/main/physMem.c:2148 bugNr=254266
> Sep 19 05:19:30.295: vcpu-0| [msg.panic.haveLog] A log file is available in "/vmfs/volumes/4aaf3595-47d35fcc-a053-0030489f04bf/FreeBSD 8.0/vmware.log". [msg.panic.haveCore] A core file is available in "/vmfs/volumes/4aaf3595-47d35fcc-a053-0030489f04bf/FreeBSD 8.0/vmx-zdump.003". [msg.panic.requestSupport.withLogAndCore] Please request support and include the contents of the log file and core file. [msg.panic.requestSupport.vmSupport.vmx86]
> Sep 19 05:19:30.296: vcpu-0| To collect data to submit to VMware support, run "vm-support".
> Sep 19 05:19:30.296: vcpu-0| [msg.panic.response] We will respond on the basis of your support entitlement.

Looks like you crashed VMWare. That shouldn't happen. Send the log and core
file to VMWare support and they should be able to help you.

--
	Dan Nelson
	dnel...@allantgroup.com