Random 'Connection reset' issues between jails on same host
Hi all,

We're trying to implement our puppet infrastructure, and have discovered something strange about TCP connections between jails on the same host. As our jails haven't generally been making many connections to each other, this issue hasn't popped up before.

We have two identical host systems running FreeBSD 8.2-RELEASE-p4. These are 8-core Intel systems with 16GB RAM each. I have just upgraded one of the two systems to 9.0-RELEASE, and it shows the same problem.

When the puppetmaster jail is running on the same host as the jail running puppet agent, connections from the puppet agent randomly fail with 'Connection reset by peer'. This happens at random stages of configuration sync. If either of the jails is moved to another system (jail stop, zfs snapshot, zfs send/recv, jail start) on the same physical network, there are no such problems. It is not a hardware issue, as this happens no matter which of the two hosts we use. If both puppetmaster and puppet agent reside on the same physical box, the errors show up.

There used to be a somewhat similar problem with FTP between jails on the same host, but this was taken care of some time after 8.0-RELEASE IIRC. That problem manifested itself as a combination of random connection failures (had to try 2-3 times to establish a connection) and very slow transfer rates (at most 150kbyte/s between jails on the same host, but >50mbyte/s between jails on different hosts on the same network).

Has anyone seen this before? Is there anything I have missed, sysctls I should set/adjust?

The /etc/rc.conf settings for the jails are very simple - the following differ from the defaults:

  jail_sysvipc_allow="YES"
  jail_mount_enable="YES"
  jail_devfs_enable="YES"

/etc/sysctl.conf contains the following jail-related settings:

  security.jail.enforce_statfs=0
  security.jail.mount_allowed=1
  security.jail.allow_raw_sockets=1

Thanks,
/Eirik

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Random 'Connection reset' issues between jails on same host
On Jan 15, 2012, at 18:44, Eirik Øverby wrote:

> Hi all,
>
> We're trying to implement our puppet infrastructure, and have discovered
> something strange about TCP connections between jails on the same host. As
> our jails haven't generally been doing a lot of connections between each
> other, this issue hasn't popped up before.
>
> We have two 100% equal host systems, on FreeBSD 8.2-RELEASE-p4. These are
> 8-core Intel systems, with 16GB RAM each. I have just upgraded one of the two
> systems to 9.0-RELEASE, and it shows the same problem.
>
> When the puppetmaster jail is running on the same host as the jail running
> puppet agent, connections from the puppet agent randomly fail with
> 'Connection reset by peer'. This happens at random stages of configuration
> sync. Now if either of the jails are moved to another system (jail stop, zfs
> snapshot, zfs send/recv, jail start) on the same physical network, there are
> no such problems. It is not a hardware issue, as this happens no matter which
> of the two hosts we use. If both puppetmaster and puppet agent reside on the
> same physical box, the errors will show up.

Replying to myself here:

Assigning a cpuset with a single CPU to the jail with puppetmaster seems to cure the symptom. I've made a few thousand connects now and no failures so far. Repeatable on 8 and 9.

This is obviously only a workaround - but may give some hints as to where the problem is.

/Eirik
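To check whether the cpuset workaround holds up, a small connect loop can be handy. The following is a minimal sketch, not the test actually used in this thread: the helper name `count_failures`, the jail ID and the address are placeholders, 8140 is only assumed here as the puppetmaster port, and the cpuset(1) invocations in the comments assume the `-j` (jail ID) option is available on the release in question.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Run CMD N times and print how many of the runs exited non-zero.
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails"
}

# Hypothetical usage on the jail host (JID and address are placeholders):
#   jls                                       # look up the puppetmaster JID
#   cpuset -l 0 -j 3                          # pin jail 3 to CPU 0
#   cpuset -g -j 3                            # show the resulting CPU mask
#   count_failures 1000 nc -z 10.0.0.5 8140   # run from the agent jail
```

Note that `nc -z` only exercises connection setup; a reset in the middle of a longer session would need a test that actually transfers data.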
Re: mbuf leakage with nfs/zfs?
On 27. feb. 2010, at 20.38, Jeremy Chadwick wrote:

> On Sat, Feb 27, 2010 at 08:21:05PM +0100, Gerrit Kühn wrote:
>> On Sat, 27 Feb 2010 15:15:52 +0100 Willem Jan Withagen
>> wrote about Re: mbuf leakage with nfs/zfs?:
>>
>> WJW> > 81492/2613/84105 mbufs in use (current/cache/total)
>> WJW> > 80467/2235/82702/128000 mbuf clusters in use
>> WJW> > (current/cache/total/max) 80458/822 mbuf+clusters out of packet
>> WJW> > secondary zone in use (current/cache)
>>
>> WJW> Over the night I only had rsync and FreeBSD nfs traffic.
>> WJW>
>> WJW> 45337/2828/48165 mbufs in use (current/cache/total)
>> WJW> 44708/1902/46610/262144 mbuf clusters in use (current/cache/total/max)
>> WJW> 44040/888 mbuf+clusters out of packet secondary zone in use
>> WJW> (current/cache)
>>
>> After about 24h I now have
>>
>> 128320/2630/130950 mbufs in use (current/cache/total)
>> 127294/1200/128494/512000 mbuf clusters in use (current/cache/total/max)
>> 127294/834 mbuf+clusters out of packet secondary zone in use (current/cache)
>
> Follow-up regarding my server statistics shown here:
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055458.html
>
> I just pulled the statistics on the same servers for comparison (then
> vs. now).
>
> RELENG_7 amd64 2010/01/09 -- primary HTTP, pri DNS, SSH server + ZFS
>
> 515/1930/2445 mbufs in use (current/cache/total)
> 512/540/1052/25600 mbuf clusters in use (current/cache/total/max)
> 1152K/6394K/7547K bytes allocated to network (current/cache/total)
>
> RELENG_7 amd64 2010/01/11 -- secondary DNS, MySQL, dev box + ZFS
>
> 514/1151/1665 mbufs in use (current/cache/total)
> 512/504/1016/25600 mbuf clusters in use (current/cache/total/max)
> 1152K/2203K/3356K bytes allocated to network (current/cache/total)
>
> RELENG_7 i386 2008/04/19 -- secondary HTTP, SSH server, heavy memory I/O
>
> 515/820/1335 mbufs in use (current/cache/total)
> 513/631/1144/25600 mbuf clusters in use (current/cache/total/max)
> 1154K/2615K/3769K bytes allocated to network (current/cache/total)
>
> RELENG_8 amd64 2010/02/02 -- central backups + NFS+ZFS-based filer
>
> 1572/3423/4995 mbufs in use (current/cache/total)
> 1539/3089/4628/25600 mbuf clusters in use (current/cache/total/max)
> 3471K/7449K/10920K bytes allocated to network (current/cache/total)
>
> So, not much difference.
>
> I should point out that the NFS+ZFS-based filer doesn't actually do its
> backups using NFS; it uses rsnapshot (rsync) over SSH. There is intense
> network I/O during backup time though, depending on how much data there
> is to back up. The NFS mounts (on the clients) are only used to provide
> a way for people to get access to their nightly backups in a convenient
> way; it isn't used very heavily.
>
> I can do something NFS-intensive on any of the above clients if people
> want me to kind of testing. Possibly an rsync with a source of the NFS
> mount and a destination of the local disk would be a good test? Let me
> know if anyone's interested in me testing that.

I've had a discussion with some folks on this for a while. I can easily reproduce this situation by mounting a FreeBSD ZFS filesystem via NFS-UDP from an OpenBSD machine. Telling the OpenBSD machine to use TCP instead of UDP makes the problem go away. Other FreeBSD systems mounting the same share, using either UDP or TCP, do not cause the problem to show up.

A patch was suggested by Rick Macklem, but that did not solve the issue:
http://lists.freebsd.org/pipermail/freebsd-current/2009-December/014181.html

/Eirik

> --
> | Jeremy Chadwick                                   j...@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
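To follow the growth reported in this thread over time, the cluster line of `netstat -m` can be reduced to a single figure. This is a minimal sketch, assuming the line format shown in the quoted output (current/cache/total/max as the first field); the function name `cluster_usage` is made up for illustration.

```shell
#!/bin/sh
# Read `netstat -m` output on stdin and print current/max mbuf clusters
# together with the percentage of the configured limit that is in use.
cluster_usage() {
    awk '/mbuf clusters in use/ {
        # f[1]=current, f[2]=cache, f[3]=total, f[4]=max
        split($1, f, "/")
        printf "%d/%d (%.1f%%)\n", f[1], f[4], 100 * f[1] / f[4]
    }'
}

# Hypothetical usage, for example logged periodically from cron:
#   netstat -m | cluster_usage
```

Comparing a few samples taken under the UDP and the TCP mount should show whether the "current" figure keeps climbing or levels off.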
Re: mbuf leakage with nfs/zfs?
On 27. feb. 2010, at 22.38, Gerrit Kühn wrote:

> On Sat, 27 Feb 2010 21:32:39 +0100 Eirik Øverby wrote
> about Re: mbuf leakage with nfs/zfs?:
>
> E> I've had a discussion with some folks on this for a while. I can easily
> E> reproduce this situation by mounting a FreeBSD ZFS filesystem via
> E> NFS-UDP from an OpenBSD machine. Telling the OpenBSD machine to use TCP
> E> instead of UDP makes the problem go away.
>
> So we see this problem with udp clients from OpenBSD and Linux.

I have not had the opportunity to test with Linux or anything else. Could try from Windows, but not sure I want to get my hands THAT dirty.

> E> Other FreeBSD systems mounting the same share, either using UDP or TCP,
> E> does not cause the problem to show up.
>
> As Daniel reported he saw the problem with FBSD 8-stable: Which version
> was the FBSD-client that worked for you with udp?

7.1, 7.2, 8.0-RCsomething and 8.0-RELEASE - no problems with any of them.

> E> A patch was suggested by Rick Macklem, but that did not solve the issue:
> E>
> http://lists.freebsd.org/pipermail/freebsd-current/2009-December/014181.html
>
> Yeah, I also found and tried this on Friday - unfortunately without any
> success, the leakage is still there.
>
>
> cu
> Gerrit
Re: 7.0 RC1/SPARC64 panic in boot
Hi list,

by disabling the isp driver (set hint.isp.0.disabled=1), the system comes up. This of course denies us access to the external disk array hosted by the internal QLogic controller, but pinpoints the problem.

We tried setting hint.isp.0.prefer_iomap=1, which made no difference (though from reading the code, I don't see that it is ever used).

Can anyone help us out here?

Thanks,
/Eirik

On Jan 21, 2008, at 11:23 AM, Anders Gulden Olstad wrote:

> SUN Ultra 2 (2x400Mhz USII, 1500MB RAM)
>
> Got the following panic during boot
>
> panic: trap: fast data access mmu miss
> cpuid = 0
>
> This happened after upgrade from 6.2 -> 7.0 RC1.
> Tried to boot from the CDROM as well, with same result
>
> Console log:
>
> {0} ok boot cdrom
> Boot device: /sbus/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0:f File and args:
>
> FreeBSD/sparc64 boot block
> Boot path: /[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0:f
> Boot loader: /boot/loader
> Consoles: Open Firmware console
>
> Booting with sun4u support.
> Boot path set to /[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0:a
>
> FreeBSD/sparc64 bootstrap loader, Revision 1.0
> ([EMAIL PROTECTED], Mon Dec 24 10:09:43 UTC 2007)
> bootpath="/[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0:a"
> Loading /boot/defaults/loader.conf
> /boot/kernel/kernel data=0x6eee48+0x72c68 syms=[0x8+0x76878+0x8+0x6663e]
> \
> Hit [Enter] to boot immediately, or any other key for command prompt.
> Booting [/boot/kernel/kernel]...
> nothing to autoload yet.
> jumping to kernel entry at 0xc007.
> stray vector interrupt 2033
> Copyright (c) 1992-2007 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.0-RC1 #0: Tue Dec 25 02:17:08 UTC 2007
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
> real memory = 1610612736 (1536 MB)
> avail memory = 1550393344 (1478 MB)
> cpu0: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> cpu1: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> registered firmware set
> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, REGOPS_FUNC)
> nexus0:
> sbus0: mem 0x1fe-0x1fe7fff irq 2036,2037,2038,2021,2026,2039 on nexus0
> sbus0: clock 25.000 MHz
> sbus dvma: DVMA map: 0xfc00 to 0x
> sbus0: [GIANT-LOCKED]
> sbus0: [ITHREAD]
> sbus0: [GIANT-LOCKED]
> sbus0: [ITHREAD]
> initializing counter-timer
> Timecounter "counter-timer" frequency 100 Hz quality 100
> auxio0: mem 0x190 on sbus0
> sbus0: mem 0xc00-0xc0001ff irq 2020 type unknown (no driver attached)
> sbus0: mem 0-0x7,0x138-0x13f type unknown (no driver attached)
> sbus0: mem 0x140-0x147 irq 2025 type block (no driver attached)
> eeprom0: mem 0x120-0x1201fff on sbus0
> eeprom0: model mk48t59
> scc0: mem 0x110-0x113 irq 2024 on sbus0
> scc0: [FILTER]
> uart0: on scc0
> uart0: [FILTER]
> uart0: console (9600,n,8,1)
> uart1: on scc0
> uart1: [FILTER]
> scc1: mem 0x100-0x103 irq 2024 on sbus0
> scc1: [FILTER]
> uart2: on scc1
> uart2: [FILTER]
> uart2: keyboard (1200,n,8,1)
> uart2: keyboard not present
> uart3: on scc1
> uart3: [FILTER]
> sbus0: mem 0x130-0x137 type unknown (no driver attached)
> sbus0: mem 0x1304000-0x1304002 type unknown (no driver attached)
> esp0: mem 0x880-0x88f,0x881-0x881003f irq 2016 on sbus0
> esp0: [ITHREAD]
> esp0: FAS366/HME, 40MHz, SCSI ID 7
> hme0: mem 0x8c0-0x8c00107,0x8c02000-0x8c03fff,0x8c04000-0x8c05fff,
> 0x8c06000-0x8c07fff,0x8c07000-0x8c0701f irq 2017 on sbus0
> miibus0: on hme0
> nsphy0: PHY 1 on miibus0
> nsphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> hme0: Ethernet address: 08:00:20:91:d2:79
> hme0: [ITHREAD]
> sbus0: mem 0xc80-0xc80001b irq 2018 type unknown (no driver attached)
> isp0 mem 0x1-0x1044f irq 2003 on sbus0
> isp0: [ITHREAD]
> panic: trap: fast data access mmu miss
> cpuid = 0
> Uptime: 1s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> Resetting ...
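The workaround described at the top of this message can be applied either once from the loader prompt or persistently. A minimal sketch, assuming unit 0 is the controller that triggers the panic and that the standard /boot/device.hints file is in use (neither is confirmed beyond what the message states):

```
# One-off, typed at the loader prompt before booting:
set hint.isp.0.disabled=1
boot

# Persistent, as a line in /boot/device.hints:
hint.isp.0.disabled="1"
```

Either form only hides the controller from the kernel; the external array stays unreachable until the driver itself attaches cleanly.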
Re: 7.0 RC1/SPARC64 panic in boot
On Jan 22, 2008, at 7:23 PM, Marius Strobl wrote:

> On Tue, Jan 22, 2008 at 07:16:16AM +0100, Eirik Øverby wrote:
>> Hi list,
>>
>> by disabling the isp driver (set hint.isp.0.disabled=1), the system
>> comes up. This of course denies us access to the external disk array
>> hosted by the internal QLogic controller, but pinpoints the problem.
>>
>> We tried setting hint.isp.0.prefer_iomap=1, which made no difference
>> (though by reading the code, I don't see that it ever used this).
>>
>> Can anyone help us out here?
>
> Scott, could this be due to a missing MFC of isp_sbus.c rev. 1.36?

If that would be the case I'd be most happy to hear that. I'll also be more than happy to test, and can do so on relatively short notice (at least for another few hours).

We have, for the record, gone through some basic troubleshooting: replaced memory (as this error can also show up under Solaris and is usually an indicator of bad memory), replaced the SCSI controller with another one (still isp driven), and tested various device hints - suffice to say we have wasted our time so far ;)

Holding breath...
/Eirik

> Marius
>
>> On Jan 21, 2008, at 11:23 AM, Anders Gulden Olstad wrote:
>>
>>> SUN Ultra 2 (2x400Mhz USII, 1500MB RAM)
>>>
>>> Got the following panic during boot
>>>
>>> panic: trap: fast data access mmu miss
>>> cpuid = 0
>>>
>>> This happened after upgrade from 6.2 -> 7.0 RC1.
>>> Tried to boot from the CDROM as well, with same result
>>>
>>> [console log snipped]
Re: 7.0 RC1/SPARC64 panic in boot
Will apply the patch and reboot in an hour or two. The isp interface is only used for an external array, so we disable it and boot from internal drives on esp.

Thanks!
/Eirik

On Jan 23, 2008, at 7:32 AM, Scott Long wrote:

> Eirik Øverby wrote:
>> On Jan 22, 2008, at 7:23 PM, Marius Strobl wrote:
>>> On Tue, Jan 22, 2008 at 07:16:16AM +0100, Eirik Øverby wrote:
>>>> Hi list,
>>>>
>>>> by disabling the isp driver (set hint.isp.0.disabled=1), the system
>>>> comes up. This of course denies us access to the external disk array
>>>> hosted by the internal QLogic controller, but pinpoints the problem.
>>>>
>>>> We tried setting hint.isp.0.prefer_iomap=1, which made no difference
>>>> (though by reading the code, I don't see that it ever used this).
>>>>
>>>> Can anyone help us out here?
>>>
>>> Scott, could this be due to a missing MFC of isp_sbus.c rev. 1.36?
>>
>> If that would be the case I'd be most happy to hear that. I'll also be
>> more than happy to test, and can do so on relatively short notice (at
>> least for another few hours).
>>
>> We have, for the record, gone through some basic troubleshooting:
>> Replaced memory (as this error also can show up under Solaris and is
>> usually an indicator of bad memory), replaced SCSI controller with
>> another one (still isp driven), and testing various device hints -
>> suffice to say we have wasted our time so far ;)
>
> Are you able to compile a new kernel without having to install first?
> If so, apply the attached patch and let me know if it works.
>
> Scott
>
> Index: isp_sbus.c
> ===================================================================
> RCS file: /usr1/ncvs/src/sys/dev/isp/isp_sbus.c,v
> retrieving revision 1.35
> retrieving revision 1.36
> diff -u -r1.35 -r1.36
> --- isp_sbus.c 11 May 2007 13:47:28 - 1.35
> +++ isp_sbus.c 5 Nov 2007 11:22:18 - 1.36
> @@ -29,7 +29,7 @@
>   */
>
>  #include
> -__FBSDID("$FreeBSD: src/sys/dev/isp/isp_sbus.c,v 1.35 2007/05/11 13:47:28 mjacob Exp $");
> +__FBSDID("$FreeBSD: src/sys/dev/isp/isp_sbus.c,v 1.36 2007/11/05 11:22:18 scottl Exp $");
>
>  #include
>  #include
> @@ -327,21 +327,26 @@
>  	/*
>  	 * Make sure we're in reset state.
>  	 */
> +	ISP_LOCK(isp);
>  	isp_reset(isp);
>  	if (isp->isp_state != ISP_RESETSTATE) {
>  		isp_uninit(isp);
> +		ISP_UNLOCK(isp);
>  		goto bad;
>  	}
>  	isp_init(isp);
>  	if (isp->isp_role != ISP_ROLE_NONE && isp->isp_state != ISP_INITSTATE) {
>  		isp_uninit(isp);
> +		ISP_UNLOCK(isp);
>  		goto bad;
>  	}
>  	isp_attach(isp);
>  	if (isp->isp_role != ISP_ROLE_NONE && isp->isp_state != ISP_RUNSTATE) {
>  		isp_uninit(isp);
> +		ISP_UNLOCK(isp);
>  		goto bad;
>  	}
> +	ISP_UNLOCK(isp);
>  	return (0);
>
>  bad:
Re: 7.0 RC1/SPARC64 panic in boot [SOLVED]
On Jan 23, 2008, at 7:32 AM, Scott Long wrote:

> Eirik Øverby wrote:
>> On Jan 22, 2008, at 7:23 PM, Marius Strobl wrote:
>>> On Tue, Jan 22, 2008 at 07:16:16AM +0100, Eirik Øverby wrote:
>>>> Hi list,
>>>>
>>>> by disabling the isp driver (set hint.isp.0.disabled=1), the system
>>>> comes up. This of course denies us access to the external disk array
>>>> hosted by the internal QLogic controller, but pinpoints the problem.
>>>>
>>>> We tried setting hint.isp.0.prefer_iomap=1, which made no difference
>>>> (though by reading the code, I don't see that it ever used this).
>>>>
>>>> Can anyone help us out here?
>>>
>>> Scott, could this be due to a missing MFC of isp_sbus.c rev. 1.36?
>>
>> If that would be the case I'd be most happy to hear that. I'll also be
>> more than happy to test, and can do so on relatively short notice (at
>> least for another few hours).
>>
>> We have, for the record, gone through some basic troubleshooting:
>> Replaced memory (as this error also can show up under Solaris and is
>> usually an indicator of bad memory), replaced SCSI controller with
>> another one (still isp driven), and testing various device hints -
>> suffice to say we have wasted our time so far ;)
>
> Are you able to compile a new kernel without having to install first?
> If so, apply the attached patch and let me know if it works.

Works very well, thanks a bunch!
Will this make it into 7-RELEASE?

/Eirik

> Scott
>
> [patch to isp_sbus.c, rev. 1.35 -> 1.36, snipped]
Highpoint drivers on 7.0
Hi all,

did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or later? I'm considering upgrading one of my servers here, but I need to know if my RAID controller will work after reinstall.. A shame HPT doesn't release the driver to the community...

Thanks,
/Eirik
Re: Highpoint drivers on 7.0
On Jan 25, 2008, at 11:32 PM, Steven Hartland wrote:

> I would advise contacting them. Their support was helpful when I last
> contacted them, and for the card that was involved they did release the
> code for the driver, which enabled us to fix the issues.

Actually, the new(?) hptrr driver seems to handle my 2220 just fine! Too bad it's still giant-locked..

/Eirik

> Regards
> Steve
>
> - Original Message -
> From: "Alfred Perlstein" <[EMAIL PROTECTED]>
>
>> * Eirik Øverby <[EMAIL PROTECTED]> [080125 12:53] wrote:
>>> Hi all,
>>>
>>> did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1
>>> or later? I'm considering upgrading one of my servers here, but I
>>> need to know if my RAID controller will work after reinstall..
>>> A shame HPT doesn't release the driver to the community...
Re: /usr/bin/objformat is missing
On Jan 29, 2008, at 4:49 PM, Chris H. wrote:

> Quoting pluknet <[EMAIL PROTECTED]>:
>
>> On 29/01/2008, Chris H. <[EMAIL PROTECTED]> wrote:
>>> Quoting Peter Jeremy <[EMAIL PROTECTED]>:
>>>
>>>> On Mon, Jan 28, 2008 at 02:41:56PM -0800, Chris H. wrote:
>>>>> In case you're wondering, objformat /is/ required - at least for
>>>>> www/apache13-ssl.
>>
>> touching objformat is not a good way.
>> Try this instead, last time it helped me (taken from memory):
>>
>> --- Makefile.orig 2008-01-29 13:38:43.0 +0300
>> +++ Makefile 2008-01-29 13:41:19.0 +0300
>> @@ -5,7 +5,7 @@
>>  # and apache-ssl port by Mark Murray <[EMAIL PROTECTED]>.
>>  # Oh, and with a little bit of help from Ben :)
>>  #
>> -# $FreeBSD: ports/www/apache13-ssl/Makefile,v 1.121 2007/06/17 16:59:26 anders Exp $
>> +# $FreeBSD$
>>
>>  PORTNAME= apache+ssl
>>  PORTVERSION= ${APACHE_VERSION}.${APACHE_SSL_VERSION}
>> @@ -48,7 +48,7 @@
>>
>>  APACHE_HARD_SERVER_LIMIT?= 512
>>
>> -CFLAGS+= -I${OPENSSLINC}/openssl
>> +CFLAGS+= -I${OPENSSLINC}/openssl -Wl,
>
> I noticed this arg in another thread regarding this issue:
> --export-dynamic
>
> Thank you for posting this. Although I had success building and running
> the apache13-ssl port after applying my objformat /hackery/. I'm now
> running into troubles adding all of the php5 extensions I need to use.
> I had no difficulties with php5 itself. But after a certain point in
> the list, apache exits on signal 11 (core dumped). Ermm... this was
> exactly the same trouble I started with, with the exception that it
> was on signal 10.

I have had problems with PHP modules in the past; often they can end up crashing when loaded in the wrong order, for instance. I also had major trouble getting the imagick module to work at all lately. Try re-ordering things in your extensions.ini, maybe commenting out all modules and re-enabling one at a time.

/Eirik

> So, with any luck (fingers crossed), I'll get past this limitation with
> your patch and /yet/ another make deinstall apache13-ssl &&
> all-added-mod_whatevers && all-php5-extensions && php5. make install
> everything-all-over-again. :/
>
> Looks like the bugfest mark announced earlier isn't over just yet. :)
>
> Thanks again for taking the time to respond and share your patch.
>
> --Chris H
>
>>  CONFIGURE_ARGS+= \
>>  --prefix=${PREFIX} \
>>  --server-uid=www \
>>
>> --
>> panic: kernel trap (ignored)
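The one-at-a-time approach suggested above can be scripted. This is a minimal sketch under stated assumptions: the extensions are listed as `extension=...` lines in an ini file, `;` starts a comment there, and the function name `enable_first_n` is made up for illustration. It prints the file with only the first N extension lines left active.

```shell
#!/bin/sh
# enable_first_n N
# Read an extensions.ini on stdin and print it with only the first N
# "extension=" lines active; later ones are commented out with ";".
enable_first_n() {
    awk -v n="$1" '
        /^[ \t]*extension[ \t]*=/ {
            seen++
            if (seen > n) {
                print ";" $0
                next
            }
        }
        { print }
    '
}

# Hypothetical usage (file names are placeholders):
#   enable_first_n 3 < extensions.ini.orig > extensions.ini
# then restart apache and raise N until the crash reappears.
```

The first N at which apache dumps core again points at the module (or module ordering) worth a closer look.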
Kernel panic on 7-PRERELEASE
Hi,

Like on 6.x, I'm seeing frequent kernel panics when using my bge NICs. If I plug the cable into the fxp NIC all is fine. Dual Opteron, Tyan K8S Pro (2882) board.

I cannot see any pattern as to what is causing the panics. However, I have obtained kernel dumps on a freshly built kernel (with -g), unfortunately without WITNESS or INVARIANTS. I've attached a screenshot of the KVM console after the crash.

Given that I have a kernel dump, what do I do to extract useful information, if at all possible?

Thanks!
/Eirik
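As a starting point for the question above: crash dumps saved by savecore(8) are normally examined with kgdb(1), using a kernel image that carries debug symbols. This is a minimal sketch, assuming the dumps land in /var/crash as vmcore.N; the kernel path in the comment is a placeholder for wherever the -g kernel was built, and `latest_vmcore` is a made-up helper.

```shell
#!/bin/sh
# latest_vmcore [DIR]
# Print the path of the highest-numbered vmcore.N file in DIR
# (default /var/crash); print nothing if there is none.
latest_vmcore() {
    dir=${1:-/var/crash}
    best=-1
    for f in "$dir"/vmcore.*; do
        [ -e "$f" ] || continue
        n=${f##*.}
        case $n in
            ''|*[!0-9]*) continue ;;   # skip non-numeric suffixes
        esac
        if [ "$n" -gt "$best" ]; then
            best=$n
        fi
    done
    if [ "$best" -ge 0 ]; then
        echo "$dir/vmcore.$best"
    fi
}

# Hypothetical usage (KERNCONF stands for the kernel config name):
#   kgdb /usr/obj/usr/src/sys/KERNCONF/kernel.debug "$(latest_vmcore)"
#   (kgdb) bt
```

The backtrace printed by `bt` is usually the first thing worth posting to the list along with the panic message.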
Re: 4.8 -> 4.11 in-place upgrade ?
On Jan 30, 2008, at 3:36 PM, Jeremy Chadwick wrote:

> On Wed, Jan 30, 2008 at 09:01:40AM -0500, Robin Blanchard wrote:
>> I just inherited a remote 4.8 box... Having not used RELENG_4 in eons,
>> just wanted to check if it's safe to "live upgrade" (make
>> installworld/kernel ; mergemaster) directly to 4.11 (world/kernel
>> already built; waiting to install). /usr/src/UPDATING doesn't seem to
>> indicate this is out of the question.
>
> It would be best for you to just schedule a time to update the box
> entirely to RELENG_7, or at least RELENG_6. This may take less time and
> induce less pain than any oddities which might appear from a 4.8->4.11
> upgrade.

FWIW; I did a 4.7->4.11 upgrade not too long ago, and didn't bump into any issues. YMMV of course. Whether or not it's worth it is another question entirely, and one you'll have to figure out for yourself :)

I moved everything I could to 6-RCsomething when that was teh h0tness, and haven't looked back since (did some poking into 5-land as well, went back to 4.x badly burned).

/Eirik

> --
> | Jeremy Chadwick                                jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
WITNESS weirdness
Hi, not sure if this is a problem, but:

# sysctl -a | grep witness
debug.witness.child_cnt: 161
debug.witness.child_free_cnt: 3935
debug.witness.sleep_cnt: 235
debug.witness.spin_cnt: 0
debug.witness.free_cnt: 789
debug.witness.skipspin: 1
debug.witness.trace: 1
debug.witness.kdb: 1
debug.witness.watch: 1
# sysctl debug.witness.watch=0
debug.witness.watch: 1 -> 0
# sysctl debug.witness.watch=1
debug.witness.watch: 0
sysctl: debug.witness.watch: Invalid argument

Am I supposed to be able to turn WITNESS off at runtime, but not back on again?

/Eirik
Re: WITNESS weirdness
So I need to reboot. Brilliant :) And I thought I was being clever... I'm using WITNESS to try to figure out why bge is crapping out on me all the time, but with WITNESS enabled it's been stable, just oh-so-slow :P

/Eirik

On Jan 30, 2008, at 9:10 PM, Kris Kennaway wrote:
> Eirik Øverby wrote:
> > # sysctl debug.witness.watch=1
> > debug.witness.watch: 0
> > sysctl: debug.witness.watch: Invalid argument
> >
> > Am I supposed to be able to turn witness off runtime, but not back on again?
>
> Yes, that is working as designed. Witness needs to run continuously to track state.
>
> Kris
7.0, amd64: Wrong files installed into jails?
Hi, I've created some jails on FreeBSD 7-RC* now, and I realized there must be some kind of problem when I tried to install and run diablo-jdk 1.5 from the freebsdfoundation packages. It complains about

/libexec/ld-elf.so.1: /usr/local/lib/compat/pkg/libz.so.3: unsupported file layout

and file(1) returns

/usr/local/lib/compat/pkg/libz.so.3: ELF 32-bit LSB shared object, Intel 80386, version 1 (FreeBSD), stripped

On the host, which has been upgraded from 6.2 to 7.0-RC1 using cvsup+++, Java runs just fine, and finds its libraries in /lib (for instance), presumably because they are still left there:

ls -la /lib/libz.*
-r--r--r--  1 root  wheel  79824 Jun 16  2005 /lib/libz.so.2
-r--r--r--  1 root  wheel  81448 Apr 28  2007 /lib/libz.so.3
-r--r--r--  1 root  wheel  83648 Jan 28 09:02 /lib/libz.so.4

There are no compat6x packages installed anywhere, and even installing the compat6x-amd64 package in the jail does not change anything. Does installworld to a "clean" target install the i386 binaries instead of the amd64 binaries to the /usr/local/lib/compat/ tree?

With best regards,
/Eirik
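The "unsupported file layout" error and the file(1) output both come down to two fields in the ELF header. As a rough illustration (not a replacement for file(1)), the sketch below reads them directly. It assumes the standard ELF layout: byte 4 of the header is EI_CLASS (1 = 32-bit, 2 = 64-bit), byte 5 is EI_DATA (1 = little-endian), and e_machine is a 16-bit field at offset 18 (3 = i386, 62 = x86-64). The function names are hypothetical.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}   # EI_CLASS values
ELF_MACHINES = {3: "i386", 62: "x86-64"}   # e_machine values (small subset)


def describe_elf_header(header: bytes) -> str:
    """Summarise the first 20 bytes of an ELF file (hypothetical helper)."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return "not ELF"
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    # EI_DATA selects the byte order used for the multi-byte header fields.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    machine_name = ELF_MACHINES.get(machine, "machine %d" % machine)
    return "%s %s" % (elf_class, machine_name)


def describe_elf_file(path: str) -> str:
    """Read just enough of a file to classify it."""
    with open(path, "rb") as fh:
        return describe_elf_header(fh.read(20))
```

Running describe_elf_file() over every library in the jail's compat directory would show quickly whether the whole tree is i386 or only a few stray files are.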
Re: UFS snapshot weirdness
On Feb 12, 2008, at 1:41 PM, Daniel O'Connor wrote: On Tue, 12 Feb 2008, Eirik Øverby wrote: I am at a total loss here. Is it re-using the first snapshot I ever made of this filesystem, even though I've removed it? Didn't I understand how to create/remove snapshots? Is this a bug? Sure the old md isn't hanging around by mistake or some such? Yes, I am absolutely sure of this. I considered using the snapshot tool, however I need to reduce dependencies to an absolute minimum (as one target environment is very strict on allowing additional software installs).. I use the snapshots to get a consistent file-backup with history. This one puzzles me to no end. /Eirik I have had people recover many files using the snapshot tool in ports (plus a small symlink maker for samba access) and haven't noticed issues like this. On the otherhand I find it can take a long time to make a snapshot (during which time no FS access is allowed). -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
UFS snapshot weirdness
Hi all, I've been making a wrapper script for the backup tool 'duplicity', allowing me to create config files for each resource, wherein I define whether a snapshot should be made prior to backing up the resource or not. Now I find that my snapshots never change.

The script creates a snapshot, creates an md device, mounts it, runs the backup against the mounted snapshot, dismounts, removes the md device, and rm -f's the snapshot file.

The problem is: whenever I look into the mounted snapshot, a given directory looks like so:

drwxr-xr-x   3 root  wheel   512 Jan 29 15:25 .
drwxr-xr-x  18 root  wheel   512 Jan 29 13:49 ..
-rw-------   1 root  wheel  1281 Jan 31 17:12 .bash_history
-rw-r--r--   2 root  wheel   786 Jan 29 13:00 .cshrc
-rw-r--r--   1 root  wheel   143 Jan 29 13:00 .k5login
-rw-r--r--   1 root  wheel   293 Jan 29 13:00 .login
-rw-r--r--   2 root  wheel   253 Jan 29 13:00 .profile
drwxr-xr-x   2 root  wheel   512 Jan 29 13:00 .ssh

However, when looking into the same directory outside the snapshot, it looks like so:

-rw-------  1 root  wheel       2961 Feb 12 00:39 .bash_history
-rw-r--r--  2 root  wheel        786 Jan 29 13:00 .cshrc
-rw-r--r--  1 root  wheel        143 Jan 29 13:00 .k5login
drwx------  2 root  wheel        512 Feb 11 16:23 .links
-rw-r--r--  1 root  wheel        293 Jan 29 13:00 .login
-rw-r--r--  2 root  wheel        253 Jan 29 13:00 .profile
drwxr-xr-x  2 root  wheel        512 Jan 29 13:00 .ssh
-rw-r--r--  1 root  wheel     948424 Feb 11 13:14 bsd-jdk16-patches-3.tar.bz2
-rw-r--r--  1 root  wheel   46938731 Feb 11 16:23 diablo-jdk-freebsd6.amd64.1.5.0.07.01.tbz
-rw-r--r--  1 root  wheel    2116124 Feb 11 13:11 jdk-6u3-fcs-bin-b05-jrl-24_sep_2007.jar
-rw-r--r--  1 root  wheel    8608204 Feb 11 13:11 jdk-6u3-fcs-mozilla_headers-b05-unix-24_sep_2007.jar
-rw-r--r--  1 root  wheel  116791442 Feb 11 13:15 jdk-6u3-fcs-src-b05-jrl-24_sep_2007.jar

The snapshot was made just now, long after those additional files were placed in the snapshot. I am at a total loss here. Is it re-using the first snapshot I ever made of this filesystem, even though I've removed it? Didn't I understand how to create/remove snapshots? Is this a bug?

Any input is appreciated.

Thanks,
/Eirik
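For readers trying to reproduce this, the sequence of steps described above can be written down as an explicit command plan. This is only a sketch: the original script is not shown in the thread, so the exact commands are assumptions. In particular, the two-argument mksnap_ffs form (mountpoint, snapshot file) matches older releases and should be checked against mksnap_ffs(8) on the system in question; the md unit number, paths, and the duplicity invocation are placeholders, and snapshot_backup_plan is a hypothetical name. Nothing is executed here; the function only returns the argument lists.

```python
from typing import List


def snapshot_backup_plan(fs: str, snapshot: str, unit: int, mountpoint: str,
                         backup_cmd: List[str]) -> List[List[str]]:
    """Return the commands for one snapshot-based backup run, in order."""
    md_dev = "/dev/md%d" % unit
    return [
        # 1. Create the snapshot file (argument form is an assumption).
        ["mksnap_ffs", fs, snapshot],
        # 2. Attach the snapshot file to a read-only memory disk.
        ["mdconfig", "-a", "-t", "vnode", "-o", "readonly",
         "-f", snapshot, "-u", str(unit)],
        # 3. Mount the memory disk read-only.
        ["mount", "-o", "ro", md_dev, mountpoint],
        # 4. Run the backup against the mounted snapshot.
        backup_cmd,
        # 5. Tear everything down in reverse order.
        ["umount", mountpoint],
        ["mdconfig", "-d", "-u", str(unit)],
        ["rm", "-f", snapshot],
    ]


if __name__ == "__main__":
    plan = snapshot_backup_plan(
        "/usr", "/usr/.snap/backup", 9, "/mnt/snap",
        ["duplicity", "/mnt/snap", "file:///backup/usr"],
    )
    for argv in plan:
        print(" ".join(argv))
```

Printing the plan before running it makes it easy to confirm that the snapshot file name and md unit are the ones actually being mounted, which is the first thing to rule out when a mounted snapshot shows stale contents.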
Re: UFS snapshot weirdness
On Feb 13, 2008, at 9:21 AM, Daniel O'Connor wrote: On Wed, 13 Feb 2008, Eirik Øverby wrote: Yes, I am absolutely sure of this. I considered using the snapshot tool, however I need to reduce dependencies to an absolute minimum (as one target environment is very strict on allowing additional software installs).. I use the snapshots to get a consistent file-backup with history. This one puzzles me to no end. Hmm, that is very odd.. Maybe the FS is stuffed somehow :( I read somewhere else about NFS issues on 7-RC* where snapshots have been used. In particular - and this is something I'm seeing too - changing the exports file or reloading mountd gives the following in messages log: Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /tmp: Invalid argument Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /usr: Cross-device link Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /var: Cross-device link Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /export/ home: Cross-device link Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /opt: Cross-device link Can this be related? I'm starting to worry here - what will be the long-term consequences if snapshots are stuck around in this "invisible" state? /Eirik -- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au "The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: UFS snapshot weirdness
On Feb 23, 2008, at 4:46 PM, Guido Falsi wrote: Eirik Øverby wrote: I read somewhere else about NFS issues on 7-RC* where snapshots have been used. In particular - and this is something I'm seeing too - changing the exports file or reloading mountd gives the following in messages log: Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for / tmp: Invalid argument Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for / usr: Cross-device link Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for / var: Cross-device link Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for / export/home: Cross-device link Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for / opt: Cross-device link Can this be related? I'm starting to worry here - what will be the long-term consequences if snapshots are stuck around in this "invisible" state? Ok there is definitely something VERY fishy going on here. I have just removed a lot of data from one of the partitions where I HAD snapshots (they are all gone now, since days). So freespace initially goes up a lot, as expected, then drops to around what it was before the deletion took place. There IS a snapshot being maintained somewhere, even though I have deleted it (using rm -f). What can I do, short of rebooting or remounting the filesystem?? This behavior is also seen on 6.2-RELEASE by the way; entirely different hardware (32bit vs 64bit, scsi vs ide, etc.) /Eirik I have been experiencing these too. But it looks more like a bug in mountd, since it shows up only is snapshots are created with mount. If snapshots are created with mksnap_ffs this does not seem to show up. I still have to make more in depth experiments, but before experimenting by myself I'd like to have some more informed directions on what to experiment. 
-- Guido Falsi <[EMAIL PROTECTED]>
Re: Upgrading to 7.0 - stupid requirements
On Mar 23, 2008, at 08:28, Matthew Seaman wrote: Freddie Cash wrote: All that's really needed is a more formalised process for handling upgrading config files, with as much as possible managed via the ports framework itself. Something that dictates the name of the config file, and that compares the config file from the port against the installed config file (or against an md5 of the port config file) and only replaces it if it is unchanged. Something that is part of the make system. Most ports that install configuration files actually do this already. It's generally why you'll find that a sample configuration file is considered part of the port, but the actuall live configuration file is not. The port will only feel free to meddle with the config file if it is still identical to the sample file. There are a few exceptions to this rule: The courier authdaemon ports, for instance, are notorious for overwriting my carefully-crafted configuration files when upgrading. I loathe those ports (or apps - not sure who's to blame) for that reason alone. In fact, it not only installs a config.dist file (which is fine), but it ALSO overwrites the current config. A cardinal sin, if there ever were any.. Now I must say I'm with the people who think that one should follow the one-port-one-configfile approach; however for a somewhat different reason: The closer a port sticks with the "default" configuration files, or samples if you will, of the software in question, the less FreeBSD-specific knowledge needs to be built to manage the port. If debian splits up the config into a forest of includefiles and symlinks, that might be good for a particular purpose, but it's something I'd prefer to do myself if the need is there. I've done similiar things on some occations, but that is, and IMO should be, "homebrew". Also, making ports adhere to a much stricter configuration regime would make the uptake of new ports slow down considerably. 
I believe (though I have no numbers to back this up, so it is of course pure speculation) that the large number of ports available is at least partly due to the fact that making an initial port is relatively easy and straightforward. Just my 2 cents. /Eirik
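The rule discussed in this thread (only touch the live config file if it is still identical to the sample the port installed earlier) is small enough to sketch. The following is an illustration of that decision, not code from the ports framework; config_action and md5_hex are hypothetical names, and the inputs are the file contents as bytes.

```python
import hashlib
from typing import Optional


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def config_action(live: Optional[bytes], previous_sample: Optional[bytes]) -> str:
    """Decide what an upgrade may do with the live configuration file.

    live            -- contents of the installed config, or None if absent
    previous_sample -- contents of the sample shipped by the old version,
                       or None if no sample was recorded
    """
    if live is None:
        return "install"   # nothing there yet, safe to install the new sample
    if previous_sample is not None and md5_hex(live) == md5_hex(previous_sample):
        return "replace"   # untouched by the admin, safe to upgrade
    return "keep"          # locally modified (or unknown), leave it alone
```

Under this rule the behaviour complained about above (overwriting a carefully crafted config) can only happen if the port skips the comparison, since any local edit changes the digest and yields "keep".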
Re: Hardware - Sun workstation Ultra 20 and others
On Apr 11, 2008, at 23:07, Peter Jeremy wrote: On Fri, Apr 11, 2008 at 12:57:53PM +0200, Ivan Voras wrote: Does anyone have experience with running Sun's Opteron-based workstation, Ultra 20, 25, 40? Both with FreeBSD and other systems (Linux)? Are they stable, all the drivers are present, etc? I've not used any of these but: 1) The Ultra 25 is UltraSPARC IIIi based. This CPU is not supported by FreeBSD as Sun will not release necessary documentation. I thought this was resolved a while back? In any case, OpenBSD has had USiii support for some time now. I could get my hands on some USiii (and possibly IV) hardware to make available if.. ;) /Eirik 2) Sun states they support both RHEL and SuSE ES on both the U20 and U40 so I would expect they are stable and all hardware supported, at least on those Linuxes. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
carpX: incorrect hash with IP aliases
Hi, whenever I configure an extra IP on one of my CARP interfaces, traffic on that particular subnet slows to a crawl (the primary IP of the interface is the gateway IP), and I get lots of carp4: incorrect hash in dmesg. I see this issue referenced also in http://lists.freebsd.org/pipermail/freebsd-net/2008-March/017160.html and there are suggestions this is a known issue - however I still see it in FreeBSD 7.1 (pfSense 1.2.3-prerelease). I cannot find a PR on this, but my searching skills may be inadequate.. Am I doing something wrong? I tried assigning the alias with both /32 and /24 netmasks. /Eirik ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: carpX: incorrect hash with IP aliases
On Mar 3, 2009, at 19:23, Scott Ullrich wrote:
> On Tue, Mar 3, 2009 at 12:52 PM, Max Laier wrote:
> [snip]
> > Make sure that you are configuring the same aliases with the same netmasks on all members of the carp group - preferably before bringing the interface up for the first time (though it should properly recalculate the hashes as you add aliases). As you seem to be using pfsense you might want to check with them to make sure they have the fix in their build - though I recall it was a joint effort back then.
>
> 1.2.3 is based on 7.1 so this patch should be in the base system now.

Excellent. And I just found that my second cluster member was not on 1.2.3 ... I'm updating both to the latest snapshot and will be trying again. Thanks.

/Eirik

> Scott
Re: carpX: incorrect hash with IP aliases
On Mar 3, 2009, at 19:23, Scott Ullrich wrote:
> On Tue, Mar 3, 2009 at 12:52 PM, Max Laier wrote:
> [snip]
> > Make sure that you are configuring the same aliases with the same netmasks on all members of the carp group - preferably before bringing the interface up for the first time (though it should properly recalculate the hashes as you add aliases). As you seem to be using pfsense you might want to check with them to make sure they have the fix in their build - though I recall it was a joint effort back then.
>
> 1.2.3 is based on 7.1 so this patch should be in the base system now.

Just tested, and this seems to work. Now I just need to figure out how to make sure both carp nodes have the IPs added/removed at ~exactly the same time..

/Eirik

> Scott
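To see why both nodes need the same alias list, it may help to model the check. The sketch below is a simplified illustration and is not byte-compatible with the kernel's CARP code: it assumes only that each node computes a keyed HMAC-SHA1 over the vhid and the sorted list of IPv4 addresses configured on the carp interface, and that a peer whose digest differs logs "incorrect hash". Netmasks are not modelled, and carp_like_digest is a hypothetical name.

```python
import hashlib
import hmac
import ipaddress
from typing import Iterable


def carp_like_digest(password: bytes, vhid: int, addresses: Iterable[str]) -> str:
    """Keyed digest over the vhid and the sorted IPv4 address list."""
    mac = hmac.new(password, digestmod=hashlib.sha1)
    mac.update(bytes([vhid]))
    # Sorting makes the result independent of configuration order.
    for addr in sorted(ipaddress.IPv4Address(a) for a in addresses):
        mac.update(addr.packed)
    return mac.hexdigest()


if __name__ == "__main__":
    master = carp_like_digest(b"secret", 4, ["192.0.2.1", "192.0.2.10"])
    backup = carp_like_digest(b"secret", 4, ["192.0.2.1"])  # alias not added yet
    print("hashes match" if master == backup else "incorrect hash")
```

In this model the window between adding an alias on one node and on the other is exactly the period during which the digests disagree, which is why the additions need to happen at about the same time on both members.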
ugen and Gemplus SC reader
Hi, whenever I try to use openct/opensc to use my gemplus USB smartcard readers, I get the following in dmesg: ugenioctl: USB_SET_SHORT_XFER, no pipe The readers work fine on MacOS X and (reportedly) Linux, and the driver included in openct should support it. I can't find any PC/SC driver bundles for it though, only the serial readers. Is this a problem in ugen? With best regards, /Eirik ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Performance issues in 5.3-RELEASE.
On Wed, 2004-11-17 at 10:40 +0100, Krzysztof Kowalik wrote: > Hello, > >Recently I took some time to upgrade my home 4.9 system to > 5.3-RELEASE (fortunately, taking full system dump before, so I can > easily get back). In fact just after upgrading I ran into the weird > issue during installation for firefox port. > > When firefox-1.0-source.tar.bz2 is getting untared, the system starts to > be *slow*: any music starts to be jittered and the cursor in X stalls > from time to time for ~1 second. I have reported this on *several* occations over the last year or so, as have others. However, apart from some time early this year/late last year, it doesn't seem to have received due attention. This is a serious problem for anyone using 5.3 on a desktop - which I have very much legitimate reasons for doing - though I haven't really noticed problems on servers due to this. It seems to be linked to the system CPU load (not user CPU load), which would be logical. Everyone seemed to think it was ULE related (and this showed up long before PREEMPTION became an issue), but it is not... Here's to hoping someone looks into it - I certainly ain't capable of doing so ;) /Eirik PS: Apart from that I have to agree with what others have stated: 5.3 is a truly wonderful release... > And I never had this issue before with 4.x serie. > > I tried to boot with an without ACPI, with GENERIC kernel, with my "own" > kernel configuration (GENERIC with removed unused SCSI/RAID/NIC drivers) > both with and without PREEMPTION[1]. Without any visible change in system's > behaviour. 
> > %uname -a > FreeBSD bzzzt.borys.lan 5.3-RELEASE FreeBSD 5.3-RELEASE #2: Wed Nov 17 > 00:19:56 CET 2004 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/BZZZT i386 > > # atacontrol list > ATA channel 0: > Master: ad0 ATA/ATAPI revision 7 > Slave: ad1 ATA/ATAPI revision 6 > ATA channel 1: > Master: acd0 ATA/ATAPI revision 5 > Slave: acd1 ATA/ATAPI revision 0 > ATA channel 2: > Master: ad4 ATA/ATAPI revision 5 > Slave: ad5 ATA/ATAPI revision 5 > ATA channel 3: > Master: no device present > Slave: no device present > > # atacontrol mode 0 > Master = UDMA100 > Slave = UDMA100 > # atacontrol mode 1 > Master = UDMA33 > Slave = UDMA33 > # atacontrol mode 2 > Master = UDMA100 > Slave = UDMA100 > > dmesgs from ACPI boot on "custom" kernel attached. > > Is there anything I missed and therefore I should try/tune or any > other informations that are needed and I missed them? > > [1] yes, SCHED_4BSD > > Regards, > Krzysztof Kowalik > ___ > [EMAIL PROTECTED] mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "[EMAIL PROTECTED]" ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
graid3 - requirements or manpage wrong?
Hi, to the best of my ability I have been investigating the 'real' requirements of a raid-3 array, and I cannot see how the following text from graid3(8) can possibly be correct - and if it is, then the implementation must be wrong or incomplete (emphasis added):

label   Create a RAID3 device. The last given component will contain parity data, all the rest - regular data. ***Number of components has to be equal to 3, 5, 9, 17, etc. (2^n + 1).***

I might be wrong, but I cannot see how a raid-3 array should require (2^n + 1) drives - I am fairly certain I have seen raid-3 arrays consisting of four drives, for example. This is also what I had hoped to accomplish. Anyone care to shed some light on this? I'd prefer to use graid3 (or 5, if there was one) instead of gvinum..

Thanks,
/Eirik
Re: graid3 - requirements or manpage wrong?
On 24. Nov 2004, at 18:11, Pawel Jakub Dawidek wrote:
> On Wed, Nov 24, 2004 at 10:54:07AM +0100, Eirik Øverby wrote:
> +> to the best of my ability I have been investigating the 'real'
> +> requirements of a raid-3 array, and cannot see how the following text
> +> from graid3(8) can possibly be correct - and if it is, then the
> +> implementation must be wrong or incomplete (emphasis added):
> +>
> +> label Create a RAID3 device. The last given component will contain
> +> parity data, all the rest - regular data. ***Number of components
> +> has to be equal to 3, 5, 9, 17, etc. (2^n + 1).***
> +>
> +> I might be wrong, but I cannot see how a raid-3 array should require
> +> (2^n + 1) drives - I am fairly certain I have seen raid-3 arrays
> +> consisting of four drives, for example. This is also what I had hoped to
> +> accomplish.
>
> This requirement is because we want sectorsize to be power of 2 (UFS needs it). In RAID3 we want to send every I/O request to all components at once, that's why we need sector size to be N*512, where N is a power of 2 value AND because graid3 uses one parity component we need N+1 providers.

OK I see, makes sense. So it's not really a raid3 issue, but an implementation issue. The only problem then is - gvinum being in a completely unusable state (for raid5 anyway), what are my alternatives? I have four 160GB IDE drives, and I want capacity+redundancy. Performance is a non-issue, really. What do I do - in software?

/Eirik

> --
> Pawel Jakub Dawidek http://www.FreeBSD.org
> [EMAIL PROTECTED] http://garage.freebsd.pl
> FreeBSD committer Am I Evil? Yes, I Am!
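The arithmetic behind Pawel's explanation can be checked with a few lines. This is a sketch of the stated rule only (N data components with N a power of two, plus one parity component, giving a sector size of N*512); the function names are hypothetical and the 512-byte component sector is the value used in his reply.

```python
def is_valid_graid3_count(components: int) -> bool:
    """True if the count has the form 2^n + 1 with n >= 1 (3, 5, 9, 17, ...)."""
    data = components - 1            # one component holds parity
    return data >= 2 and (data & (data - 1)) == 0


def graid3_sector_size(components: int, component_sector: int = 512) -> int:
    """Sector size of the resulting device: data components * component sector."""
    if not is_valid_graid3_count(components):
        raise ValueError("component count must be 2^n + 1")
    return (components - 1) * component_sector


if __name__ == "__main__":
    for count in (3, 4, 5, 9):
        if is_valid_graid3_count(count):
            print(count, "components ->", graid3_sector_size(count), "byte sectors")
        else:
            print(count, "components -> rejected")
```

With four drives there would be three data components and a 1536-byte sector, which is not a power of two; that is the case the man page rules out.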
asr on amd64
Hi! Daring as I am, here's another attempt at getting someone to look into the asr driver and why it doesn't work on amd64. I have such a Zero-Channel RAID card lying around collecting dust, although it was meant to be installed in a server here a long time ago. I know Scott Long looked into it long ago, and he seems to have been the last one to touch the driver. He indicated a few months ago that he had little time; perhaps things look brighter now that 5.3 is out? I'll cross-post to -current in a few days if I don't hear anything.. Thanks, /Eirik
Panic: spin lock smp rendezvous ... held too long
Hi all, I just installed 6.2-RELEASE on a Supermicro 6013P-8 server, a dual P4-Xeon 2.4GHz with 4GB ECC memory and an asr-driven SCSI RAID controller. It has been working OK (although I suspect the asr driver, being giant-locked, is very inefficient) for a little while, but as I was extracting a bunch of tarballs it panicked like so:

spin lock smp rendezvous held by 0xc9d54600 for > 5 seconds
panic: spin lock held too long
cpuid = 0

I don't have a dump device (though I'm setting that up for the next reboot). However, I have tried turning off HT, to see if that might help. Does this look familiar to anyone? Or do I need to produce more data if it happens again?

Thanks,
/Eirik
Re: Panic: spin lock smp rendezvous ... held too long
On Mar 9, 2007, at 03:41, Kris Kennaway wrote: On Fri, Mar 09, 2007 at 12:44:03AM +0100, Eirik ?verby wrote: Hi all, I just installed 6.2-RELEASE on a Supermicro 6013P-8 server, a dual P4-Xeon 2.4ghz with 4GB ECC memory and an asr driven SCSI RAID controller. It has been working OK (although I suspect the asr driven, being giant-locked, is very inefficient) for a little while, but as I was extracting a bunch of tarballs it paniced like so: spin lock smp rendezvous held by 0xc9d54600 for > 5 seconds panic: spin lock held too long cpuid = 0 I don't have a dump device (though I'm setting that up for the next reboot). However, I have tried turning off HT, to see if that might help. Does this look familiar to anyone? Or do I need to produce more data if it happens again? It can mean that something deadlocked. Turning on WITNESS may help to debug this, although it has a large performance impact. I can't turn on WITNESS here "just like that", as I'll need some time to find a replacement server for some critical applications. However, the strange thing is that this server has been running solid as a rock (not one single crash) for 2 years with FreeBSD 4.x on it, so I am fairly sure there is no hardware issue. It crashed today, and I have obtained a dump. I am running 6.2- RELEASE with the stock SMP kernel, and haven't recompiled yet, so I can't seem to find a kernel.debug, but I'm building one now with the 6.2-RELEASE sources, as supplied on the CD. I'm assuming this will give me a useable kernel.debug. Anything in particular I should look for if/when I'm able to peek into the dump with kgdb? thanks, /Eirik Kris PGP.sig Description: This is a digitally signed message part
Re: Panic: spin lock smp rendezvous ... held too long
On Mar 9, 2007, at 03:41, Kris Kennaway wrote: On Fri, Mar 09, 2007 at 12:44:03AM +0100, Eirik ?verby wrote: Hi all, I just installed 6.2-RELEASE on a Supermicro 6013P-8 server, a dual P4-Xeon 2.4ghz with 4GB ECC memory and an asr driven SCSI RAID controller. It has been working OK (although I suspect the asr driven, being giant-locked, is very inefficient) for a little while, but as I was extracting a bunch of tarballs it paniced like so: spin lock smp rendezvous held by 0xc9d54600 for > 5 seconds panic: spin lock held too long cpuid = 0 I don't have a dump device (though I'm setting that up for the next reboot). However, I have tried turning off HT, to see if that might help. Does this look familiar to anyone? Or do I need to produce more data if it happens again? It can mean that something deadlocked. Turning on WITNESS may help to debug this, although it has a large performance impact. Just opened the vmcore file, and this is what I see, with a bt at the end: Unread portion of the kernel message buffer: dev = da0s1f, block = 3802920, fs = /usr panic: ffs_blkfree: freeing free block cpuid = 1 Uptime: 23h59m46s Dumping 3967 MB (3 chunks) chunk 0: 1MB (158 pages) ... 
ok chunk 1: 3966MB (1015280 pages) 3950 3934 3918 3902 3886 3870 3854 3838 3822 3806 3790 3774 3758 3742 3726 3710 3694 3678 3662 3646 3630 3614 3598 3582 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 06 fault virtual address = 0x18c fault code = supervisor read, page not present instruction pointer = 0x20:0xc04542f4 stack pointer = 0x28:0xe98a3c88 frame pointer = 0x28:0xe98a3c90 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 18 (swi2: cambio) trap number = 12 panic: page fault cpuid = 1 3566 3550 3534 3518 3502 3486 3470 3454 3438 3422 3406 3390 3374 3358 3342 3326 3310 3294 3278 3262 3246 3230 3214 3198 3182 3166 3150 3134 3118 3102 3086 3070 3054 3038 3022 3006 2990 2974 2958 2942 2926 2910 2894 2878 2862 2846 2830 2814 2798 2782 2766 2750 2734 2718 2702 2686 2670 2654 2638 2622 2606 2590 2574 2558 2542 2526 2510 2494 2478 2462 2446 2430 2414 2398 2382 2366 2350 2334 2318 2302 2286 2270 2254 2238 2206 2190 2174 2158 2142 2126 2110 2094 2078 2062 2046 2030 2014 1998 1982 1966 1950 1934 1918 1902 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14 ... 
ok
chunk 2: 1MB (128 pages)

#0  doadump () at pcpu.h:165
165     __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc067550a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc0675831 in panic (fmt=0xc0911dc2 "ffs_blkfree: freeing free block") at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc07b375e in ffs_blkfree (ump=0xc9607c00, fs=0xc93c5800, devvp=0xc9625110, bno=3802920, size=16384, inum=39400) at /usr/src/sys/ufs/ffs/ffs_alloc.c:1869
#4  0xc07c38c6 in indir_trunc (freeblks=0xcb2bfa00, dbn=15210848, level=0, lbn=12, countp=0xe98b8c6c) at /usr/src/sys/ufs/ffs/ffs_softdep.c:2894
#5  0xc07c32f6 in handle_workitem_freeblocks (freeblks=0xcb2bfa00, flags=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:2744
#6  0xc07c01b1 in process_worklist_item (mp=0xc95d87c8, flags=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:967
#7  0xc07bfeb2 in softdep_process_worklist (mp=0xc95d87c8, full=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:851
#8  0xc07bfc08 in softdep_flush () at /usr/src/sys/ufs/ffs/ffs_softdep.c:762
#9  0xc065ec4d in fork_exit (callout=0xc07bfa6c, arg=0x0, frame=0xe98b8d38) at /usr/src/sys/kern/kern_fork.c:821
#10 0xc0879dac in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
Weird messages output
Hi all,

We are running 6.1-RELEASE on several identically configured HP DL385 servers, and one of them has recently spat the following into /var/log/messages:

..
Mar 10 03:51:24 apphost02 ntpd[445]: kernel time sync enabled 2001
Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff
Mar 10 05:02:01 apphost02 kernel: k
Mar 10 05:02:01 apphost02 kernel: NMIN MIIe SIASA 202r,0 ,E IESIAS A ffnf
Mar 10 05:02:01 apphost02 kernel: f
Mar 10 05:02:01 apphost02 kernel:
Mar 10 05:02:01 apphost02 kernel: el trap 19 with interrupts disabled
Mar 10 05:02:01 apphost02 kernel: NMI ISA 20, EISA ff
Mar 10 06:08:01 apphost02 ntpd[445]: kernel time sync enabled 6001
..

NMI = non-maskable interrupt, if I remember correctly. However, I have no idea what this means or why it appeared. The status light on the front of the server has turned red instead of the usual green. All services on the host are running and behaving normally from what I can tell.

Any input, anyone?

Thanks,
/Eirik
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Weird messages output
On 27. mar. 2007, at 15.33, Gavin Atkinson wrote:

> On Tue, 2007-03-27 at 15:00 +0200, Eirik Øverby wrote:
>> Hi all, running 6.1-RELEASE on several HP DL385 servers (identically configured), one of them has recently spat the following out in the /var/log/messages file:
>>
>> ..
>> Mar 10 03:51:24 apphost02 ntpd[445]: kernel time sync enabled 2001
>> Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff
>> Mar 10 05:02:01 apphost02 kernel: k
>> Mar 10 05:02:01 apphost02 kernel: NMIN MIIe SIASA 202r,0 ,E IESIAS A ffnf
>> Mar 10 05:02:01 apphost02 kernel: f
>> Mar 10 05:02:01 apphost02 kernel:
>> Mar 10 05:02:01 apphost02 kernel: el trap 19 with interrupts disabled
>> Mar 10 05:02:01 apphost02 kernel: NMI ISA 20, EISA ff
>> Mar 10 06:08:01 apphost02 ntpd[445]: kernel time sync enabled 6001
>> ..
>>
>> NMI = non-maskable interrupt, if I remember correctly. However, I have no idea what this means or why it appeared. The status light on the front of the server has lit up red, as opposed to the usual green. All services on the host are running and behaving normally from what I can tell.
>
> I suspect you'll find your (ECC) memory has problems.

You are absolutely correct. Further investigation using the ProLiant management tools for FreeBSD revealed serious RAM trouble. Two banks were degraded, so we have now had the modules replaced on-site.

Thanks for the tip! Do you happen to know if there are any "generic" tools/daemons available to decipher such NMIs? Perhaps ones that could send SNMP traps or something similar?

/Eirik

> Gavin
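For what it's worth, until someone names a proper tool, the clean NMI lines can at least be picked out of the log with a few lines of script. This is only a sketch in Python, assuming the message format shown in the log excerpt above ("NMI ISA xx, EISA yy" with hex values); the names NMI_RE and find_nmi_events are made up for illustration, and lines garbled by interleaved console output are deliberately skipped rather than guessed at. It does not decode what the ISA/EISA values mean.

```python
import re

# Matches only the clean form of the kernel message seen in the log,
# e.g. "kernel: NMI ISA 30, EISA ff". Interleaved lines do not match.
NMI_RE = re.compile(r"kernel: NMI ISA ([0-9a-f]+), EISA ([0-9a-f]+)")


def find_nmi_events(lines):
    """Return (timestamp, host, isa, eisa) for each clean NMI line."""
    events = []
    for line in lines:
        match = NMI_RE.search(line)
        if match is None:
            continue
        fields = line.split()
        timestamp = " ".join(fields[:3])  # syslog style: "Mar 10 05:02:01"
        host = fields[3]
        isa = int(match.group(1), 16)
        eisa = int(match.group(2), 16)
        events.append((timestamp, host, isa, eisa))
    return events


# Lines copied from the log excerpt in this thread.
SAMPLE = [
    "Mar 10 03:51:24 apphost02 ntpd[445]: kernel time sync enabled 2001",
    "Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff",
    "Mar 10 05:02:01 apphost02 kernel: NMIN MIIe SIASA 202r,0 ,E IESIAS A ffnf",
    "Mar 10 05:02:01 apphost02 kernel: NMI ISA 20, EISA ff",
]

if __name__ == "__main__":
    for ts, host, isa, eisa in find_nmi_events(SAMPLE):
        print(f"{ts} {host}: NMI isa=0x{isa:02x} eisa=0x{eisa:02x}")
```

Something like this could be run from cron against /var/log/messages and its output handed to whatever alerting is already in place; the alerting part is left out on purpose.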
Panic: sleeping thread
Hi,

ever since 6.1-RELEASE (possibly earlier, not sure) I've been seeing frequent panics on a previously stable (6.0-STABLE) dual Opteron server. When I say "previously stable" I mean weeks and months of uptime, and no known unintended reboots. Now I'm seeing panics on a semi-regular basis, up to 2-3 times per week.

The panic goes (dmesg, bt and ps at the end of this message):

Sleeping thread (tid 100082, pid 84236) owns a non-sleepable lock
panic: sleeping thread
cpuid = 0
KDB: enter: panic
[thread pid 84235 tid 100474 ]
Stopped at kdb_enter+0x2f: nop

...where pid 84236 is a sh instance.

I cannot reproduce this on demand, but I usually only have to wait a few days (it'll happen, at the latest, whenever I think it'll survive another evening and go out for a beer...). Calling boot() or reset from db> just causes the box to hang, so I have to power cycle it at this point.

I do not have a dump device (no swap partitions large enough, known problem, more hardware coming), but I hope the attached information helps.

With best regards,
/Eirik

dmesg:
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #2: Wed May 31 20:13:06 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ANDUIN
WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 242 (1595.14-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
Features=0x78bfbffMCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
AMD Features=0xe0500800
real memory = 2147418112 (2047 MB)
avail memory = 2061500416 (1966 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-27 on motherboard
ioapic2 irqs 28-31 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x5008-0x500b on acpi0
cpu0: on acpi0
acpi_throttle0: on cpu0
cpu1: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 6.0 on pci0
pci3: on pcib1
ohci0: mem 0xfeafc000-0xfeafcfff irq 19 at device 0.0 on pci3
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: mem 0xfeafd000-0xfeafdfff irq 19 at device 0.1 on pci3
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
atapci0: port 0xb400-0xb407,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa40f mem 0xfeafec00-0xfeafefff irq 19 at device 5.0 on pci3
ata2: on atapci0
ata3: on atapci0
ata4: on atapci0
ata5: on atapci0
pci3: at device 6.0 (no driver attached)
fxp0: port 0xbc00-0xbc3f mem 0xfeafb000-0xfeafbfff,0xfeaa-0xfeab irq 18 at device 8.0 on pci3
miibus0: on fxp0
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:e0:81:2a:11:64
fxp0: [GIANT-LOCKED]
isab0: at device 7.0 on pci0
isa0: on isab0
atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0: on atapci1
ata1: on atapci1
pci0: at device 7.2 (no driver attached)
pci0: at device 7.3 (no driver attached)
pcib2: at device 10.0 on pci0
pci2: on pcib2
ahd0: port 0x9000-0x90ff, 0x9c00-0x9cff mem 0xfc8fc000-0xfc8fdfff irq 24 at device 6.0 on pci2
ahd0: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
ahd1: port 0x9800-0x98ff, 0x9400-0x94ff mem 0xfc8fe000-0xfc8f irq 25 at device 6.1 on pci2
ahd1: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
bge0: mem 0xfc8b-0xfc8b,0xfc8a-0xfc8a irq 24 at device 9.0 on pci2
miibus1: on bge0
brgphy0: on miibus1
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:e0:81:2a:59:8c
bge0: [GIANT-LOCKED]
bge1: mem 0xfc8e-0xfc8e,0xfc8d-0xfc8d irq 25 at device 9.1 on pci2
miibus2: on bge1
brgphy1: on miibus2
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:e0:81:2a:59:8d
bge1: [GIANT-LOCKED]
pci0: at device 10.1 (no driver attached)
pcib3: at device 11.0 on pci0
pci1: on pcib3
pci0: at device 11.1 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on
gmirror oddities
Hi!

I've been using gmirror for a while to safeguard my system disks. I have taken the slice-based mirror approach, where I use, say, ad0s1 and ad2s1 as providers.

On one of my servers, this seems to be impossible. I create the mirror using ad2s1 first (to keep my system running while I do some of the work), and then I re-initialize ad0s1 (making it exactly the size of ad2s1) before using gmirror insert to add it to the mirror. However, at this point, gmirror list shows that it never added ad0s1 as a provider, but ad0 itself! As a result, I now have a load of slices (ad0a, ad0b, ad0d, ad0e, ad0f) instead of the structure I have on ad2s1. The names are just like those on ad2s1, only without the "s1" part.

I've tried "dd if=/dev/zero of=/dev/ad0 bs=65536" a couple of times, in case some old provider metadata was stored there. I also have exactly the same setup in another server, the only difference being that it behaves as expected.

Am I doing something blatantly wrong here? This IS supposed to work, right? I've even found a very nice description of how to do it at http://people.freebsd.org/~rse/mirror/ confirming that what I'm doing is right.

I'm on 5.4-PRERELEASE, but this problem has been there since 5.3-p2 or so, which was when I first tried this.

Anyone?

Thanks,
/Eirik
Current status of nullfs and/or unionfs?
Hi all,

I'm struggling with some hosting environments where I am managing a large number of jails (>100) spread over about a dozen servers. I am starting to see disk space as a real problem, especially given that each physical box needs to be autonomous - i.e. I can't rely on any external storage, and I am limited to 1U and 2U servers.

The solution, or at least part of it, would be to have certain parts of the jail filesystems mounted in via nullfs (acceptable solution) or unionfs (ideal solution). However, ever since FreeBSD 4.10 this has been a major problem, as both filesystems started exhibiting major stability and data integrity issues.

Before I start playing with this again, I'd like to know if any work has been done on either of these in 5.x. Specifically, I'm currently running 5.3-p6 or newer on all the systems, and as of yesterday I've been using 5.4-prerelease (cvsup) on a couple of test systems.

What can I expect to see when trying nullfs and/or unionfs today? Has anything changed? Do I have even a remote chance of making it work - and if it doesn't work, what are my chances of anyone having time or energy to look into it? I'm an admin only, no coder, otherwise I'd be happy to look into it myself.

Thanks,
/Eirik
Re: Current status of nullfs and/or unionfs?
On 06-05-05 09:25, "Danny Braniss" <[EMAIL PROTECTED]> wrote:

>> Interesting approach. Is this with 4.x or 5.x? How do you union-mount /etc
>> (mount command/fstab entry)?
>
> been doing it since 4.x (i think x < 9)

Any idea how unionfs will behave if stacked (more mounts on top of each other)? I was playing with the thought of having a "template" jail directory which I unionmount into my jails, then perhaps use your trick to union-mount a md device into certain points in the jail. Got a gut feeling about that?

/Eirik

> in initdiskless (5.x) we have:
>
>         if [ -e /conf/union ]; then
>                 kldload unionfs
>                 mount_md 4096 /conf/etc
>                 chmod 755 /conf/etc
>                 mount_unionfs /conf/etc /etc
>                 ls -R /etc > /dev/null
>                 touch /etc/.sentinel
>                 md_created_etc=created
>         fi
>
> danny
Re: Current status of nullfs and/or unionfs?
On 06-05-05 13:14, "Danny Braniss" <[EMAIL PROTECTED]> wrote:

>> On 06-05-05 09:25, "Danny Braniss" <[EMAIL PROTECTED]> wrote:
>>
>>> Interesting approach. Is this with 4.x or 5.x? How do you union-mount /etc (mount command/fstab entry)?
>>>
>>> been doing it since 4.x (i think x < 9)
>>
>> Any idea how unionfs will behave if stacked (more mounts on top of each
>> other)? I was playing with the thought of having a "template" jail directory
>> which I unionmount into my jails, then perhaps use your trick to union-mount
>> a md device into certain points in the jail. Got a gut feeling about that?
>
> i have the feeling that that will get into trouble :-), but im no expert
> here. If what you mean is:
>
>         mount_unionfs /md-0 /jail-0
> and then
>         mount_unionfs /md-1 /jail-0/xyz
>
> which is not strictly 'stacked', might work and should be easy to try out, but
> IMHO, breaks the KISS principle :-)

I was thinking more along the lines of

        mount_unionfs -b /jails/jail_template /jails/jail-0
        mount_unionfs /md-0 /jails/jail-0/etc

for example. I could also imagine stacking unionfs on top of nullfs, like

        mount_nullfs /cdrom/jail_template /jails/jail-0
        mount_unionfs /md-0 /jails/jail-0

alternatively

        mount_unionfs /nfs-0 /jails/jail-0

Sounds weird, I know, but we could use it...

> and also, im not sure if:
>         mkdir /jail-0/xyz
>         mount_unionfs /md-1 /jail-0/xyz
> is the same as the above.
>
> danny
unionfs limitations?
Hi,

I just started playing with mounting ports into jails using unionfs (mount_unionfs -b /usr/ports_jail /usr/local/jails/jail-0/usr/ports), and many things seem to work fine. However, when trying to install either of mysql41-server or mysql41-client, I see the following:

[EMAIL PROTECTED] /usr/ports/databases/mysql41-server# make install
===> Installing for mysql-server-4.1.11_1
===> mysql-server-4.1.11_1 depends on shared library: mysqlclient.14 - found
===> Generating temporary packing list
===> Checking if databases/mysql41-server already installed
ln: POSIX: Operation not supported
*** Error code 1

Stop in /usr/ports/databases/mysql41-server.

Did I miss out on something, or is this not going to work? Do I need to think in other ways?

I have stress-tested this setup pretty well over the last 24 hours, with as many as 20 mountpoints using the same ports tree, with constant package building in each of them. This was impossible last time I played with unionfs, so it must have stabilized somewhat ;)

Anyone?

/Eirik
5.4-panic
Hi folks,

I have sinned, I have forgotten to configure a dump device. I do have a debug kernel compiled though (I think), so maybe someone can help me figure out what's happening here. Nothing in particular going on, server has been up for a few weeks. Dual opteron machine, running FreeBSD-amd64. Info below (uname -a, panic info and dmesg).

/Eirik

Version info:
FreeBSD anduin.net 5.4-STABLE FreeBSD 5.4-STABLE #0: Tue May 3 11:19:51 CEST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ANDUIN amd64

PANIC INFO:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x88
fault code = supervisor read, page not present
instruction pointer = 0x8:0x803cd9e9
stack pointer = 0x10:0xa54f5a20
frame pointer = 0x10:0xa54f5a50
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 62 (pagedaemon)
[thread pid 62 tid 100049 ]
Stopped at thread_fini+0x89: subl 0x88(%ebx),%eax
db> where
Tracing pid 62 tid 100049 td 0xff003dab0280
thread_fini() at thread_fini+0x89
zone_drain() at zone_drain+0x1e5
zone_foreach() at zone_foreach+0x4d
uma_reclaim() at uma_reclaim+0x21
vm_pageout() at vm_pageout+0x5fc
fork_exit() at fork_exit+0x8f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xa54f5d00, rbp = 0 ---
db> ps
pid proc uid ppid pgrp flag stat wmesg wchan cmd
52843 ff0005248ba00 52838 52838 0004000 [RUNQ] perl
52842 ff00387e9ba0 91 52840 52840 0004000 [CPU 1] python
52840 ff0003162ba0 91 52837 52840 0004000 [SLPQ wait 0xff0003162ba0][SLP] sh
52838 ff000fa50 52835 52838 0004000 [SLPQ wait 0xff000fa5][SLP] sh
52837 ff003bea40000 636 636 000 [SLPQ piperd 0xff001e0a7b40][SLP] cron
52835 ff00227fb0000 636 636 000 [SLPQ piperd 0xff0011be2000][SLP] cron
52824 ff00394758b8 1000 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52822 ff002fdd7000 1000 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52820 ff0037475ba0 1000 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52818 ff001a5bb8b8 1000 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52815 ff000f20fba0 1000 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52813 ff0034d442e8 1000 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52811 ff0023f9eba0 1000 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52808 ff001a140ba00 1291 52808 0004100 [SLPQ sbwait 0xff0016cb39b8][SLP] ftpd
52806 ff0039475ba00 1291 52806 0004100 [SLPQ sbwait 0xff00081abe08][SLP] ftpd
52805 ff0035c5e2e80 1291 52805 0004100 [SLPQ sbwait 0xff000395c118][SLP] ftpd
52764 ff000328b2e8 1051 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52730 ff0036e4c8b80 1248 1248 000 [SLPQ accept 0xff002c2108fe][SLP] perl5.8.6
52589 ff002b39c5d0 1027 52588 52589 0004002 [SLPQ ttyin 0xff0024801410][SLP] bash
52588 ff001a1402e8 1027 52585 52585 100 [SLPQ select 0x8082b2d0][SLP] sshd
52585 ff002b227ba00 609 52585 100 [SLPQ sbwait 0xff002075abe0][SLP] sshd
52548 ff002c6be5d00 1248 1248 000 [SLPQ accept 0xff002c2108fe][SLP] perl5.8.6
52387 ff00374758b8 1024 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
52299 ff00219515d00 1248 1248 000 [SLPQ accept 0xff002c2108fe][SLP] perl5.8.6
52275 ff001471d8b80 1248 1248 000 [SLPQ accept 0xff002c2108fe][SLP] perl5.8.6
46768 ff00117825d0 1001 46765 46768 0004002 [SLPQ ttyin 0xffa8b810][SLP] bash
46765 ff001b674000 1001 46749 46749 100 [SLPQ select 0x8082b2d0][SLP] sshd
46749 ff0037ab32e80 609 46749 100 [SLPQ sbwait 0xff003694f790][SLP] sshd
46699 ff0003dc32e8 6681 46695 46699 0004002 [SLPQ select 0x8082b2d0][SLP] pine
46695 ff001a44a2e8 6681 46694 46695 0004002 [SLPQ wait 0xff001a44a2e8][SLP] bash
46694 ff0005248000 6681 46689 46689 100 [SLPQ select 0x8082b2d0][SLP] sshd
46689 ff002eec15d00 609 46689 100 [SLPQ sbwait 0xff001ecca118][SLP] sshd
45600 ff001529d2e8 1001 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
45043 ff0034d44000 80 697 697 100 [SLPQ accept 0xff002cfef05e][SLP] httpd
43651 ff0019e05000 6682 744 744 0004000 [SLPQ select 0x8082b2d0][SLP] imapd
42697 ffdb1000 80 697 697 100 [SLPQ accept 0xff002cfef05e][SLP] httpd
42696 ff00086d78b8 80 697 697 100 [SLPQ accept 0xff002cfef05e][SLP] htt
NFS-related hang in 5.4?
Hi,

when doing large file transfers (backing up jails using tar+gzip to a neighboring server), NFS has a tendency to lock up on me. This usually happens after quite a while - like a few hours or so. Also, before the hang, performance is generally bad.

KDB trace:

db> trace
Tracing pid 56 tid 100064 td 0xc1a18600
kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
lapic_handle_intr(34) at lapic_handle_intr+0x3a
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
_mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
udp_input(c2d4,14,c1a99000,1,0) at udp_input+0x257
ip_input(c2d4,0,0,0,0) at ip_input+0x590
transmit_event(c1c64100,2094,0,c1d58a80,7f4220) at transmit_event+0x107
ready_event_wfq(c1c64100,2094,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---

I cannot seem to kill process 56 (nfsiod), so I have to reset the box.

Anyone got a clue? What can I do to ease debugging here? Next time it happens I can probably make a dump, at least I will have a debug kernel running then.

/Eirik
Re: NFS-related hang in 5.4?
On 19. jun. 2005, at 20.06, Robert Watson wrote:

> On Sun, 19 Jun 2005, Eirik Øverby wrote:
>
>> when doing large file transfers (backing up jails using tar+gzip to a neighboring server), NFS has a tendency to lock up on me. This usually happens after quite a while - like a few hours or so. Also, before the hang, performance is generally bad.
>
> Hmm. Looks like a bug in dummynet. ipfw should not be directly re-injecting UDP traffic back into the input path from an outbound path, or it risks re-entering, generating lock order problems, etc. It should be getting dropped into the netisr queue to be processed from the netisr context.

Would this problem exist across all 5.4 installations, both i386 and amd64? Would it depend on heavy load, or could it theoretically happen at any time when there's traffic? All three of my fbsd5 servers (dual opteron, dual p3-1ghz, dual p3-700mhz) are experiencing random hangs a few weeks apart; my impression is that they are all stable when running in single-cpu mode. All of them use dummynet in a comparable manner. Ideas?

> Is it possible to configure dummynet out of your configuration, and see if the problem goes away?

I'm running a test right now, will let you know in the morning.

/Eirik

> Robert N M Watson
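The re-entry described in this reply (the outbound path walking straight back into the inbound path) can be checked mechanically against a DDB trace. Below is a minimal Python sketch, assuming frames look like "name(args) at name+0xoff" and that DDB prints the innermost frame first, as in the trace posted earlier in the thread. The helper names frames and reenters_input_from_output are made up for illustration, and the check says nothing about which lock is involved.

```python
import re

# A DDB frame looks like "udp_input(c2d4,14,...) at udp_input+0x257".
# Some mail clients wrapped the line before "+0x...", hence the \s*.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_]\w*)\s*\+0x[0-9a-f]+")


def frames(trace_text):
    """Return function names in the order DDB printed them (innermost first)."""
    return FRAME_RE.findall(trace_text)


def reenters_input_from_output(trace_text, inbound="ip_input", outbound="ip_output"):
    """True if `inbound` was reached while `outbound` was still on the stack."""
    names = frames(trace_text)
    if inbound not in names or outbound not in names:
        return False
    # Innermost frame comes first, so a smaller index means "called later".
    return names.index(inbound) < names.index(outbound)


# Excerpt copied from the trace in this thread.
SAMPLE_TRACE = """\
_mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
udp_input(c2d4,14,c1a99000,1,0) at udp_input+0x257
ip_input(c2d4,0,0,0,0) at ip_input+0x590
transmit_event(c1c64100,2094,0,c1d58a80,7f4220) at transmit_event +0x107
dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
"""

if __name__ == "__main__":
    print(frames(SAMPLE_TRACE))
    print(reenters_input_from_output(SAMPLE_TRACE))
```

For the excerpt, ip_input sits above ip_output in the listing, which is the pattern the reply calls out.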
Re: NFS-related hang in 5.4?
On 20. jun. 2005, at 10.38, Robert Watson wrote:

> On Mon, 20 Jun 2005, Eirik Øverby wrote:
>
>>> Hmm. Looks like a bug in dummynet. ipfw should not be directly re-injecting UDP traffic back into the input path from an outbound path, or it risks re-entering, generating lock order problems, etc. It should be getting dropped into the netisr queue to be processed from the netisr context.
>>
>> This problem would exist across all 5.4 installations, both i386 and amd64? Would it depend on heavy load, or could it theoretically happen at any time when there's traffic? All three of my fbsd5 servers (dual opteron, dual p3-1ghz, dual p3-700mhz) are experiencing random hangs with ~a few weeks between, impression is that if running single-cpu mode they are all stable. All using dummynet in a comparable manner. Ideas?
>
> Yes. Basically, the network stack avoids recursion in processing for "complicated" packets by deferring processing an offending packet to a thread called the 'netisr'. Whenever the stack reaches a possible recursion point on a packet, it's supposed to queue the packet for processing 'later' in a per-protocol queue, unwind, and then when the netisr runs, pick up and continue processing. In the stack trace you provide, dummynet appears to immediately invoke the in-bound network path from the out-bound network path, walking back into the network stack from the outbound path. This is generally forbidden, for a variety of reasons:
>
> - We do allow the in-bound path to call the out-bound path, so that protocols like TCP, and services like NFS can turn around packets without a context switch. If further recursion is permitted, the stack may overflow.
>
> - Both paths may hold network stack locks over calls in either direction -- specifically, we allow protocol locks to be held over calls into the socket layer, as the protocol layer drives operation; if a recursive call is made, deadlocks can occur due to violating the lock order. This is what is happening in your case.
>
> Pretty much all network code is entirely architecture-independent, so bugs typically span architectures, although race conditions can sometimes be hard to reproduce if they require precise timing and multiple processors.

So I'm lucky to have seen this one... Great ;)

>>> Is it possible to configure dummynet out of your configuration, and see if the problem goes away?
>>
>> I'm running a test right now, will let you know in the morning.
>
> Thanks.

I know enough not to call this a "confirmation", but disabling dummynet did indeed allow me to finish the backup. I never made it past 15GB before; now the full 19GB tar.gz file is done, and the boxes are both still running.

The funny thing is - I only disabled dummynet on one of the boxes now - the source of the backup, the box that pushes data. The other box has pretty much 100% the same setup, and is also i386. But as traffic shaping can only happen on outgoing packets, I suppose that makes sense.

I can try re-running the test if you wish, in order to gather more statistics. It's just too bad it takes a while ;)

/Eirik

> Robert N M Watson
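The deferral scheme discussed in this thread (queue the packet, unwind, let the netisr pick it up later) can be illustrated with a toy model. The Python sketch below is only an analogy, not kernel code: the class ToyStack and its methods are invented for illustration, a single non-recursive lock stands in for a protocol lock, and a non-blocking acquire is used so that the "deadlock" is counted instead of hanging the script.

```python
import threading
from collections import deque


class ToyStack:
    """Toy model: one non-recursive lock stands in for a protocol lock."""

    def __init__(self, defer):
        self.lock = threading.Lock()  # non-recursive on purpose
        self.defer = defer            # True: queue for a later "netisr" pass
        self.queue = deque()
        self.delivered = []
        self.deadlocks = 0

    def input(self, pkt):
        # Inbound path needs the lock. A blocking acquire would hang forever
        # on re-entry from the same thread, so this sketch counts the failure.
        if not self.lock.acquire(blocking=False):
            self.deadlocks += 1
            return
        try:
            self.delivered.append(pkt)
        finally:
            self.lock.release()

    def output(self, pkt, loop_back=False):
        # Outbound path holds the lock while the packet passes a
        # (hypothetical) shaper hook that may hand it back to input.
        with self.lock:
            if loop_back:
                if self.defer:
                    self.queue.append(pkt)  # unwind first, process later
                else:
                    self.input(pkt)         # direct re-entry, lock still held

    def run_netisr(self):
        # Runs with no lock held, so input() can take it safely.
        while self.queue:
            self.input(self.queue.popleft())


if __name__ == "__main__":
    direct = ToyStack(defer=False)
    direct.output("pkt", loop_back=True)
    print("direct:", direct.delivered, direct.deadlocks)

    deferred = ToyStack(defer=True)
    deferred.output("pkt", loop_back=True)
    deferred.run_netisr()
    print("deferred:", deferred.delivered, deferred.deadlocks)
```

With direct re-entry the packet is never delivered because the lock is still held by the outbound path; with the queue, delivery happens only after the outbound call has unwound.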
Re: NFS-related hang in 5.4?
On 20. jun. 2005, at 17.18, Marc Olzheim wrote:

> On Mon, Jun 20, 2005 at 10:53:19AM +0200, Eirik Øverby wrote:
>
>> I know enough not to call this a "confirmation", but disabling dummynet did indeed allow me to finish the backup. I never made it past 15GBs before, now the full 19GB tar.gz file is done, and the boxes are both still running. The funny thing is - I only disabled dummynet on one of the boxes now - the source of the backup, the box that pushes data. The other box has pretty much 100% the same setup, and is also i386. But as traffic shaping can only happen on outgoing packets, I suppose that makes sense.
>
> Hmm, does that solve kern/79208 for you as well by any chance ?

Seems not. Now, how do I get my box back to life? ;)

/Eirik

> Marc
Network/fxp related panic in 5.4?
Hi all,

I recently re-enabled SMP on one of my 5.4 servers (dual intel p3), and after a relatively short while (a couple of days) it started acting up. Today it was frozen and had dropped into the kernel debugger on the serial console. The problem is that my serial console was controlled by a terminal at work, and when I got home it seemed that the work terminal had disconnected. All I could do was a 'trace' - I don't have the panic screen (if any), nor do I have any other output, because the watchdog triggered the power-switch cycle just after I got the trace:

Tracing pid 29 tid 10 td 0xc22a
fxp_intr_body(c2404000,c2404000,40,,8) at fxp_intr_body+0xd0
fxp_intr(c2404000,0,0,0,0) at fxp_intr+0x14e
ithread_loop(c22f6500,e3384d38,0,0,0) at ithread_loop+0x1b8
fork_exit(c06a9150,c22f6500,e3384d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3384d6c, ebp = 0 ---
db>

What makes me wonder is this: when I connected the serial console, the db> prompt was already there. Does that mean that the work terminal disconnect somehow sent a telnet break, and triggered the kernel debugger? I.e. this was no panic, but a stupid serial console hiccup? Is there any way to prevent this in the future - like changing the control character that would trigger the kernel debugger? (I have BREAK_TO_DEBUGGER in my kernel config.)

Thanks,
/Eirik
Jails that won't die...
Hi,

since upgrading to 5.x and updating my management tools, I have seen a number of problems relating to stopping jails. I'm maintaining several hosts with a number of full-featured jails (i.e. full virtual FreeBSD installations in each jail), and in general this works fine.

However, whenever I stop a jail using 'jexec kill -SIGNAL -1' or 'jexec /bin/sh /etc/rc.shutdown' (in various combinations), jails have a tendency to stick around for minutes or hours - according to 'jls'. Often I see an entry in 'netstat -a' indicating that one or more sockets are in FIN_WAIT state, preventing the jail from coming down. Taking the virtual network interface (alias) down does not help. All I can do at this point is wait.

I normally use 'jls' to determine whether or not a jail can be restarted (i.e. it's not running), but this is pretty useless in such cases. And right now I have a case where 'netstat -a' shows me nothing pertaining to the jail, and the jail has no processes running, yet 'jls' still lists it. I have therefore force-started the jail again, which seems to work nicely, but now 'jls' gives me two entries for this jail, with different JIDs.

What am I doing wrong here?

/Eirik
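As an aside, the "two entries for this jail, with different JIDs" situation is easy to detect from a script. The following Python sketch is illustrative only: it assumes 'jls' prints a header followed by four whitespace-separated columns (JID, IP Address, Hostname, Path), the sample output is made up rather than taken from the affected host, and the function names parse_jls and duplicate_jails are hypothetical.

```python
from collections import defaultdict

# Hypothetical jls output. The column layout is an assumption, and the
# addresses, hostnames and paths are invented for the example.
SAMPLE_JLS = """\
   JID  IP Address      Hostname                      Path
    12  192.0.2.10      www.example.org               /usr/local/jails/www
     7  192.0.2.11      db.example.org                /usr/local/jails/db
     3  192.0.2.10      www.example.org               /usr/local/jails/www
"""


def parse_jls(text):
    """Parse jls-style output into a list of dicts, skipping the header."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split(None, 3)
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # ignore blank or unexpected lines
        jid, ip, hostname, path = fields
        rows.append({"jid": int(jid), "ip": ip,
                     "hostname": hostname, "path": path.strip()})
    return rows


def duplicate_jails(rows):
    """Map jail path -> sorted JIDs, for paths listed more than once."""
    by_path = defaultdict(list)
    for row in rows:
        by_path[row["path"]].append(row["jid"])
    return {path: sorted(jids) for path, jids in by_path.items() if len(jids) > 1}


if __name__ == "__main__":
    for path, jids in duplicate_jails(parse_jls(SAMPLE_JLS)).items():
        print(f"{path}: listed {len(jids)} times, JIDs {jids}")
```

Grouping by path rather than hostname is a design choice: two jails may share a hostname by accident, but two live entries for the same root directory are what a management tool would want to flag before restarting.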
Re: Jails that won't die...
On 28. jun. 2005, at 16.58, Brian Fundakowski Feldman wrote: On Tue, Jun 28, 2005 at 10:37:29AM +0200, Eirik Øverby wrote: Hi, I have, since upgrading to 5.x and updating my management tools, seen a number of problems relating to stopping jails. I'm maintaining several hosts with a number of full-featured jails (i.e. full virtual FreeBSD installations in each jail), and in general this works fine. However, whenever I stop a jail using 'jexec kill -SIGNAL -1' or 'jexec /bin/sh /etc/rc.shutdown' (in various combinations), jails have a tendency to stick around for minutes or hours - according to 'jls'. Often I see an entry in 'netstat -a' indicating that there is one or more sockets in FIN_WAIT state, preventing the jail from coming down. Taking the virtual network interface (alias) down does not help. All I can do at this point is wait. I normally use 'jls' to determine whether or not a jail can be restarted (i.e. it's not running), but this is pretty useless in such cases. And right now I have a case where 'netstat -a' shows me nothing pertaining to the jail, though it has no processes running. I have therefore force-started the jail again, which seems to work nicely, but now 'jls' gives me two entries for this jail, with different JIDs. What am I doing wrong here? You could just use ps to check for jailed processes and check their respective jails using the procfs status entry (at least according to the ps manpage...) My jailctl script can do both - list by jls and list by processes in the jail. There are NO processes running in the jail. /Eirik -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> [EMAIL PROTECTED] \ The Power to Serve! \ Opinions expressed are my own. 
\,,\
Re: Jails that won't die...
On 29. jun. 2005, at 20.58, Brian Fundakowski Feldman wrote: On Wed, Jun 29, 2005 at 03:28:09PM +0200, Eirik Øverby wrote: On 28. jun. 2005, at 16.58, Brian Fundakowski Feldman wrote: On Tue, Jun 28, 2005 at 10:37:29AM +0200, Eirik Øverby wrote: Hi, I have, since upgrading to 5.x and updating my management tools, seen a number of problems relating to stopping jails. I'm maintaining several hosts with a number of full-featured jails (i.e. full virtual FreeBSD installations in each jail), and in general this works fine. However, whenever I stop a jail using 'jexec kill -SIGNAL -1' or 'jexec /bin/sh /etc/rc.shutdown' (in various combinations), jails have a tendency to stick around for minutes or hours - according to 'jls'. Often I see an entry in 'netstat -a' indicating that there is one or more sockets in FIN_WAIT state, preventing the jail from coming down. Taking the virtual network interface (alias) down does not help. All I can do at this point is wait. I normally use 'jls' to determine whether or not a jail can be restarted (i.e. it's not running), but this is pretty useless in such cases. And right now I have a case where 'netstat -a' shows me nothing pertaining to the jail, though it has no processes running. I have therefore force-started the jail again, which seems to work nicely, but now 'jls' gives me two entries for this jail, with different JIDs. What am I doing wrong here? You could just use ps to check for jailed processes and check their respective jails using the procfs status entry (at least according to the ps manpage...) My jailctl script can do both - list by jls and list by processes in the jail. There are NO processes running in the jail. So it's obviously not running, and you can mark its state as such. ...which is what I do on FreeBSD 4.x, but on 5.x the 'jls' command still claims the jail is running. I think this is unbelievably dirty. Also, using /proc to determine if a jail is still running is a bad idea, as mounting /proc is deprecated.
/Eirik -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> [EMAIL PROTECTED] \ The Power to Serve! \ Opinions expressed are my own. \,,\
Re: Jails that won't die...
On 30. jun. 2005, at 22.56, Brian Fundakowski Feldman wrote: On Thu, Jun 30, 2005 at 03:53:56PM +0200, Eirik Øverby wrote: On 29. jun. 2005, at 20.58, Brian Fundakowski Feldman wrote: On Wed, Jun 29, 2005 at 03:28:09PM +0200, Eirik Øverby wrote: On 28. jun. 2005, at 16.58, Brian Fundakowski Feldman wrote: On Tue, Jun 28, 2005 at 10:37:29AM +0200, Eirik Øverby wrote: Hi, I have, since upgrading to 5.x and updating my management tools, seen a number of problems relating to stopping jails. I'm maintaining several hosts with a number of full-featured jails (i.e. full virtual FreeBSD installations in each jail), and in general this works fine. However, whenever I stop a jail using 'jexec kill -SIGNAL -1' or 'jexec /bin/sh /etc/ rc.shutdown' (in various combinations), jails have a tendency to stick around for minutes or hours - according to 'jls'. Often I see an entry in 'netstat -a' indicating that there is one or more sockets in FIN_WAIT state, preventing the jail from coming down. Taking the virtual network interface (alias) down does not help. All I can do at this point is wait. I normally use 'jls' to determine whether or not a jail can be restarted (i.e. it's not running), but this is pretty useless in such cases. And right now I have a case where 'netstat -a' shows me nothing pertaining to the jail, though it has no processes running. I have therefore force-started the jail again, which seems to work nicely, but now 'jls' gives me two entries for this jail, with different JIDs. What am I doing wrong here? You could just use ps to check for jailed processes and check their respective jails using the procfs status entry (at least according to the ps manpage...) My jailctl script can do both - list by jls and list by processes in the jail. There are NO processes running in the jail. So it's obviously not running, and you can mark its state as such. ...which is what I do on FreeBSD 4.x, but on 5.x the 'jls' command still claims the jail is running. 
I think this is unbelievably dirty. Also, using /proc to determine if a jail is still running is a bad idea, as mounting /proc is deprecated. The deprecation is due to security concerns, not bit-rot. You can just mount it with root-readable-only permissions. The jls for current isn't incorrect, you're just expecting a different criteria to mean "alive" than it is using. It would take increased kernel complexity to do what you want if you're not going to do it in userland. I am aware of that. However, I have seen instabilities with /proc as well, but that's another story. Anyway, why aren't you just using a /var/run file in the "real" system to tell whether the jail is running or not? It's the corollary to pid files versus doing "killall"... Just seems like something really trivial to implement as you like it in the userland. Sure, this is what I fall back on when running my jailctl script (/usr/ports/sysutils/jailctl) on 4.x. However, I NEED 'jls' to be correct, because I use it to inject other processes (like executing shutdown scripts inside the jails when taking them down, etc.). I suppose I could sort the output of jls on jail ID and always use whichever instance of a jail has the highest ID, but I don't know how these IDs work - whether they are recycled, whether they "wrap around" at some point, etc. In any case it would be nice to know exactly which criteria jls uses - and perhaps to have a way to remove whichever criterion keeps it thinking the jail is still running. Thing is - sometimes jails stop just fine. Other times they don't. It all depends. Perhaps I should get lsof or something, and see if there are any open files (though I think I tried once without finding any)... /Eirik -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> [EMAIL PROTECTED] \ The Power to Serve! \ Opinions expressed are my own. \,,\
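The idea of always using the highest JID when jls lists the same jail twice could be sketched roughly as follows. This is only an illustration: the helper name latest_jid is made up, the column layout (JID, IP address, hostname, path, preceded by one header line) is an assumption about what jls prints, and whether JIDs are ever recycled or wrap around is still the open question raised in the message above, so "highest" is not guaranteed to mean "newest".

```sh
#!/bin/sh
# Hypothetical helper: print the highest JID listed for a given jail hostname.
# Reads jls-style output on stdin. Assumes one header line followed by rows
# of the form: JID, IP address, hostname, path (an assumption, not verified
# against every release).
latest_jid() {
    awk -v host="$1" '
        NR > 1 && $3 == host && $1 + 0 > max { max = $1 + 0 }
        END { if (max > 0) print max }
    '
}

# Intended use on a jail host (not run here):
#   jls | latest_jid www.example.org
```

If no row matches the hostname, the function prints nothing, which a calling script can treat as "jail not listed".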
Re: FreeBSD -STABLE servers repeatedly crashing.
On Jul 6, 2005, at 6:29 PM, Blaz Zupan wrote: On Wed, 6 Jul 2005, Kris Kennaway wrote: That should be OK as long as you're not cross-compiling for different architectures. No, we only have i386 boxes. Hi, thanks for doing this work. I was working on preparing a similar set of information, but have been too overworked lately. We have ordered and had delivered a substantial number of DL380 (Intel) and DL385 (amd64) machines that will all be running FreeBSD. However, the recent reports about trouble on these systems have made me wary. Perhaps this will give FreeBSD the solution it needs (I've seen similar issues on other SMP systems), and me the sleep I need before launch in September ;) Thanks again. Now just hoping it's helpful to someone ;) /Eirik
Re: FreeBSD 6.0-BETA1 Available
On Jul 15, 2005, at 5:10 PM, Emanuel Strobl wrote: Am Freitag, 15. Juli 2005 16:58 CEST schrieb Marc G. Fournier: And, for "the stupid question of the day" ... how long before 5.x is no longer supported? I'm just about to deploy a new server, and was *going* to go with 5.x, but would I be better just skipping 5.x altogether? Or are there such drastic changes in 6.x that doing so at this time wouldn't be prudent? To post my opinion to the last part of the question: I'm also deploying new servers and I'll take RELENG_6 since there are so many improovements (nullfs in jails etc.) and 6-current has been pretty stable for me on my Hoi, what's changed wrt jails? And nullfs? I haven't been following the "news" as closely as I perhaps should, but I feel that the jail functionality doesn't get half as much attention in release notes as it should... Porting my jail-related tools to 5.x from 4.x was painful, but enjoyable when I was done. How does 6.x look? /Eirik UP workstation with all kinds of new stuff enabled (ULE PREEMPTION), so I guess I won't see more troubles than with 5.4, I think less :) -Harry On Fri, 15 Jul 2005, Scott Long wrote: Announcement The FreeBSD Release Engineering Team is pleased to announce the availability of FreeBSD 6.0-BETA1, which marks the beginning of the FreeBSD 6.0 Release Cycle. FreeBSD 6.0 will be a much less dramatic step from the FreeBSD 5 branch than the FreeBSD 5 branch was from FreeBSD 4. Much of the work that has gone into 6.0 development has focused on polishing and improving the work from 5.x These changes include streamlining direct device access in the kernel, providing a multi-threaded SMP-safe UFS/VFS filesystem layer, implementing WPA and Host-AP 802.11 features, as well as countless bugfixes and device driver improvements. Major updates and improvements have been made to ACPI power and thermal management, ATA, and many aspects of the network infrastructure. 
32bit application support for AMD64 is also greatly improved, as is compatiblity with certain Athlon64 motherboards. This release is also the first to feature experimental PowerPC support for the Macintosh G3 and G4 platforms. This BETA1 release is in the same basic format as the Monthly Snapshots. For most of the architectures only the ISO images are available though the FTP install tree is available for a couple of the architectures. We encourage people to help with testing so any final bugs can be identified and worked out. Availability of ISO images is given below. If you have an older system you want to update using the normal CVS/cvsup source based upgrade the branch tag to use is RELENG_6 (though that will change for the Release Candidates later). Problem reports can be submitted using the send-pr(1) command. The list of open issues and things still being worked on are on the todo list: http://www.freebsd.org/releases/6.0R/todo.html Since this is the first release of a new branch we only have a rough idea for some of the dates. The current rough schedule is available but most dates are still listed as "TBD - To Be Determined": http://www.freebsd.org/releases/6.0R/schedule.html Known Issues For the PowerPC architecture /etc/fstab isn't written out properly, so the first boot throws you into the mountroot> prompt. You will need to manually enter where the root partition is and fix /etc/fstab. Also the GEM driver is listed as 'unknown' in the network config dialog. For all architectures a kernel rebuild might be needed to get some FreeBSD 5 applications to run. Add "options COMPAT_FREEBSD5" to the kernel configuration file if you have problems with FreeBSD 5 executables. Availability The BETA1 ISOs and FTP support are available on most of the FreeBSD Mirror sites. A list of the mirror sites is available here: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/mirrors- ftp. 
html The MD5s are: MD5 (6.0-BETA1-alpha-bootonly.iso) = eabda0a086e5492fe43626ce5be1d7e1 MD5 (6.0-BETA1-alpha-disc1.iso) = d7fe900bb3d5f259cc3cc565c4f303e4 MD5 (6.0-BETA1-amd64-bootonly.iso) = 9b04cb2f68300071c717f4aa4220bdac MD5 (6.0-BETA1-amd64-disc1.iso) = cb0f21feaf8b7dd9621f82a8157f6ed8 MD5 (6.0-BETA1-amd64-disc2.iso) = 84d40bc291a9ed5cd69dfa717445eeb5 MD5 (6.0-BETA1-i386-bootonly.iso) = 38e0b202ee7d279bae002b883f7074ec MD5 (6.0-BETA1-i386-disc1.iso) = b2baa8c18d4637ef02822a0da6717408 MD5 (6.0-BETA1-i386-disc2.iso) = 2b151a3cea8843d322c75ff76779ffcf MD5 (6.0-BETA1-ia64-bootonly.iso) = 97800ec7d4b29927a8e66a2b53e987fb MD5 (6.0-BETA1-ia64-disc1.iso) = 7d29cd9317997136507078971762a0d8 MD5 (6.0-BETA1-ia64-livefs.iso) = 6ff974e60a3964cf16fcec05925c14e9 MD5 (6.0-BETA1-pc98-disc1.iso) = 40a3134cce89bd5f7033d8b9181edf91 MD5 (6.0-BETA1-powerpc-bootonly.iso) = 2f64974e9bd5adcf813f5d35ff742443 MD5 (6.0-BETA1-powerpc-disc1.iso) = b2562c38414ff4866f5ed8b3a38683c8 MD5 (6.0-BETA1-sparc64-booto
Serious issue with serial console in 5.4
Hi, I reported this before, but I am very surprised that it is still the case. (This is from the last time it happened; this time the box rebooted and cleared the serial console before I had time to cut/paste it.)
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address = 0x1c
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc0620b5f
stack pointer = 0x10:0xdadbd988
frame pointer = 0x10:0xdadbd994
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 51999 (getty)
trap number = 12
panic: page fault
cpuid = 1
boot() called on cpu#0
Uptime: 66d11h24m50s
The above panic will show up occasionally when logging out from a serial console (i.e. ctrl-D, logout, exit, whatever). This is EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at random - and renders the serial console useless. Robert Watson confirmed this to be an issue on the 10th of April. Anyone?? /Eirik ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Q: RT32 (Request Tracker) + jail
On Jul 20, 2005, at 2:22 AM, J. Nyhuis wrote: Greetings, I would like to have RT running in a jailed environment. The challenge, it seems, will be to get sendmail running in the same jailed environment as RT and the other components. For those not so familiar with the components of RT, the jail would include apache1.3+modperl, MySQL, sendmail, and RT. That's a lot of stuff to get working in there! (but fortunately FreeBSD jails seem straightforward and easy) ^_^ I expect sendmail to be the real problem of the above bunch. Has anyone actually tried to do this with a big multi-part app like RT (I have not spotted anyone's documented attempts on Google) and would be willing to share to the list? If I were you I would grab /usr/ports/sysutils/jailctl (ok, insert blatant self-praise here ;), create yourself one or more jails, and log into them as if they were normal FreeBSD installs. Everything you mention should work perfectly fine; I'm running anything between 5 and 50 jails of similar types (with web, mail, database, cvs, subversion, you name it running in them, in various combinations) on both private and work-owned hosts, some of them performing extremely critical tasks (think CC payment handling for millions of users). I wouldn't worry about sendmail ;) Does anyone else wonder if I've lost it? (Don't answer that)... Not at all. /Eirik ^_^ Thanks, John H. Nyhuis Sr. Computer Specialist Dept. of Pediatrics HS RR349B, Box 356320 University of Washington Desk: (206)-685-3884 [EMAIL PROTECTED]
Re: TinyBSD Call For Testers
On Jul 18, 2005, at 8:17 PM, Jean Milanez Melo wrote: Hello gentlemen, Last Saturday a new port was added under the sysutils/ category, ports/sysutils/tinybsd. TinyBSD is a tool meant to provide an easy way to build embedded systems based on FreeBSD. It is based on userland copying, library dependency check/copy and kernel build. We did our best to make embedded system creation an easy and especially fast process. The main (default) system generates an embedded system image which is about 20MB in size. This is a very generic approach, with support for a number of wired NICs and the most popular wireless hardware (including atheros), divert, bridge, dummynet, firewall, etc; and CPU_ELAN (for soekris devices). If the "generic" system is tightened up, the final result can be as small as an 8MB embedded system. We are giving you this intro to ask you to please test TinyBSD as much as you can, and send every possible feedback regarding it. The main tinybsd goal is to make embedded systems creation a process which must be 1 - fast 2 - easy 3 - 100% functional If you can test it, we would appreciate your thoughts. If you think any of those 3 goals can't be reached for you, or could be improved, also let me know. Thanks for testing Without having actually tried it yet (time has been scarce lately), is it conceivable to use this tool to create slim-but-functional jails? Sans the kernel part, that is? /Eirik -- Atenciosamente Jean Milanez Melo FreeBSD Brasil LTDA. Fone: (31) 3281-9633 http://www.freebsdbrasil.com.br
Re: Panic when logging out from serial console
On Apr 10, 2005, at 1:42 PM, Robert Watson wrote: On Sun, 10 Apr 2005 [EMAIL PROTECTED] wrote: warning: This report might be somewhat vague. For quite a while now I`ve been plagued with the problem that logging out from a serial console causes the box to panic. For a while I`ve been sure this was isolated to one of my boxen, because it`s been acting up in other ways as well, but today it happened on two other boxes too! And these boxes have been rock stable for the last two years. I`m running a fairly recent variation of RELENG-5 on all the boxes; one of them is amd64, the two others - including the one I`ve pasted from - are plain old p3 machines. They are all dual- CPU though. I've seen precisely this panic -- in fact, I saw it yesterday on a RELENG_5 box, and under identical circumstances -- it looks like it happens if a last process in a login session on a serial console closes the tty, and then getty re-opens it while there's console output coming from syslog. I was able to get a core dump, but haven't made much headway on it yet. It looks like the tty structure has been released -- the refcount on the tty is 0, and the mutex pointers in the kqueue state have been cleared (hence the null pointer dereference you see). Now, the question is why -- I've added some debugging output to the local box I saw it on, and will see if I can reproduce it. Did you ever manage to reproduce - or fix - this? I had a rather nasty incident recently due to this, even on a very recently updated 5.4. I sent a message to stable@ about it a few days ago, but have received no response. I personally think that this must be very very (very) bad - serial consoles shouldn't do this! ;) /Eirik Robert N M Watson I have no clue what I can do from here; has anyone seen this before? I can`t always reproduce it, but the risk is fairly high - around 33% I`d say. Anyone? Thanks for your attention, details below. 
Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 00 fault virtual address = 0x1c fault code = supervisor write, page not present instruction pointer = 0x8:0xc0620b5f stack pointer = 0x10:0xdadbd988 frame pointer = 0x10:0xdadbd994 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 51999 (getty) trap number = 12 panic: page fault cpuid = 1 boot() called on cpu#0 Uptime: 66d11h24m50s /Eirik This message was sent using IMP, the Internet Messaging Program. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable- [EMAIL PROTECTED]" ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Serious issue with serial console in 5.4
On Jul 21, 2005, at 7:00 AM, Kris Kennaway wrote: On Mon, Jul 18, 2005 at 11:58:54AM +0200, Eirik Øverby wrote: Hi, I reported this before, but I am very surprised that it is still the case: (This is from the last time it happened; this time the box rebooted and cleared the serial console before I had time to cut/paste it.) Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 00 fault virtual address = 0x1c fault code = supervisor write, page not present instruction pointer = 0x8:0xc0620b5f stack pointer = 0x10:0xdadbd988 frame pointer = 0x10:0xdadbd994 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 51999 (getty) trap number = 12 panic: page fault cpuid = 1 boot() called on cpu#0 Uptime: 66d11h24m50s The above panic will show up occasionally when logging out from a serial console (i.e. ctrl-D, logout, exit, whatever). This is EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at random - and renders the serial console useless. Robert Watson confirmed this to be an issue on the 10th of April. Anyone?? You might have to wait until 6.0-R since fixing it seems to require infrastructure changes that cannot easily be backported to 5.x. With all due respect - if this is an omnipresent problem on 5.x (and I'm assuming it is, because it happens on all the servers I'm serial-controlling), I daresay it warrants some more attention. Having unsafe serial terminal support that can bring down your system like that defeats much of the point of having serial terminal support in the first place. However, since I seem to be the only one who has noticed this, perhaps I'm the last person on earth to routinely use serial terminal switches instead of KVM switches to do my admin work? /Eirik Kris
Re: Serious issue with serial console in 5.4
On Jul 21, 2005, at 12:16 PM, Robert Watson wrote: On Thu, 21 Jul 2005, Eirik Øverby wrote: The above panic will show up occasionally when logging out from a serial console (i.e. ctrl-D, logout, exit, whatever). This is EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at random - and renders the serial console useless. Robert Watson confirmed this to be an issue on the 10th of April. You might have to wait until 6.0-R since fixing it seems to require infrastructure changes that cannot easily be backported to 5.x. With all due respect - if this is (and I'm assuming it is, because it happens on all the servers I'm serial-controlling) an omnipresent problem on 5.x, I daresay it should warrant some more attention. Having unsafe serial terminal support that can bring down your system like that defies much of the point of having serial terminal support in the first place. However, since I seem to be the only one who has noticed this, perhaps I'm the last person on earth to routinely use serial terminal switches instead of KVM switches to do my admin work? The concern about the 5.x backport is that it will break parts of the device driver ABI, and is a significant change that involves a lot of risk. Regarding the general prevalence of the problem -- I've seen a small number of people reporting it's a big problem. Since I know of a great many people running with serial consoles (other than a workstation, I never run FreeBSD boxes any other way), this leads me to believe it's something that shows up in fairly specific conditions -- perhaps relating to precise timing of a race condition. This means that if we introduce a generally destabilizing change, it may impact more people than the problem as it exists (a nasty trade-off). 
I've only seen the issue when logging out of a serial console session, and had previously hypothesized that it had to do with the simultaneous timing of a console message from syslog and the opening/closing of the console's tty due to logging out and getty restarting, resulting in a reference count improperly hitting zero. I did indeed make some changes to my syslog configuration after getting the serials online. Your theory might not be entirely off. Let me know if I should post my syslog.conf file or anything else here or elsewhere... Thanks, /Eirik I thought Doug White had come up with a work-around patch that prevented the reference count from being allowed to hit 0 for the console by artificially elevating it, which would prevent the panic, so either (a) the work around wasn't committed, or (b) it didn't work. I can attempt to take another look at this problem in a week or so, but have a number of things I need to finish up for FreeBSD 6.0 before then that will be occupying my time. Robert N M Watson ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Serious issue with serial console in 5.4
On Jul 21, 2005, at 1:04 PM, Robert Watson wrote: On Thu, 21 Jul 2005, Eirik Øverby wrote: I've only seen the issue when logging out of a serial console session, and had previously hypothesized that it had to do with the simultaneous timing of a console message from syslog and the opening/closing of the console's tty due to logging out and getty restarting, resulting in a reference count improperly hitting zero. I did indeed make some changes to my syslog configuration after getting the serials online. Your theory might not be entirely off. Let me know if I should post my syslog.conf file or anything else here or elsewhere... Since you appear to be able to reliably reproduce the problem (whereas I was able to reproduce it only after several hours of quite active serial console work), it would be quite interesting to answer the following question: If you cause syslogd not to send any output to /dev/console, does the problem go away? I'm afraid to say it doesn't /Eirik Thanks, Robert N M Watson ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
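For anyone wanting to repeat the experiment suggested above (keeping syslogd from writing to /dev/console), a rough sketch follows. It is an illustration only: the function name is made up, and it assumes the console entries in /etc/syslog.conf are uncommented lines whose action field is exactly /dev/console. It filters stdin to stdout rather than editing any file in place, and syslogd would still have to re-read its configuration after the real file is changed.

```sh
#!/bin/sh
# Hypothetical filter: comment out every active syslog.conf line whose
# action (last field) is /dev/console. Lines that are already comments
# and lines with other actions pass through unchanged.
disable_console_logging() {
    sed -e '/^[^#].*[[:space:]]\/dev\/console[[:space:]]*$/s/^/#/'
}

# Intended use (not run here); review the result before installing it:
#   disable_console_logging < /etc/syslog.conf > /tmp/syslog.conf.noconsole
```

Writing to a separate file first keeps the original configuration available if the test needs to be reverted.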
Re: Apache2 just listening to https?
On Jul 28, 2005, at 8:58 AM, Roger Grosswiler wrote: Hi, I have apache2 running, with ssl. now, if i call my domain in a browser not using https, i cannot connect. Try adding port 80 to your Listen statement(s) in httpd.conf. Also make sure you have virtual hosts that capture requests on port 80. /Eirik ps aux shows this: root59847 0.0 4.3 7528 4544 ?? Ss5:34PM 0:12.11 /usr/local/sbin/httpd -DSSL www 59848 0.0 6.5 9368 6888 ?? I 5:34PM 0:03.80 /usr/local/sbin/httpd -DSSL www 59849 0.0 5.9 8856 6292 ?? I 5:34PM 0:01.92 /usr/local/sbin/httpd -DSSL www 59850 0.0 6.5 9364 6876 ?? I 5:34PM 0:04.55 /usr/local/sbin/httpd -DSSL www 59852 0.0 6.0 8880 6332 ?? I 5:34PM 0:01.60 /usr/local/sbin/httpd -DSSL www 59862 0.0 5.9 8852 6292 ?? I 5:37PM 0:03.14 /usr/local/sbin/httpd -DSSL www 59931 0.0 5.1 8072 5436 ?? S 5:49PM 0:02.60 /usr/local/sbin/httpd -DSSL www 59935 0.0 6.1 9312 6428 ?? I 5:50PM 0:01.89 /usr/local/sbin/httpd -DSSL www 60152 0.0 5.3 8168 5652 ?? I 6:41PM 0:00.39 /usr/local/sbin/httpd -DSSL www 60153 0.0 4.5 7728 4748 ?? I 6:41PM 0:00.55 /usr/local/sbin/httpd -DSSL www 60154 0.0 5.2 8100 5504 ?? I 6:41PM 0:00.31 /usr/local/sbin/httpd -DSSL does this mean, that my apache just runs in ssl-mode??? tcp46 0 0 *.https*.* LISTEN tcp46 0 0 *.http *.* LISTEN ...not really do i have to create a virtual server if i use ssl? Roger ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable- [EMAIL PROTECTED]" ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Apache2 just listening to https?
On Jul 28, 2005, at 10:01 AM, Roger Grosswiler wrote: Try adding port 80 to your Listen statement(s) in httpd.conf. Also make sure you have virtual hosts that capture requests on port 80. /Eirik i did a file called virtual.conf in /usr/local/etc/apache2/Include with this content: ServerName freebsd.domain.net ServerAlias freebsd.domain.net DocumentRoot /usr/local/www/data Make sure you are not enabling SSL globally, but for each vhost individually. Try the telnet trick mentioned by others, but simply type "GET / HTTP/1.0" -- it should give you something about trying to talk HTTP to an HTTPS server. That would explain why lynx/links aren't working. /Eirik ...which should be loaded on startup. Also, i activated NameVirtualHost *:80 in httpd.conf - still no success...whats up here? firewall is open, redirecting on router is well...but still no success... :-( Roger
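The telnet check suggested above can also be scripted. The sketch below only classifies a reply that has already been captured; the function name and the three labels are made up, and the matched phrase assumes the error text Apache with mod_ssl sends when it receives plain HTTP on an SSL port ("speaking plain HTTP to an SSL-enabled server port"), which may differ between versions.

```sh
#!/bin/sh
# Hypothetical helper: read a raw server reply on stdin and print a guess
# at what answered. The SSL phrase is checked first, because the error page
# may or may not be preceded by a normal HTTP status line.
classify_reply() {
    reply=$(cat)
    case "$reply" in
        *"plain HTTP to an SSL-enabled server port"*) echo "ssl-on-this-port" ;;
        HTTP/1.*) echo "plain-http" ;;
        *) echo "unknown" ;;
    esac
}

# Intended use (not run here; assumes nc is installed on the client):
#   printf 'GET / HTTP/1.0\r\n\r\n' | nc freebsd.domain.net 80 | classify_reply
```

A result of "ssl-on-this-port" for port 80 would support the theory that SSL is enabled globally rather than per virtual host.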
5.4-dropping to debugger
Hi, every once in a while (about once a week lately), one of my servers has been known to stop responding. Upon connecting the serial console, I find myself at a debugger prompt. This is the output I've gotten this time. I do think I have a debug kernel on that machine; what can I do to get more useful information out? PS: I have seen various kinds of instability on most of my 5.4 installations, no matter the patchlevel. This box is just one of many. Anyone? /Eirik
db>
db> c
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x2007010
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc0581fe8
stack pointer = 0x10:0xe3384c40
frame pointer = 0x10:0xe3384c70
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 29 (irq18: fxp0)
[thread pid 29 tid 10 ]
Stopped at fxp_add_rfabuf+0x68: movw %ax,0xe(%ebx)
db> trace
Tracing pid 29 tid 10 td 0xc22a
fxp_add_rfabuf(c2404000,c2404500,2,a6c54bb2,b51487f8) at fxp_add_rfabuf+0x68
fxp_intr_body(c2404000,c2404000,40,,8) at fxp_intr_body+0xf1
fxp_intr(c2404000,0,0,0,0) at fxp_intr+0x14e
ithread_loop(c22f6500,e3384d38,0,0,0) at ithread_loop+0x1b8
fork_exit(c06a9150,c22f6500,e3384d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3384d6c, ebp = 0 ---
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: 5.4-dropping to debugger
On Aug 31, 2005, at 8:28 PM, Kris Kennaway wrote: On Wed, Aug 31, 2005 at 12:51:00PM +0200, Eirik Øverby wrote: Hi, every once in a while (about once a week lately), one of my servers has been known to stop responding. Upon connecting the serial console, I find myself at a debugger prompt. This is the output I've gotten this time. I do think I have a debug kernel on that machine, what can I do to get more useful information out? See the chapter on kernel debugging in the developers' handbook. Sorry, poorly phrased question. Was in a bit of a hurry. I have a debug kernel, but I have no dump device and cannot create one: I'm geom-mirroring my disks, and for some reason I'm not able to specify a dump device when that is the case (this has been discussed in the past). I've been told that a debug kernel might still help, but the developers' handbook does not say anything about what can be done without a dump. I know this has been up on one of the lists (current, stable or amd64) I'm on, so I guess I'll go ahead and search for it. Sorry about the noise. Was just hoping someone recognized the symptoms. /Eirik Kris PS: I have seen various kinds of instability on most of my 5.4 installations, no matter the patchlevel. This box is just one of many. Anyone?
Centralized building
Hi all!

I've spent about a week trying to accomplish a rather simple task: to build kernel and world once for each architecture we have, and distribute the precompiled src and obj trees via NFS to all the systems that need updating. I have combined this with a locally maintained CVS tree, to ensure that coherent releases are installed on all our systems. However, I am seeing some peculiar issues that I can't manage to get around.

Scenario: I've got one server running 6.0-STABLE-i386. On this host I've created a jail for building. We have both i386 and amd64 platforms in-house, so I've created a script that builds for both:

  make TARGET_ARCH=i386 MAKEOBJDIRPREFIX=/usr/obj.i386 buildworld
  make TARGET_ARCH=amd64 MAKEOBJDIRPREFIX=/usr/obj.amd64 buildworld

And the same for buildkernel.

Starting out trying to upgrade the amd64 hosts, I export the two obj directories via NFS, and mount them as /usr/obj on the amd64 hosts that need upgrading. This was, at least, my initial approach. I then found out that the /usr/src tree in the build jail is somehow tainted by the build (and by the options I specified), so I need to export that as well (which, I am afraid, means I have to maintain two different build jails). Therefore I also export /usr/src and mount it on the target hosts.

I then realized that I need to use the same objdir on the target hosts as in the build jail, so I try mounting to /usr/obj. on the target hosts. This allows me to get somewhat further. Installworld now progresses for a while, until it bombs out with the following error:

  ===> sys/boot/i386/boot2 (install)
  dd if=/dev/zero of=boot2.ldr bs=276 count=1
  dd: not found
  *** Error code 127

When looking for dd, I find it in the host PATH, and also in the obj dir:

  [EMAIL PROTECTED] /usr/obj.amd64# find . -name dd -type f
  ./amd64/usr/src/bin/dd/dd

At this point, I get rid of the MAKEOBJDIRPREFIX option and rebuild everything with just TARGET_ARCH, only exporting /usr/obj from the build jail. I notice that when using TARGET_ARCH with something other than the architecture the build is running on (i.e. amd64 on an i386 host), the resulting build is NOT to be found in /usr/obj, but in /usr/obj/amd64. Thus I need to specify MAKEOBJDIRPREFIX=/usr/obj/amd64 on the target host for installworld to get anything done at all.

I'm still getting the "dd: not found" error, and I believe I've tried every combination and variation I can think of. Clocks are in sync between all the systems, so that is not the problem.

Is the build system partially broken in 6.0? Have I missed something? Do I actually need an amd64 host to be able to build for amd64 systems, or are there other ways to accomplish what I'm trying to do? Should I perhaps try doing centralized binary upgrades instead?

Any help would be appreciated.

With best regards,
/Eirik
Re: Centralized building
On Nov 19, 2005, at 13:28 , Joseph Koshy wrote:

>> Starting out trying to upgrade the amd64 hosts, I export the two obj directories via NFS, and mount them as /usr/obj on the amd64 hosts that need upgrading.
>
> I've done upgrades the other way, by having the build machine mount the client's to-be-root partition and installing to it using NFS.

Would I have to export every filesystem mount point on the hosts then? Or does an -alldirs option do the trick (in exports)? In any case this would not be compatible with our security policy, unfortunately.

/Eirik
Re: Centralized building
On Nov 19, 2005, at 19:43 , Joseph Koshy wrote:

>> AFAICT cross-compiling amd64 on a i386 machine isn't supported yet. I ran into a similar problem when I upgraded an i386 machine to amd64. I thought I could just set CPUTYPE=athlon-64 and buildworld would do the right thing. Apparently not.
>
> Bootstrapping a single machine is supported:
>
>   # make buildworld TARGET_ARCH=new-arch
>
> plus a few other steps (see build(7)). There have been a couple of postings on the mailing lists on this topic in the recent past. I've taken a stab at describing how to cross-bootstrap too:
>
>   http://edoofus.blogspot.com/2005/10/cross-building-freebsd.html
>
> The OP wanted to do a 'buildworld TARGET_ARCH=foo' on one machine and then an 'installworld' on a different set of machines.

Yes, and he still wonders if this is supposed to be doable or not.

I think the culprit is (partly) the fact that every architecture is built into its own subdirectory in /usr/obj, EXCEPT the architecture the build is running on. The same goes for the install part, and if the build and install architectures differ, it cannot ever work. Setting MAKEOBJDIRPREFIX on the target host makes the install start, but it fails after a couple of minutes with the "dd: not found" error. (I do notice that there is a /usr/obj/usr directory created also when cross-building; I'm assuming this contains the build bootstrap tools.)

> --
> FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
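The object-directory layout described in this message can be written down as a small rule. The following is only a sketch of that observation, not of how the FreeBSD makefiles actually compute paths: `objdir_for` is a hypothetical helper name, and the default prefix `/usr/obj` is taken from the thread.

```sh
#!/bin/sh
# Hypothetical helper: print the object directory that a build for
# target_arch ends up in when it runs on a host of host_arch, following
# the layout observed in this thread (native builds land directly under
# the prefix, cross builds under <prefix>/<target_arch>).
objdir_for() {
    host_arch=$1
    target_arch=$2
    prefix=${3:-/usr/obj}
    if [ "$host_arch" = "$target_arch" ]; then
        printf '%s\n' "$prefix"
    else
        printf '%s/%s\n' "$prefix" "$target_arch"
    fi
}

# Example: an amd64 world built in the i386 build jail.
objdir_for i386 amd64
```

On the amd64 install host the same rule yields plain `/usr/obj`, which would explain why installworld there only starts once MAKEOBJDIRPREFIX is pointed at the cross-build path, as reported above.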
Re: Centralized building
On Nov 20, 2005, at 09:50 , Eirik Øverby wrote:

> I think the culprit is (partly) the fact that every architecture is built into its own subdirectory in /usr/obj, EXCEPT the architecture the build is running on. The same goes for the install part, and if the build and install architectures differ, it cannot ever work. Setting MAKEOBJDIRPREFIX on the target host makes the install start, but it fails after a couple of minutes with the "dd: not found" error. (I do notice that there is a /usr/obj/usr directory created also when cross-building; I'm assuming this contains the build bootstrap tools.)

Follow-up. If I enter src/sys and do a "make install", the dd step works perfectly - however, it stops later when trying to install cdboot. I am assuming this is due to missing options or the wrong target for make, but - from all I can tell - it shows a weakness in the build/install system. Or maybe not...

Anyone??

/Eirik
Reduced java/tomcat performance 6-beta3 -> 6-stable ?
Hi all,

are there any obvious changes between 6.0-BETA3 and 6.0-RELEASE / 6.0-STABLE that I should be aware of, and that could cause a quite noticeable decline in performance (and a change in performance patterns) for java/tomcat?

On a BETA-3 system I'm seeing, with the particular application we're running, about 28 transactions/second over a 10 minute interval. With -RELEASE and -STABLE I'm lucky to reach 24, and it'll usually wobble around 20.

Another oddity is that where the BETA-3 system starts out with good performance from the beginning when running load tests, the -RELEASE and -STABLE systems need a good 20 seconds to reach their "max", starting out very low (3-10 transactions/second for the first 10 seconds or so).

This is on HP DL385 servers with dual 2.4 GHz Opteron CPUs, running FreeBSD-amd64 from 15k RPM drives in cached RAID. Hardware and software configuration (apart from the base system), network configuration and latencies, database access, etc. are identical on all systems.

Any ideas?

Thanks,
/Eirik
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
On Nov 28, 2005, at 14:45 , Joseph Koshy wrote:

> On 11/26/05, Eirik Øverby <[EMAIL PROTECTED]> wrote:
> EØ> [Cross-posting after lack of response on -stable]
>
> The first step would be do some performance debugging.

Yep.

> - What do top/vmstat/systat say about what the OS and apps are doing? Is the CPU pegged at 100%? What's the load seen by the disks? Is the RAID in good health?

vmstat output during system idle time is found below. I think it is rather interesting. To your other questions: the CPU usage is comparable on both systems. Not pegged at 100%, but load seems to stabilize around 0.5. Disk load is minimal on the application servers, somewhat more on the database servers, but they are not interesting here (they are not the bottleneck, and they perform equally). The RAIDs are in good health on both systems.

The vmstat output is interesting. From the "fast" system (6.0-BETA3, ~idle):

[EMAIL PROTECTED] ~# vmstat -w 5
 procs    memory          page                disks     faults        cpu
 r b w     avm   fre flt re pi po  fr   sr da0 pa0   in    sy    cs us sy  id
 1 0 0 2439220 38048  14  0  0  0  14    0   0   0  170   141   437  0  0 100
 0 0 0 2439220 38028   2  0  0  0   3    0   2   0  192    94   475  0  0 100
 0 0 0 2439220 37916   1  0  0  0   6    0   1   0  291   925   926  5  0  94
 0 0 0 2439220 37916   0  0  0  0   0    0   0   0  185    91   458  0  0 100
 0 0 0 2439220 37820   1  0  0  0   6    0   3   0  289  1163  1124  6  0  94
 0 0 0 2439220 37820   0  0  0  0   0    0   0   0  183    91   454  0  0 100

From the "slow" system (6.0-STABLE, ~idle):

[EMAIL PROTECTED] ~# vmstat -w 5
 procs    memory          page                disks     faults        cpu
 r b w     avm   fre flt re pi po  fr   sr da0 pa0   in    sy    cs us sy  id
 0 0 1 2468180 51660  15  0  0  0  18    4   0   0 1048  3200  5130  0  0 100
 0 0 0 2468180 51660   1  0  0  0   0    0   0   0 1004  3068  5063  0  0 100
 0 0 0 2468180 51660   0  0  0  0   0    0   0   0 1003  3094  5057  0  0 100
 0 0 0 2468180 51660   0  0  0  0   0    0   1   0 1005  3068  5065  0  0 100
 0 0 0 2468180 51656   1  0  0  0   0    0   0   0 1002  3090  5054  0  1  99
 0 0 0 2468180 51656   0  0  0  0   0    0   0   0 1002  3064  5053  0  0 100

*loads* more context switches than on the BETA-3 system. I have not yet tried this during load; I have to wait for the testing window for that. But perhaps this helps? What do I look for next?

> - Any unusual messages in /var/log/messages? Any errors shown by the network interfaces (I'm assuming the application is using the network).

No errors shown that I can determine.

> - A brief description of the workload presented by the app would help.

This is a web application (payment gateway) that receives an HTTP POST, does some processing, asks an external service for a piece of information, then returns the gathered information to the client. The call to the external service can be eliminated, but that does not change the performance profile.

How the application works internally is impossible for me to say; it's 3rd party. I can say, after asking them, that it is "moderately" threaded, whatever "moderately" threaded means. My interpretation is that the heaviest threading happens in tomcat itself, with up to 150 concurrent connection threads running.

Thanks,
/Eirik

> --
> FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
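To put a single number on the context-switch difference visible in the idle samples, the `cs` column can be averaged. This is a minimal sketch: `avg_cs` is a hypothetical helper, and it assumes `cs` is the fourth field from the end of each data line (`cs us sy id`), as in the vmstat output shown in this thread.

```sh
#!/bin/sh
# Average the context-switch ("cs") column of vmstat output read from stdin.
# Assumes "cs" is the fourth field from the end (cs us sy id); header lines
# are skipped because their last field is not a number.
avg_cs() {
    awk '$NF ~ /^[0-9]+$/ && NF >= 4 { sum += $(NF - 3); n++ }
         END { if (n > 0) printf "%d\n", sum / n }'
}

# Example with two of the idle samples from the slow system; prints 5060.
avg_cs <<'EOF'
 r b w     avm   fre flt re pi po  fr   sr da0 pa0   in    sy    cs us sy  id
 0 0 0 2468180 51660   1  0  0  0   0    0   0   0 1004  3068  5063  0  0 100
 0 0 0 2468180 51660   0  0  0  0   0    0   0   0 1003  3094  5057  0  0 100
EOF
```

On a live system the same function could be fed directly, for example with `vmstat -w 5 -c 6 | avg_cs`, keeping in mind that the first vmstat sample reports averages since boot.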
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
On Nov 28, 2005, at 15:54 , Joseph Koshy wrote:

> EØ> *loads* more context switches than on the BETA-3 system.
> EØ> I have not yet tried this during load
>
> - Which scheduler have you configured (BSD or ULE)?

Running GENERIC/SMP kernels, with the BSD scheduler. Speaking of which: is there a way to extract the kernel configuration from a running kernel or kernel binary?

> - What do the interrupt statistics show? Any interrupt storms? Please check the mailing lists for a prior discussion on interrupt storms on some motherboards.

Slow system:

interrupt                          total       rate
irq1: atkbd0                           4          0
irq14: ata0                           46          0
irq24: ciss0                      337166          1
irq28: bge0                      8038794         35
cpu0: timer                    446869052       1999
cpu1: timer                    446861051       1999
Total                          902106113       4037

Fast system:

interrupt                          total       rate
irq1: atkbd0                           6          0
irq14: ata0                           46          0
irq24: ciss0                     7465831          1
irq28: bge0                     20764380          2
lapic0: timer                14827978729       2000
lapic1: timer                14827970729       2000
Total                        29684179721       4003

No significant differences I'd say. Anything else I can do to dig deeper?

> - Could you post the dmesg output from the systems (I presume there aren't any significant differences).

dmesg from the slow system follows. I do not have a dmesg for the fast system; I cannot boot it now either. However, I have compared them before, and they are 100% equal. The machines seem to be very close in serial numbers, probably from the same production run.

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.0-STABLE #0: Sat Nov 26 01:52:00 CET 2005 [EMAIL PROTECTED]:/usr/obj/amd64/usr/src/sys/SMP Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Opteron(tm) Processor 250 (2405.47-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x20f51 Stepping = 1 Features=0x78bfbffMCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2> Features2=0x1 AMD Features=0xe2500800,LM,3DNow+,3DNow> real memory = 1073717248 (1023 MB) avail memory = 1024946176 (977 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 MADT: Forcing active-low polarity and level trigger for SCI ioapic0 irqs 0-23 on motherboard ioapic1 irqs 24-27 on motherboard ioapic2 irqs 28-31 on motherboard ioapic3 irqs 32-35 on motherboard ioapic4 irqs 36-39 on motherboard acpi0: on motherboard acpi0: Power Button (fixed) pci_link0: irq 5 on acpi0 pci_link1: irq 7 on acpi0 pci_link2: irq 0 on acpi0 pci_link3: irq 3 on acpi0 Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0 cpu0: on acpi0 cpu1: on acpi0 pcib0: on acpi0 pci0: on pcib0 pcib1: at device 3.0 on pci0 pci1: on pcib1 ohci0: mem 0xf7df-0xf7df0fff irq 19 at device 0.0 on pci1 ohci0: [GIANT-LOCKED] usb0: OHCI version 1.0, legacy support usb0: SMM does not respond, resetting usb0: on ohci0 usb0: USB revision 1.0 uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 3 ports with 3 removable, self powered ohci1: mem 0xf7de-0xf7de0fff irq 19 at device 0.1 on pci1 ohci1: [GIANT-LOCKED] usb1: OHCI version 1.0, legacy support usb1: SMM does not respond, resetting usb1: on ohci1 usb1: USB revision 1.0 uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 3 ports with 3 removable, self powered pci1: at device 2.0 (no driver attached) pci1: at device 2.2 (no driver attached) pci1: at device 3.0 (no driver attached) isab0: at device 4.0 on pci0 isa0: on isab0 atapci0: port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2000-0x200f at device 4.1 on pci0 ata0: on atapci0 ata1: on atapci0 pci0: at device 4.3 (no driver attached) pcib2: at device 7.0 on pci0 pci2: on pcib2 ciss0: port 0x5000-0x50ff mem 0xf7ef-0xf7ef1fff,0xf7e8-0xf7eb irq 24 at device 4.0 on pci2 ciss0: [GIANT-LOCKED] pci0: at device 7.1 (no driver attached) pcib3: at device 8.0 on pci0 pci3: on pcib3 bge0: mem 0xf7ff-0xf7ff irq 28 at device 6.0 on pci3 miibus0: on bge0 brgphy0: on miibus0 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto bge0: Ethernet address: 00:13:21:b3:c1:f8 bge1: mem 0xf7fe-0xf7fe irq 29 at device 6.1 on pci3 miibus1: on bge1 brgphy1: on miibus1 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto bge1: Ethernet address: 00:13:21:b3:c1:f7 pci0: at device 8.1 (no driver attac
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
Follow-up: I've now run vmstat during load, which confirms the findings of vmstat during idle time.

Slow system - one sample before load start and one after load end included:

 procs    memory          page                disks     faults        cpu
 r b w     avm   fre flt re pi po  fr   sr da0 pa0   in    sy    cs us sy  id
 3 0 0 2468572 45476  14  0  0  0  18    4   0   0 1049  3201  5132  0  0 100
 0 0 1 2468572 42388   1  0  0  0 154    0   5   0 6852 19813 19970 22  8  70
 1 0 0 2468572 39332   1  0  0  0 155    0  11   0 6823 19661 19886 23  7  71
 2 0 0 2468432 36336   1  0  0  0 160    0   6   0 7031 20356 20534 19  7  74
 0 0 0 2468432 33228   1  0  0  0 156    0   5   0 6685 19420 19613 20  7  73
 2 0 0 2468432 29928   1  0  0  0 164    0   5   0 7105 20483 20673 21  7  71
 1 0 0 2468432 53568   1  0  0  0 153 1308   5   0 6688 19278 19537 21  8  72
 1 0 1 2468432 50580   2  0  0  0 150    0   6   0 6408 18430 18693 24  7  69
 0 0 0 2468432 47748   2  0  0  0 143    0   6   0 6323 18098 18328 26  7  67
 0 0 0 2468432 45056   1  0  0  0 136    0   5   0 5607 17122 17062 16  7  77
 0 0 0 2468432 45040   0  0  0  0   0    0   0   0 1093  3172  5164  0  0 100

Fast system:

 procs    memory          page                disks     faults        cpu
 r b w     avm   fre flt re pi po  fr   sr da0 pa0   in    sy    cs us sy  id
 0 0 0 2439276 39708   1  0  0  0   6    0   1   0  281  1029   992  6  1  93
 0 0 0 2439276 39380   7  0  0  0  16    0   1   0  665  1341  1714  2  1  98
 0 0 0 2439276 36472   5  0  0  0 145    0   6   0 5569 12409 14821 21  7  72
 0 0 0 2439276 33512   1  0  0  0 149    0   5   0 5862 12597 15532 15  6  79
 0 0 0 2439276 30600   1  0  0  0 146    0   4   0 5682 12655 15102 19  7  74
 2 0 0 2439276 54144   1  0  0  5 152 1310  10   0 6006 12908 15964 17  6  77
 0 0 0 2439276 51176   2  0  0  0 151    0   7   0 5348 11899 14190 22  6  72
 2 0 0 2439276 48104  98  0  0  0 248    0   5   0 5924 12889 15757 15  7  78
 1 0 0 2439276 45172   1  0  0  0 147    0   5   0 5882 12660 15624 16  7  77
 2 0 0 2439276 42276   1  0  0  0 145    0   5   0 5558 12477 14864 21  6  73
 0 0 0 2439276 39300   1  0  0  0 149    0   5   0 5842 12660 15556 14  7  79
 0 0 0 2439276 36348   1  0  0  0 150    0   8   0 5659 12562 15042 21  5  74
 0 0 0 2439276 33404   1  0  0  0 150    0   7   0 5868 12642 15536 14  6  80
 0 0 0 2439276 30588   1  0  0  0 142    0   6   0 5449 11961 14487 19  7  74
 0 0 0 2439276 30588   0  0  0  0   0    0   0   0  227   246   565  0  0 100

I'm tempted to upgrade the fast system to 6-STABLE (same rev as the slow one). Even the slow system performs "adequately", but the upgrade might help me isolate any potential hardware differences.

/Eirik

On Nov 28, 2005, at 15:54 , Joseph Koshy wrote:

> EØ> *loads* more context switches than on the BETA-3 system.
> EØ> I have not yet tried this during load
>
> - Which scheduler have you configured (BSD or ULE)?
>
> - What do the interrupt statistics show? Any interrupt storms? Please check the mailing lists for a prior discussion on interrupt storms on some motherboards.
>
> - Could you post the dmesg output from the systems (I presume there aren't any significant differences).
>
> Please CC -stable too.
>
> --
> FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
Hi,

I think I have found the culprit. There must be some sort of difference between the machines after all (BIOS revision?): on one machine the interrupt rate for the bge card stays very low (2, to be exact) during maximum load, while on the other it goes beyond 1000 and keeps rising constantly. This might also explain why performance slowly degrades over time on that machine and response times vary wildly, while the "fast" machine responds nicely within 1-2 seconds no matter the load and testing time.

I will have to investigate this more closely. Is there a way to force the NIC into polling mode? (I'm assuming that is the difference; an IRQ rate of 2 is too low for a heavily loaded server if the NIC is interrupt-driven.) Anything else I could look at?

Also, the interrupt rates for the CPUs stay at 2000 sharp on the fast system, but fluctuate somewhat on the other.

/Eirik

On Nov 28, 2005, at 15:54 , Joseph Koshy wrote:

> EØ> *loads* more context switches than on the BETA-3 system.
> EØ> I have not yet tried this during load
>
> - Which scheduler have you configured (BSD or ULE)?
>
> - What do the interrupt statistics show? Any interrupt storms? Please check the mailing lists for a prior discussion on interrupt storms on some motherboards.
>
> - Could you post the dmesg output from the systems (I presume there aren't any significant differences).
>
> Please CC -stable too.
>
> --
> FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
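Comparing the bge interrupt rate on the two machines is easier to repeat if the value is pulled out of the statistics mechanically. The sketch below assumes input lines of the form `irq28: bge0 <total> <rate>`, as in the interrupt statistics quoted earlier in this thread; `irq_rate` is a hypothetical helper name.

```sh
#!/bin/sh
# Print the rate that "vmstat -i"-style output (read from stdin) reports
# for one device. Assumes lines of the form "irq28: bge0 <total> <rate>",
# i.e. the device name is the second field and the rate is the last one.
irq_rate() {
    awk -v dev="$1" '$2 == dev { print $NF }'
}

# Example using lines from the slow system's statistics; prints 35.
irq_rate bge0 <<'EOF'
interrupt                          total       rate
irq24: ciss0                      337166          1
irq28: bge0                      8038794         35
EOF
```

On a live system this could be run as `vmstat -i | irq_rate bge0` on both boxes during a load test. As far as I know the rate column of `vmstat -i` is an average since boot, so for the rate under load it may be more telling to sample the total column twice and divide the difference by the interval.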
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
Firmware versions are equal. BIOS settings are equal. However, a diff of the dmesgs shows (apart from MAC address differences):

30c30
< Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
---
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000

What on earth is that all about? The "slow" box has the ACPI-fast timecounter...

/Eirik

On Nov 28, 2005, at 22:14 , Kris Kennaway wrote:

> On Mon, Nov 28, 2005 at 09:54:30PM +0100, Eirik Øverby wrote:
>> Hi, I think I have found the culprit. There must be some sort of difference between the machines after all (BIOS revision?), because while on one machine the interrupt rate for the bge card stays very low (2 to be exact) during maximum load, the other machine goes beyond 1000 and keeps rising constantly. This might also explain why performance slowly degrades over time on that machine, and response times vary wildly, while the "fast" machine responds nicely within 1-2 seconds no matter the load and testing time. I will have to investigate this more closely. Is there a way to force the NIC to polling mode (I'm assuming that is the difference, an IRQ rate of 2 is too low for a heavily loaded server if the NIC is interrupt-driven)? Anything else I could look at?
>
> BIOS update.
>
> Kris
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
Update: The diff below was made after making sure both systems are running the exact same kernel. Behavior is the same. Building new kernels (6-STABLE) now to get out of the BETA stage.

/Eirik

On Nov 28, 2005, at 22:53 , Eirik Øverby wrote:

> Firmware versions are equal. BIOS settings are equal. However, a diff of the dmesgs shows (apart from MAC address differences):
>
> 30c30
> < Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> ---
> > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
>
> What on earth is that all about? The "slow" box has the ACPI-fast timecounter...
>
> /Eirik
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
On Nov 29, 2005, at 10:15 , Kris Kennaway wrote:

> On Tue, Nov 29, 2005 at 09:46:09AM +0100, Eirik Oeverby wrote:
>> On Mon, 28 Nov 2005, Kris Kennaway wrote:
>>> On Mon, Nov 28, 2005 at 10:53:00PM +0100, Eirik Øverby wrote:
>>>> Firmware versions are equal. BIOS settings are equal. However, a diff of the dmesgs show (apart from MAC address differences):
>>>>
>>>> 30c30
>>>> < Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
>>>> ---
>>>> > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
>>>>
>>>> What on earth is that all about? The "slow" box has the ACPI-fast timecounter...
>>>
>>> Could be ACPI bugs on your system:
>>
>> Yes, but the other system is 100% equal - hardware, bios config, bios and bootblock revision, controller bioses, etc. etc. It all matches.
>
> Clearly they're not 100% equal, but (100-epsilon)%. Your job is to identify the origin of the epsilon :-)

Yea yea ;) Working on it..
Is there a way to force ACPI-safe on the slower system?

/Eirik

>> Should I complain to HP?
>
> If you think you'll get anywhere, it might be worth pursuing.
>
> Kris
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
On Nov 29, 2005, at 10:44 , Joseph Koshy wrote:

> EØ> Yea yea ;) Working on it..
> EØ> Is there a way to force ACPI-safe on the slower system?
>
> # sysctl kern.timecounter.hardware=

kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-100)

ACPI-safe is not among the choices, which means I can't choose it, I presume.

I'm compiling new kernels with ACPI_DEBUG right now. Once they are installed, what can I do to determine differences in the DSDT tables etc.? Or whatever else is different?

/Eirik

> --
> FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
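Whether a given timecounter can be selected at all follows from the kern.timecounter.choice string quoted in this message. A minimal sketch of that check, with `has_timecounter` as a hypothetical helper and the choice string copied from the message:

```sh
#!/bin/sh
# Succeed if timecounter "name" appears in a kern.timecounter.choice string
# such as "TSC(-100) ACPI-fast(1000) i8254(0) dummy(-100)".
has_timecounter() {
    name=$1
    choices=$2
    for entry in $choices; do
        # Strip the "(quality)" suffix before comparing.
        [ "${entry%%"("*}" = "$name" ] && return 0
    done
    return 1
}

# Choice string as reported on the slow system in this thread.
choices='TSC(-100) ACPI-fast(1000) i8254(0) dummy(-100)'
if has_timecounter ACPI-safe "$choices"; then
    echo "ACPI-safe can be selected"
else
    echo "ACPI-safe is not offered on this machine"
fi
```

On a live system the string could come from `sysctl -n kern.timecounter.choice`, and an offered counter would then be selected with `sysctl kern.timecounter.hardware=<name>`, the sysctl named in the quoted reply.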
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
On Nov 29, 2005, at 11:37 , Kris Kennaway wrote:

> On Tue, Nov 29, 2005 at 10:25:07AM +0100, Eirik Øverby wrote:
>> Yea yea ;) Working on it..
>> Is there a way to force ACPI-safe on the slower system?
>
> I think someone already mentioned this..see the kern.timecounter.hardware and other kern.timecounter sysctls.

I have now forced ACPI-safe on the slow system, to match the fast one. Too bad, though: it made absolutely zero difference. I'm upgrading the BIOS on both boxes now, even though they seem equal. Then I'll see what the ACPI debug output shows me. If you have any other hints or ideas, please let me know... thanks so far.

/Eirik

> Kris
Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?
On Dec 1, 2005, at 04:12 , Michael Vince wrote:

> Some apps that query the system time frequently, for example MySQL, are well known to be slower on FreeBSD than on Linux, because that call is more expensive on FreeBSD. Maybe Tomcat is another such app; this can be doubly the case depending on your JSP and servlet code.

True, but on equal hardware it should perform equally.

> If you are on good hardware, are using 6 and keep your system's time updated via ntp, you might want to try changing kern.timecounter.hardware from ACPI-fast to TSC(-100) and doing a benchmark. This has already been shown to increase the performance of MySQL by a significant amount.

I will try this, though it will not solve my original problem (and the subject is somewhat misleading now, as this seems to be independent of kernel revisions).

> Also, some new experimental low-precision time code has been added to the current source tree to see how much performance can be gained. Weirdly enough, some people have argued against it, I guess for a wide range of reasons: they just have crap hardware and don't care about performance, they don't like the extra maintenance of code, or they just like Red Hat fanatics having an easy way to bad-mouth FreeBSD performance. I think most people would agree, though, that it has to be done, or else they have to choose to believe FreeBSD isn't about performance among other goals.

I will not join this discussion ;)

> With 6 you can also use the new thr threading library; try mapping libpthread to libthr in your libmap.conf for testing, for example:
>
>   [/usr/local/jdk1.4.2/]
>   libpthread.so.2   libthr.so.2
>   libpthread.so     libthr.so
>
> I have been doing some 'ab' testing of libthr with Apache2 compiled for the worker MPM and see some really interesting differences in server load: loads of about 40 for pthread and around 5 for thr under certain tests with ab, with the exact same test.

Too bad this causes jdk1.5.0-amd64 to crash... Application startup times were significantly reduced, but only on the occasions it actually managed to start without failing. By the 2nd or 3rd transaction at the latest, Java dumps core. :(

And as the current load testing is done without Apache in between, this is moot..

/Eirik

> Mike
Re: ZERO_COPY_SOCKETS
On Dec 6, 2005, at 03:20 , Joshua Coombs wrote:

#options ZERO_COPY_SOCKETS

What's the status of this in 6.0-R and 6-stable? The idea of avoiding memory copies when possible seems really appealing for my 386, on which any little boost is significant. : )

Hoi, let me know how you got 6.0 running on i386 .. It sounds like the perfect way to spend some of the holidays ;) I've heard it won't run on a plain i386 out-of-the-box, what did you do to convince it to run?

/Eirik

Joshua Coombs
Panic on logout in serial console
Hi all,

I'm pretty aggravated right now. At exactly the wrong moment my spinal reflexes kicked in and I logged out from my serial console session on an important server. BANG! Kernel panic.

This has been reported numerous times before, so I won't bother giving you the specifics right now - had to boot it immediately to come back up anyway, so no time.

Anyway - I was told at some point that this would be fixed in 6.x - but backporting the fix to 5.x would be hard to impossible. Guess what: It's still not fixed. And it's a really really (did I say really?) bad thing, at least as far as I'm concerned.

Any fixes in the pipeline here? I'd be happy to help testing any patches..

I'm mostly pissed with myself though - I knew about this and usually never log out. But when under pressure, habits kick in. :P

With very frustrated regards,
/Eirik
Re: Obscure errors in dmsg, system instability
On 25. Sep 2004, at 22:29, Doug White wrote:

On Thu, 23 Sep 2004, Eirik Øverby wrote:

On 23. Sep 2004, at 04:15, Doug White wrote:

Is something sharing an interrupt with that device? PCI bus errors are generally Bad News .. either some device or the mobo is introducing errors.

Well.. Yes, there is some interrupt sharing. Relevant parts of dmesg:

[EMAIL PROTECTED] ~$ dmesg | grep "irq 2"
IOAPIC #0 intpin 19 -> irq 2
uhci0: port 0xd400-0xd41f irq 2 at device 4.2 on pci0
ahc0: port 0xd000-0xd0ff mem 0xe200-0xe2000fff irq 2 at device 6.0 on pci0
amr0: mem 0xe300-0xe300 irq 2 at device 9.1 on pci0

Apparently one of these devices doesn't like getting an interrupt when there's no data pending. It might be a FreeBSD driver bug, but being a 3-way share it'll make it hard to untangle.

In that case, building a kernel without the adaptec driver might actually resolve the problem for now? I don't like the fact that the LSI and the Adaptec are sharing IRQs, given that the LSI is the main system drive controller (which is why I don't use the Adaptec at all - and it cannot be disabled in BIOS I think!)... I should perhaps try to reallocate some of the IRQs, but I don't really have a clue how to do that, since I have no VGA in that box.. Ohwell, I guess I just have to rip it open ;)

Yah .. rearrange the cards in the slots and see what you can convince it to do.

That's the plan. Thanks!!

/Eirik
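The check done by hand above with dmesg | grep "irq 2" can be generalised to list every IRQ that more than one device shares. The following is a rough Python sketch, not a tool from the thread: the function name shared_irqs is hypothetical, the regular expression assumes probe lines shaped like the ones quoted above, and the fxp0 line in the sample is invented to show a non-shared IRQ.

```python
import re
from collections import defaultdict

# Matches device probe lines such as
#   "uhci0: port 0xd400-0xd41f irq 2 at device 4.2 on pci0"
# and captures the device name and the IRQ number.
DEVICE_IRQ_RE = re.compile(r"^(\w+):.*\birq (\d+)\b")


def shared_irqs(dmesg_text):
    """Map each IRQ used by more than one device to the list of devices."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = DEVICE_IRQ_RE.match(line.strip())
        if not match:
            continue  # e.g. the "IOAPIC #0 intpin 19 -> irq 2" routing line
        device, irq = match.group(1), int(match.group(2))
        if device not in by_irq[irq]:
            by_irq[irq].append(device)
    return {irq: devices for irq, devices in by_irq.items() if len(devices) > 1}


# The first four lines are quoted from the thread; the fxp0 line is a
# hypothetical addition showing a device that does not share its IRQ.
SAMPLE_DMESG = """IOAPIC #0 intpin 19 -> irq 2
uhci0: port 0xd400-0xd41f irq 2 at device 4.2 on pci0
ahc0: port 0xd000-0xd0ff mem 0xe200-0xe2000fff irq 2 at device 6.0 on pci0
amr0: mem 0xe300-0xe300 irq 2 at device 9.1 on pci0
fxp0: port 0xd800-0xd83f irq 5 at device 8.0 on pci0"""

if __name__ == "__main__":
    for irq, devices in sorted(shared_irqs(SAMPLE_DMESG).items()):
        print(f"irq {irq}: {', '.join(devices)}")
```

Running it against the full dmesg before and after moving cards between slots would show whether the 3-way share between uhci0, ahc0 and amr0 has actually been broken up.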
NO_YP_LIBC breaks 4-STABLE buildworld
PS: I've posted a similar mail to @current, but not a dupe ;)

Hi!

For some time I've been wanting to use NO_YP_LIBC with buildworld for my jails, to enable NIS on the host system but keep the jails functioning. I noticed back in August that a patch was submitted to make this work on then-CURRENT:
http://lists.freebsd.org/pipermail/freebsd-arch/2004-August/002550.html

Sadly, when compiling 4-STABLE (as of 45 minutes ago), buildworld gives me the following errors:

...snip snip...
===> libexec/mknetid
cc -O -pipe -march=pentiumpro -c /usr/src/libexec/mknetid/mknetid.c
cc -O -pipe -march=pentiumpro -c /usr/src/libexec/mknetid/hash.c
cc -O -pipe -march=pentiumpro -c /usr/src/libexec/mknetid/parse_group.c
gzip -cn /usr/src/libexec/mknetid/netid.5 > netid.5.gz
gzip -cn /usr/src/libexec/mknetid/mknetid.8 > mknetid.8.gz
cc -O -pipe -march=pentiumpro -o mknetid mknetid.o hash.o parse_group.o
mknetid.o: In function `main':
mknetid.o(.text+0xdc): undefined reference to `yp_get_default_domain'
*** Error code 1
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error

This feature would be very useful, and it is sad to see that it was once in the tree but no longer works. Here's to hoping someone can look into it (Bjoern, are you reading this? ;)

Thanks,
/Eirik