Re: Why is intr taking up so much cpu?
Doug,

could you please show your timer configuration, part of devinfo -u that describes interrupts and top of the output of top -SPH (including the header) when high interrupt load strikes?

P.S. I saw output of top -SH, but I have a reason to be curious about top -SPH.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [panic] Race in IEEE802.11 layer towards device drivers
----- Original Message -----
> From: Hans Petter Selasky
> To: PseudoCylon
> Cc: freebsd-current@freebsd.org; Sam Leffler; freebsd-...@freebsd.org
> Sent: Tue, July 20, 2010 4:46:34 AM
> Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
>
> On Tuesday 20 July 2010 12:03:22 PseudoCylon wrote:
> > ----- Original Message -----
> > > From: Hans Petter Selasky
> > > To: freebsd-current@freebsd.org
> > > Cc: PseudoCylon; Sam Leffler; freebsd-...@freebsd.org
> > > Sent: Mon, July 19, 2010 1:17:04 PM
> > > Subject: Re: [panic] Race in IEEE802.11 layer towards device drivers
> > >
> > > Hi AK,
> > >
> > > I've committed your patches to USB P4. I've made some additional
> > > patches.
> > >
> > > Can you check and verify everything?
> > >
> > > http://p4web.freebsd.org/@@181189?ac=10
> >
> > Hi
> >
> > If we change sc->cmdq_run = RUN_CMDQ_ABORT,
> >
> > -- begin excerpt --
> >
> > @@ -4890,7 +4877,10 @@ run_stop(void *arg)
> >  	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
> >
> >  	sc->ratectl_run = RUN_RATECTL_OFF;
> > -	sc->cmdq_run = RUN_CMDQ_ABORT;
> > +
> > +	RUN_CMDQ_LOCK(sc);
> > +	sc->cmdq_run = sc->cmdq_key_set = RUN_CMDQ_ABORT;
> > +	RUN_CMDQ_UNLOCK(sc);
> >
> > -- end excerpt --
> >
> > we also need to change this, otherwise key will be cleared.
>
> Ok.
>
> Try to give the second mutex a different name, and see how many warnings go
> away.
>
> --HPS

Giving it a different name makes all of the "duplicate lock" warnings go away.
Here is the patch that includes all changes:

-- begin patch --

diff --git a/dev/usb/wlan/if_run.c b/dev/usb/wlan/if_run.c
index 017e4b0..da22077 100644
--- a/dev/usb/wlan/if_run.c
+++ b/dev/usb/wlan/if_run.c
@@ -549,7 +549,7 @@ run_attach(device_t self)
 	mtx_init(&sc->sc_mtx, device_get_nameunit(sc->sc_dev),
 	    MTX_NETWORK_LOCK, MTX_DEF);
 	mtx_init(&sc->sc_cmdq_mtx, device_get_nameunit(sc->sc_dev),
-	    MTX_NETWORK_LOCK, MTX_DEF);
+	    "command queue", MTX_DEF);
 
 	iface_index = RT2860_IFACE_INDEX;
 
@@ -4670,8 +4670,6 @@ run_init_locked(struct run_softc *sc)
 	if(ic->ic_nrunning > 1)
 		return;
 
-	run_stop(sc);
-
 	for (ntries = 0; ntries < 100; ntries++) {
 		if (run_read(sc, RT2860_ASIC_VER_ID, &tmp) != 0)
 			goto fail;

-- end patch --
Re: firefox is stuck in getbuf()
On Tue, 2010-07-20 at 16:29 +0300, Kostik Belousov wrote:
> On Tue, Jul 20, 2010 at 10:58:00AM +0800, David Xu wrote:
> > With newest -HEAD code, firefox is stuck in getbuf().
> >
> > top
> >
> > last pid: 1814; load averages: 0.00, 0.05, 0.07
> > up 0+00:37:11 10:54:01
> > 135 processes: 1 running, 134 sleeping
> > CPU: 3.7% user, 0.0% nice, 0.6% system, 0.0% interrupt, 95.7% idle
> > Mem: 259M Active, 393M Inact, 151M Wired, 1484K Cache, 111M Buf, 186M Free
> > Swap: 2020M Total, 2020M Free
> >
> >  PID USERNAME THR PRI NICE  SIZE   RES STATE  C  TIME  WCPU COMMAND
> > 1427 davidxu    1  45    0  114M  101M select 0  1:24 0.29% Xorg
> > 1588 davidxu   10  44    0  279M  145M getbuf 0  2:15 0.00% firefox-bin
> >
> > procstat -k 1588
> >  PID    TID COMM        TDNAME         KSTACK
> > 1588 100200 firefox-bin initial thread mi_switch sleepq_switch
> > sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata
> > ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall
> > Xint0x80_syscall
> > 1588 100207 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait poll
> > syscallenter syscall Xint0x80_syscall
> > 1588 100208 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> > syscallenter syscall Xint0x80_syscall
> > 1588 100209 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait
> > _umtx_op syscallenter syscall Xint0x80_syscall
> > 1588 100210 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_timedwait_sig _sleep __umtx_op_cv_wait
> > _umtx_op syscallenter syscall Xint0x80_syscall
> > 1588 100216 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> > syscallenter syscall Xint0x80_syscall
> > 1588 100220 firefox-bin -              mi_switch sleepq_switch
> > sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata
> > ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall
> > Xint0x80_syscall
> > 1588 100238 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> > syscallenter syscall Xint0x80_syscall
> > 1588 100239 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> > syscallenter syscall Xint0x80_syscall
> > 1588 100240 firefox-bin -              mi_switch sleepq_switch
> > sleepq_catch_signals sleepq_wait_sig _sleep __umtx_op_cv_wait _umtx_op
> > syscallenter syscall Xint0x80_syscall
>
> Can you, please, do the following:
> show the backtraces for the system processes, in particular, syncer,
> bufdaemon, softdepflush daemon, pagedaemon and vm ?
> for the stuck firefox thread, find the address of the buffer
> supplied as an argument to getdirtybuf, and print the *(struct buf *)addr ?
> This can be done on the live/stuck system using kgdb on /dev/mem.

I can relatively easily recreate this, see my thread on -current on the 17th July ("Filesystem wedge, SUJ-related?"), which (and the followup emails) contain additional info.

I'm currently trying to find the commit responsible for introducing this, and have established that a kernel from the 1st June does not seem to exhibit the same issue.

Tonight, I'll revert to a current -current and try to get the info you need.

Thanks,

Gavin

-- 
Gavin Atkinson
FreeBSD committer and bugmeister
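With ten threads in the procstat -k listing, it can help to filter the stacks mechanically for the function of interest. The following is a minimal Python sketch, not something posted in the thread: it assumes each thread's record has been joined onto a single line, and `threads_in` and `SAMPLE` are hypothetical names. The three sample rows are copied from the listing above.

```python
# Hypothetical helper, not part of procstat: list the thread IDs whose
# kernel stack contains a given function name.
# Assumes one thread per line (wrapped mail lines already joined).
SAMPLE = """\
1588 100200 firefox-bin initial thread mi_switch sleepq_switch sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall Xint0x80_syscall
1588 100207 firefox-bin - mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig seltdwait poll syscallenter syscall Xint0x80_syscall
1588 100220 firefox-bin - mi_switch sleepq_switch sleepq_wait _sleep getdirtybuf flush_deplist softdep_sync_metadata ffs_syncvnode ffs_fsync VOP_FSYNC_APV fsync syscallenter syscall Xint0x80_syscall
"""


def threads_in(text, func):
    """Return the TIDs of all threads whose stack mentions func."""
    tids = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a thread record.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        # fields: PID, TID, COMM, then TDNAME and the stack frames.
        if func in fields[3:]:
            tids.append(int(fields[1]))
    return tids


if __name__ == "__main__":
    print(threads_in(SAMPLE, "getdirtybuf"))
```

Applied to the full listing, this kind of filter singles out threads 100200 and 100220, the two sitting in getdirtybuf under fsync.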
Re: [panic] Race in IEEE802.11 layer towards device drivers
Hi,

Please confirm that this patch is working for you:

http://p4web.freebsd.org/@@181261?ac=10

--HPS
Re: current + mpt = panic: Bad link elm 0xffffff80002d6480 next->prev != elm
On 2010-07-20 at 14:16, Svein Skogen (Listmail account) wrote:
> Sorry for the late response here, but what you're describing matches
> fairly well what I saw with RELENG_8 (just after 8.0 was released), but
> luckily I didn't have any disks on my MPT, just my tape autoloader.
>
> Random timeouts, and then bus resets (that made tape IO unreliable).
>
> The bad news, is that I had the exact same trouble with OpenSolaris
> (134), and something-similar with Linux (can't remember versions), at
> the time.
>
> I never did find a solution, and ended up throwing windows on the box,
> just to get reliable backups.
>
> My MPT is a 3801 LSI1068e based card running the latest bios.

Hmm, that does not sound good. Did windows work on the same hardware without problems?

I -might- have solved my problem. It has now run for 24h without timeouts, and with a bit of load on it. I think I might have run into the seagate + NCQ-problem, even though seagate's webpage told me my drives were not affected (according to the serial numbers). I did however update the following

num  drives        firmware
6x   ST31000340AS  SD15
4x   ST31500341AS  SD17

to firmware SD1B (old SD17) and SD1A (old SD15), and that looks like it has done the trick. I'll report back in a week or so if the problem has not reappeared.

-- 
Ståle Kristoffersen
HPC on FreeBSD - proposal to EPSRC
This post is primarily directed at people in the UK.

EPSRC UK (Engineering and Physical Sciences Research Council) issued HPC Software Development Call for 2010/11. From their site:

"This call invites proposals for High Performance Computing (HPC) software development to enable science and engineering."

For full details see:

http://www.epsrc.ac.uk/SiteCollectionDocuments/Calls/2010/HPCSoftwareDevelopmentCall.pdf

I intend to draft a proposal for this call. I'm particularly interested in making HPC on FreeBSD ia64 a reality. Briefly, I want to propose development of an optimising MPI/OpenMP C/C++/Fortran compiler for a fbsd/ia64 cluster environment, probably based on llvm/clang.

If you are in UK academia and want to participate, or if you are in business and might consider supporting a proposal of this sort (e.g. a letter of support) please get in touch directly.

If you have other FreeBSD based ideas suitable for this call - I'd also love to hear.

yours
anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
Re: current + mpt = panic: Bad link elm 0xffffff80002d6480 next->prev != elm
On 21.07.2010 18:33, Ståle Kristoffersen wrote:
> On 2010-07-20 at 14:16, Svein Skogen (Listmail account) wrote:
>> Sorry for the late response here, but what you're describing matches
>> fairly well what I saw with RELENG_8 (just after 8.0 was released), but
>> luckily I didn't have any disks on my MPT, just my tape autoloader.
>>
>> Random timeouts, and then bus resets (that made tape IO unreliable).
>>
>> The bad news, is that I had the exact same trouble with OpenSolaris
>> (134), and something-similar with Linux (can't remember versions), at
>> the time.
>>
>> I never did find a solution, and ended up throwing windows on the box,
>> just to get reliable backups.
>>
>> My MPT is a 3801 LSI1068e based card running the latest bios.
>
> Hmm, that does not sound good. Did windows work on the same hardware
> without problems?

Yup. But notice that I do _NOT_ have any disks on my MPT (I have an MFI for that), it's just a mini-sas<-->mini-sas into a HP 1/8G2 LTO3 Autoloader.

> I -might- have solved my problem. It has now ran for 24h without timeouts,
> and with a bit of load on it. I think I might have ran into the seagate +
> NCQ-problem, even tho seagate's webpage told me my drives was not affected
> (according to the serial numbers). I did however update the following
> num drives firmware
> 6x ST31000340AS SD15
> 4x ST31500341AS SD17

I have 8 of the last type (31500341AS) mine running on CC1H firmware, connected to my MFI. Not a single glitch so far.

>
> to firmware SD1B (old SD17) and SD1A (old SD15), and that looks like it has
> done the trick. I'll report back in a week or so if the problem has not
> reappeared.

Hope it's fixed for you.

I'm still keeping an eye on the MPT code to see if someone changes something that CAN be affecting my timeout issues/reset, and if I see something promising, I'm willing to dump out the entire server to tapes, and test run (I have sufficient spare tapes to actually test without losing data), but such a job will take me a week to prepare, and another to test. Quite a bit of time for something that "may" solve my problem... ;)

//Svein

-- 
+---+---
  /"\   |Svein Skogen       | sv...@d80.iso100.no
  \ /   |Solberg Østli 9    | PGP Key: 0xE5E76831
   X    |2020 Skedsmokorset | sv...@jernhuset.no
  / \   |Norway             | PGP Key: 0xCE96CE13
        |                   | sv...@stillbilde.net
 ascii  |                   | PGP Key: 0x58CD33B6
 ribbon |System Admin       | svein-listm...@stillbilde.net
Campaign|stillbilde.net     | PGP Key: 0x22D494A4
+---+---
        |msn messenger:     | Mobile Phone: +47 907 03 575
        |sv...@jernhuset.no | RIPE handle:SS16503-RIPE
+---+---
If you really are in a hurry, mail me at svein-mob...@stillbilde.net
This mailbox goes directly to my cellphone and is checked even when I'm not in front of my computer.

Picture Gallery: https://gallery.stillbilde.net/v/svein/
Re: Why is intr taking up so much cpu?
On Wed, 21 Jul 2010, Andriy Gapon wrote:

> Doug,
>
> could you please show your timer configuration,

Nothing special in /boot/loader.conf, /etc/sysctl.conf, or my kernel. It's basically just GENERIC minus devices I don't have, plus the following:

options DDB_CTF
options VESA
options GEOM_BDE
device atapicam
device sound
device snd_hda

Interestingly, I had a runaway intr thing again after watching a flash video, but this time it was hdac0, not swi:4.

http://people.freebsd.org/~dougb/bad-dtrace-3-hdac.txt
http://people.freebsd.org/~dougb/bad-dtrace-4-hdac.txt

> part of devinfo -u that describes interrupts

Interrupt request lines:
    0 (attimer0)
    1 (atkbd0)
    3 (root0)
    4 (uart0)
    5-7 (root0)
    8 (atrtc0)
    9 (acpi0)
    10-11 (root0)
    12 (psm0)
    12 (psmcpnp0)
    13 (root0)
    14 (ata0)
    15 (ata1)
    16 (root0)
    17 (wpi0)
    18 (cbb0)
    19 (root0)
    20 (ehci0)
    20 (uhci0)
    20 (hpet0)
    21 (uhci1)
    22 (uhci2)
    23 (uhci3)
    256 (hdac0)

> and top of the output of top -SPH (including the header) when high interrupt
> load strikes?

Will do next time, thanks!

Doug

-- 
Improve the effectiveness of your Internet presence with
a domain name makeover!    http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
	-- Pablo Picasso
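One thing that can be read off the devinfo -u listing is which interrupt lines have more than one device attached. The following is a minimal Python sketch, not something posted in the thread: `shared_irqs` and `LISTING` are hypothetical names, and the listing string is copied from the message above. Whether a shared line has anything to do with the interrupt load discussed here is not established in the thread.

```python
import re
from collections import defaultdict

# Interrupt request lines as reported by devinfo -u in the message above.
LISTING = (
    "0 (attimer0) 1 (atkbd0) 3 (root0) 4 (uart0) 5-7 (root0) 8 (atrtc0) "
    "9 (acpi0) 10-11 (root0) 12 (psm0) 12 (psmcpnp0) 13 (root0) 14 (ata0) "
    "15 (ata1) 16 (root0) 17 (wpi0) 18 (cbb0) 19 (root0) 20 (ehci0) "
    "20 (uhci0) 20 (hpet0) 21 (uhci1) 22 (uhci2) 23 (uhci3) 256 (hdac0)"
)


def shared_irqs(listing):
    """Map each IRQ (or IRQ range) to its devices, keeping only shared ones."""
    owners = defaultdict(list)
    # Each entry looks like "20 (ehci0)" or, for a range, "5-7 (root0)".
    for irq, dev in re.findall(r"(\d+(?:-\d+)?) \((\w+)\)", listing):
        owners[irq].append(dev)
    return {irq: devs for irq, devs in owners.items() if len(devs) > 1}


if __name__ == "__main__":
    for irq, devs in shared_irqs(LISTING).items():
        print(irq, ", ".join(devs))
```

For this listing the shared lines are 12 (psm0, psmcpnp0) and 20 (ehci0, uhci0, hpet0).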
Re: Why is intr taking up so much cpu?
on 21/07/2010 21:50 Doug Barton said the following:
> On Wed, 21 Jul 2010, Andriy Gapon wrote:
>
>> Doug,
>>
>> could you please show your timer configuration,
>
> Nothing special in /boot/loader.conf, /etc/sysctl.conf, or my kernel.
> It's basically just GENERIC minus devices I don't have, plus the following:

I didn't mean your manual tuning, I meant how the system is configured :-)
E.g. the relevant sysctl tree.

-- 
Andriy Gapon
panic with clangbsd kernel in kern/vfs_bio.c
Hi,

on my Acer 7738G laptop running FreeBSD 9.0-amd64 r209980 (with latest clangbsd kernel), I encountered this panic (recovered from /var/log/messages), while under a moderately light load (portmaster, openoffice, firefox, thunderbird in an xfce4 session):

Jul 21 22:29:47 acer kernel: panic: buf 0xff80526d00c0 already counted as free
Jul 21 22:29:47 acer kernel: cpuid = 1
Jul 21 22:29:47 acer kernel: KDB: enter: panic
Jul 21 22:29:47 acer kernel:
Jul 21 22:29:47 acer kernel: 0xff0005093b40: tag devfs, type VCHR
Jul 21 22:29:47 acer kernel: usecount 1, writecount 0, refcount 588 mountedhere 0xff0004c84200
Jul 21 22:29:47 acer kernel: flags ()
Jul 21 22:29:47 acer kernel: v_object 0xff00050adbd0 ref 0 pages 12590
Jul 21 22:29:47 acer kernel: lock type devfs: EXCL by thread 0xff0004c59880 (pid 17)
Jul 21 22:29:47 acer kernel: dev ad4s1f

I have the following modules loaded:

acer % kldstat
Id Refs Address    Size     Name
 1   32 0x8010     f925f0   kernel (GENERIC)
 2    1 0x81093000 1bd28    if_iwn.ko
 3    1 0x810af000 296f8    snd_hda.ko
 4    2 0x810d9000 85fe8    sound.ko
 5    1 0x8115f000 570f8    iwn5000fw.ko
 6    1 0x811b7000 dc29d0   nvidia.ko (ports/x11/nvidia-driver, version 256.35 with patches from current@)
 7    3 0x81f7a000 423c8    linux.ko
 8    1 0x81fbd000 d38      biosfont.ko (ports/sysutils/biosfont)
 9    1 0x82012000 3a73     linprocfs.ko

acer % grep -nr "already counted as free" ~/freebsd/clangbsd/sys/* | grep -v \.svn
kern/vfs_bio.c:401: ("buf %p already counted as free", bp));

acer % ident kern/vfs_bio.c
kern/vfs_bio.c:
     $FreeBSD: projects/clangbsd/sys/kern/vfs_bio.c 209170 2010-06-14 18:45:33Z rdivacky $

acer % ident /sys/kern/vfs_bio.c
/sys/kern/vfs_bio.c:
     $FreeBSD: head/sys/kern/vfs_bio.c 209902 2010-07-11 20:11:44Z alc $

So it might be fixed already. Let me know if you need more information.

Unfortunately I didn't get a core dump, although dumpdev="AUTO" is set in /etc/rc.conf. The laptop rebooted by itself.

Regards,
Rene
Re: Why is intr taking up so much cpu?
On Wed, 21 Jul 2010, Andriy Gapon wrote:

> I didn't mean your manual tuning, I meant how the system is configured :-)
> E.g. the relevant sysctl tree.

Duh. :) Sorry.

sysctl -a | grep timer
kern.eventtimer.choice: LAPIC(500) HPET(450) HPET1(440) HPET2(440) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 83223728
kern.eventtimer.et.LAPIC.quality: 500
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.timer2: HPET
kern.eventtimer.timer1: LAPIC
kern.eventtimer.singlemul: 2
net.inet.tcp.timer_race: 0
net.inet.tcp.per_cpu_timers: 0
machdep.acpi_timer_freq: 3579545
p1003_1b.timers: 200112
p1003_1b.delaytimer_max: 2147483647
p1003_1b.timer_max: 32
dev.acpi_timer.0.%desc: 24-bit timer at 3.579545MHz
dev.acpi_timer.0.%driver: acpi_timer
dev.acpi_timer.0.%location: unknown
dev.acpi_timer.0.%pnpinfo: unknown
dev.acpi_timer.0.%parent: acpi0
dev.attimer.0.%desc: AT timer
dev.attimer.0.%driver: attimer
dev.attimer.0.%location: handle=\_SB_.PCI0.ISAB.TMR_
dev.attimer.0.%pnpinfo: _HID=PNP0100 _UID=0
dev.attimer.0.%parent: acpi0
dev.pmtimer.0.%driver: pmtimer
dev.pmtimer.0.%parent: isa0

-- 
Improve the effectiveness of your Internet presence with
a domain name makeover!    http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
	-- Pablo Picasso
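The kern.eventtimer.choice value packs each timer name together with its quality, which makes the list easy to rank mechanically. The following is a minimal Python sketch, not something posted in the thread: `parse_choice`, `ranked` and `CHOICE` are hypothetical names, and the input string is copied from the sysctl output above. That the two highest-quality entries match timer1 and timer2 in this dump is an observation about this one system, not a statement of how the kernel selects its timers.

```python
import re

# Value of kern.eventtimer.choice from the sysctl output above.
CHOICE = "LAPIC(500) HPET(450) HPET1(440) HPET2(440) i8254(100) RTC(0)"


def parse_choice(value):
    """Turn 'NAME(quality) ...' into a {name: quality} dictionary."""
    return {name: int(q) for name, q in re.findall(r"(\w+)\((-?\d+)\)", value)}


def ranked(value):
    """Timer names ordered by descending quality; ties keep listing order."""
    timers = parse_choice(value)
    # sorted() is stable, so HPET1 stays ahead of HPET2 at quality 440.
    return sorted(timers, key=lambda name: -timers[name])


if __name__ == "__main__":
    print(ranked(CHOICE))
```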