date:20120224

Re: FreeBSD9 and the sheer number of problem reports

2012-02-24 Thread Andreas Nilsson

On Fri, Feb 24, 2012 at 7:46 AM, Erich Dollansky <
erichfreebsdl...@ovitrap.com> wrote:

> Hi,
>
> On Friday 24 February 2012 04:21:12 Peter Maloney wrote:
> > On 23.02.2012 21:15, Mark Felder wrote:
> > > On Thu, 23 Feb 2012 12:25:01 -0600, Damien Fleuriot  wrote:
> > >
> > >>
> > >> Now, I find the number of problem reports regarding 9.0-RELEASE alarming
> > >> and I'm growing more and more fearful towards it.
> > >
> > I suggest these concepts should be tested:
> >
> I can tell you what, in practical terms, stops me from testing more often:
> switching back to the running version.
>
> Let me suggest this.
>
> Currently, we normally have two kernels on the disk: the current one and
> the previous one. Why not add a third one called testing?
>
> Then add an entry to the boot menu so that users can switch between the
> current kernel and a kernel they just installed for testing.
>

Well, as you would want to test both kernel + userland, it gets a bit tricky
on a UFS-based system, as you have to set up several slices/partitions. For
ZFS it's easier, as the only thing required would be a snapshot of a clean
install, which the user can then just zfs recv, modify vfs.root.mountfrom
and so on.

Just my thoughts.

Andreas

>
> I know that I can do this manually. But this is the point where it becomes
> difficult for the majority of people.
>
> As FreeBSD needs a large amount of testing on unknown hardware, this could
> increase the number of actual testers without much effort.
>
> OK, the developers must then be ready to deal with reports that are missing
> many details.
>
> Erich
> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
>

Re: FreeBSD9 and the sheer number of problem reports

2012-02-24 Thread Erich Dollansky

Hi,

On Friday 24 February 2012 15:34:06 Andreas Nilsson wrote:
> On Fri, Feb 24, 2012 at 7:46 AM, Erich Dollansky <
> erichfreebsdl...@ovitrap.com> wrote:
> 
> > On Friday 24 February 2012 04:21:12 Peter Maloney wrote:
> > > On 23.02.2012 21:15, Mark Felder wrote:
> > > > On Thu, 23 Feb 2012 12:25:01 -0600, Damien Fleuriot  wrote:
> > Let me suggest this.
> >
> > Currently, we normally have two kernels on the disk: the current one and
> > the previous one. Why not add a third one called testing?
> >
> > Then add an entry to the boot menu so that users can switch between the
> > current kernel and a kernel they just installed for testing.
> >
> 
> Well, as you would want to test both kernel + userland, it gets a bit tricky
> on a UFS-based system, as you have to set up several slices/partitions. For
> ZFS it's easier, as the only thing required would be a snapshot of a clean
> install, which the user can then just zfs recv, modify vfs.root.mountfrom
> and so on.
> 
Use /usr/local for the current system and
/usr/localtest for the other system.

Of course, do the same for /bin, /etc ...

It is not that difficult.

Or use a script that renames the directories before the next boot.
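A minimal sketch of that rename step, assuming both trees live on the same filesystem (rename(2) fails with EXDEV across filesystems) and using a hypothetical temporary name, since rename(2) cannot swap two names in one call. `swap_dirs()` and the temporary name are illustrative only, not an existing FreeBSD tool. Swapping /bin or /etc this way on a running system would presumably be much riskier than swapping /usr/local, because those directories are in use.

```c
#include <errno.h>
#include <stdio.h>   /* rename() */

/*
 * Hypothetical helper: swap the names of two directories that live on the
 * same filesystem by going through a temporary name. Each rename() is
 * atomic, but the three steps together are not: a crash in between can
 * leave one tree under the temporary name.
 * Returns 0 on success, -1 on failure with errno from the failing rename().
 */
int swap_dirs(const char *cur, const char *test, const char *tmp)
{
    if (rename(cur, tmp) != 0)
        return -1;                  /* nothing has been moved yet */
    if (rename(test, cur) != 0) {
        int saved = errno;
        (void)rename(tmp, cur);     /* best-effort rollback */
        errno = saved;
        return -1;
    }
    if (rename(tmp, test) != 0)
        return -1;                  /* the old current tree is left at tmp */
    return 0;
}
```

For example, `swap_dirs("/usr/local", "/usr/localtest", "/usr/local.swap")` would swap the two trees; calling it again swaps them back.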

Erich

Re: FreeBSD9 and the sheer number of problem reports

2012-02-24 Thread Tom Evans

On Thu, Feb 23, 2012 at 9:21 PM, Peter Maloney wrote:
> I suggest these concepts should be tested:
>
> Perhaps the testers tested beta1 and beta2, but there were so many
> changes after beta2, that bugs appeared in release that did not exist in
> beta2. Test this by reproducing things reported in release also in beta1
> or 2.
>
> Perhaps the people who know the rule about running .0 releases (such as
> myself) never bothered to test beta1, beta2, or even release .0 (true in
> my case). If so, then this rule is a very bad one. Test this with a poll.
>

At $JOB, we never install a N.0 release either, but only because the
.0 release has such a brief life. The N.1 and N.3 releases have
extended lifetimes, and so we tend to only use those versions.

Cheers

Tom

Another ZFS ARC memory question

2012-02-24 Thread Luke Marsden

Hi all,

Just wanted to get your opinion on best practices for ZFS.

We're running 8.2-RELEASE (ZFS v15) in production on 24GB RAM amd64 machines
but have been having trouble with short spikes in application memory
usage resulting in huge amounts of swapping, bringing the whole machine
to its knees and crashing it hard.  I suspect this is because when there
is a sudden spike in memory usage, the ZFS ARC reclaim thread is unable
to free system memory fast enough.

This most recently happened yesterday as you can see from the following
munin graphs:

E.g. http://hybrid-logic.co.uk/memory-day.png
 http://hybrid-logic.co.uk/swap-day.png

Our response has been to start limiting the ZFS ARC cache to 4GB on our
production machines - trading performance for stability is fine with me
(and we have L2ARC on SSD so we still get good levels of caching).

My questions are:

  * Is this a known problem?
  * What is the community's advice for production machines running
    ZFS on FreeBSD? Is manually limiting the ARC cache (to ensure
    that there's enough actually free memory to handle a spike in
    application memory usage) the best solution to this
    spike-in-memory-means-crash problem?
  * Has FreeBSD 9.0 / ZFS v28 solved this problem?
  * Rather than setting a hard limit on the ARC cache size, is it
    possible to adjust the auto-tuning variables to leave more free
    memory for spiky memory situations? E.g. set the auto-tuning to
    make the ARC eat 80% of memory instead of ~95% like it is at
    present?
  * Could the ARC reclaim thread be made to drop ARC pages with
    higher priority before the system starts swapping out
    application pages?
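A rough sketch of the arithmetic behind the two options above, assuming "24GB" means 24 × 2^30 bytes (the memory the kernel actually sees is usually a little lower than the installed RAM). The helper names are hypothetical; `vfs.zfs.arc_max` is the loader tunable commonly used to cap the ARC, typically set in /boot/loader.conf.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: ARC ceiling in bytes that lets the ARC use at most
 * `percent` of physical memory. Multiplying before dividing keeps precision;
 * physmem * 100 cannot overflow 64 bits for any realistic RAM size.
 */
uint64_t arc_max_for_percent(uint64_t physmem_bytes, unsigned percent)
{
    return physmem_bytes * percent / 100;
}

/* Hypothetical helper: format a /boot/loader.conf line for vfs.zfs.arc_max. */
int format_arc_max_line(char *buf, size_t len, uint64_t arc_max_bytes)
{
    return snprintf(buf, len, "vfs.zfs.arc_max=\"%llu\"",
                    (unsigned long long)arc_max_bytes);
}
```

With these assumptions, an 80% ceiling on a 24 GiB machine comes to 20615843020 bytes (about 19.2 GiB), while the 4GB hard cap (4294967296 bytes) leaves roughly 20 GiB outside the ARC.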

Thank you for any/all answers, and thank you for making FreeBSD
awesome :-)

Best Regards,
Luke Marsden

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 




Re: another panic in 8.3-PRERELEASE

2012-02-24 Thread Konstantin Belousov

On Thu, Feb 23, 2012 at 11:45:58PM +0900, Hiroki Sato wrote:
> Hi,
> 
>  This is another reproducible panic.  This seems to happen only when
>  top(1) is running for a long time (a sysctl() call for
>  CTL_KERN.KERN_PROC.KERN_PROC_PROC MIB triggered it).
> 
> 
> pool.allbsd.org dumped core - see /var/crash/vmcore.0
> 
> Thu Feb 23 23:21:52 JST 2012
> 
> FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #8: Thu Feb 23 
> 04:40:54 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL  amd64
> 
> panic:
> 
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 4; apic id = 04
> fault virtual address = 0x800e96000
> fault code            = supervisor write data, protection violation
> instruction pointer   = 0x20:0x809440cb
> stack pointer         = 0x28:0xff86c63890b0
> frame pointer         = 0x28:0xff86c6389100
> code segment          = base 0x0, limit 0xf, type 0x1b
>                       = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags      = interrupt enabled, resume, IOPL = 0
> current process       = 47211 (top)
> lock order reversal: (Giant after non-sleepable)
>  1st 0xff0244b85568 process lock (process lock) @ 
> /usr/src/sys/kern/kern_proc.c:1211
>  2nd 0x80d74c80 Giant (Giant) @ /usr/src/sys/dev/usb/input/ukbd.c:2018
> KDB: stack backtrace:
> Dumping 23903 out of 24550 MB:..1%..11%..21%..31% (CTRL-C to abort)  (CTRL-C 
> to abort) ..41%..51%..61%..71%..81%..91%
> 
> Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from 
> /boot/kernel/geom_mirror.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/geom_mirror.ko
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
> /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
> /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
> /boot/kernel/ipfw.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ipfw.ko
> #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> 263   if (textdump_pending)
> (kgdb) #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> #1  0x801f8cfc in db_fncall (dummy1=Variable "dummy1" is not 
> available.
> )
> at /usr/src/sys/ddb/db_command.c:548
> #2  0x801f9031 in db_command (last_cmdp=0x80d37f40, 
> cmd_table=Variable "cmd_table" is not available.
> 
> ) at /usr/src/sys/ddb/db_command.c:445
> #3  0x801f9280 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:498
> #4  0x801fb369 in db_trap (type=Variable "type" is not available.
> ) at /usr/src/sys/ddb/db_main.c:229
> #5  0x8069dff1 in kdb_trap (type=12, code=0, tf=0xff86c6389000)
> at /usr/src/sys/kern/subr_kdb.c:548
> #6  0x809461ed in trap_fatal (frame=0xff86c6389000, eva=Variable 
> "eva" is not available.
> )
> at /usr/src/sys/amd64/amd64/trap.c:820
> #7  0x809468b5 in trap (frame=0xff86c6389000)
> at /usr/src/sys/amd64/amd64/trap.c:326
> #8  0x8092d2f4 in calltrap ()
> at /usr/src/sys/amd64/amd64/exception.S:228
> #9  0x809440cb in copyout () at /usr/src/sys/amd64/amd64/support.S:258
> #10 0x80675f1f in sysctl_old_user (req=0xff86c63899c0,
> p=0xff86c6389470, l=1088) at /usr/src/sys/kern/kern_sysctl.c:1276
> #11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470,
> req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085
> #12 0x8065ff6c in sysctl_out_proc (p=0xff0244b85470,
> req=0xff86c63899c0, flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/kern_proc.c:1114
> #13 0x8066245e in sysctl_kern_proc (oidp=Variable "oidp" is not 
> available.
> )
> at /usr/src/sys/kern/kern_proc.c:1302
> #14 0x806756e8 in sysctl_root (oidp=Variable "oidp" is not available.
> )
> at /usr/src/sys/kern/kern_sysctl.c:1455
> #15 0x8067598e in userland_sysctl (td=0x0, name=0xff86c6389a80,
> namelen=3, old=0x800e96000, oldlenp=Variable "oldlenp" is not available.
> )
> at /usr/src/sys/kern/kern_sysctl.c:1565
> #16 0x80675e3a in __sysctl (td=0xff0396ec5460,
> uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491
> #17 0x80945809 in amd64_syscall (td=0xff0396ec5460, traced=0)
> at subr_syscall.c:114

Re: another panic in 8.3-PRERELEASE

2012-02-24 Thread Konstantin Belousov

On Fri, Feb 24, 2012 at 04:33:36PM +0200, Konstantin Belousov wrote:
> On Thu, Feb 23, 2012 at 11:45:58PM +0900, Hiroki Sato wrote:
> > Hi,
> > 
> >  This is another reproducible panic.  This seems to happen only when
> >  top(1) is running for a long time (a sysctl() call for
> >  CTL_KERN.KERN_PROC.KERN_PROC_PROC MIB triggered it).
> > 
> > 
> > pool.allbsd.org dumped core - see /var/crash/vmcore.0
> > 
> > Thu Feb 23 23:21:52 JST 2012
> > 
> > FreeBSD pool.allbsd.org 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #8: Thu Feb 
> > 23 04:40:54 JST 2012 h...@pool.allbsd.org:/usr/obj/usr/src/sys/POOL  
> > amd64
> > 
> > panic:
> > 
> > GNU gdb 6.1.1 [FreeBSD]
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain 
> > conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > This GDB was configured as "amd64-marcel-freebsd"...
> > 
> > Unread portion of the kernel message buffer:
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 4; apic id = 04
> > fault virtual address   = 0x800e96000
> > fault code  = supervisor write data, protection violation
> > instruction pointer = 0x20:0x809440cb
> > stack pointer   = 0x28:0xff86c63890b0
> > frame pointer   = 0x28:0xff86c6389100
> > code segment= base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags= interrupt enabled, resume, IOPL = 0
> > current process = 47211 (top)
> > lock order reversal: (Giant after non-sleepable)
> >  1st 0xff0244b85568 process lock (process lock) @ 
> > /usr/src/sys/kern/kern_proc.c:1211
> >  2nd 0x80d74c80 Giant (Giant) @ 
> > /usr/src/sys/dev/usb/input/ukbd.c:2018
> > KDB: stack backtrace:
> > Dumping 23903 out of 24550 MB:..1%..11%..21%..31% (CTRL-C to abort)  
> > (CTRL-C to abort) ..41%..51%..61%..71%..81%..91%
> > 
> > Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from 
> > /boot/kernel/geom_mirror.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/geom_mirror.ko
> > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from 
> > /boot/kernel/zfs.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/zfs.ko
> > Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from 
> > /boot/kernel/opensolaris.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/opensolaris.ko
> > Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
> > /boot/kernel/ipfw.ko.symbols...done.
> > done.
> > Loaded symbols for /boot/kernel/ipfw.ko
> > #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> > 263 if (textdump_pending)
> > (kgdb) #0  doadump () at /usr/src/sys/kern/kern_shutdown.c:263
> > #1  0x801f8cfc in db_fncall (dummy1=Variable "dummy1" is not 
> > available.
> > )
> > at /usr/src/sys/ddb/db_command.c:548
> > #2  0x801f9031 in db_command (last_cmdp=0x80d37f40, 
> > cmd_table=Variable "cmd_table" is not available.
> > 
> > ) at /usr/src/sys/ddb/db_command.c:445
> > #3  0x801f9280 in db_command_loop ()
> > at /usr/src/sys/ddb/db_command.c:498
> > #4  0x801fb369 in db_trap (type=Variable "type" is not available.
> > ) at /usr/src/sys/ddb/db_main.c:229
> > #5  0x8069dff1 in kdb_trap (type=12, code=0, tf=0xff86c6389000)
> > at /usr/src/sys/kern/subr_kdb.c:548
> > #6  0x809461ed in trap_fatal (frame=0xff86c6389000, 
> > eva=Variable "eva" is not available.
> > )
> > at /usr/src/sys/amd64/amd64/trap.c:820
> > #7  0x809468b5 in trap (frame=0xff86c6389000)
> > at /usr/src/sys/amd64/amd64/trap.c:326
> > #8  0x8092d2f4 in calltrap ()
> > at /usr/src/sys/amd64/amd64/exception.S:228
> > #9  0x809440cb in copyout () at 
> > /usr/src/sys/amd64/amd64/support.S:258
> > #10 0x80675f1f in sysctl_old_user (req=0xff86c63899c0,
> > p=0xff86c6389470, l=1088) at /usr/src/sys/kern/kern_sysctl.c:1276
> > #11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470,
> > req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085
> > #12 0x8065ff6c in sysctl_out_proc (p=0xff0244b85470,
> > req=0xff86c63899c0, flags=Variable "flags" is not available.
> > ) at /usr/src/sys/kern/kern_proc.c:1114
> > #13 0x8066245e in sysctl_kern_proc (oidp=Variable "oidp" is not 
> > available.
> > )
> > at /usr/src/sys/kern/kern_proc.c:1302
> > #14 0x806756e8 in sysctl_root (oidp=Variable "oidp" is not 
> > available.
> > )
> > at /usr/src/sys/kern/kern_sysctl.c:1455
> > #15 0x8067598e in userland_sysctl (td=0x0, name=0xff86c6389a80,
> > namelen=3, old=0x800e96000, oldlenp=Variab

Re: random problem with 8.3 from yesterday

2012-02-24 Thread Ian Lepore

On Fri, 2012-02-24 at 13:50 +0700, Erich Dollansky wrote:
> Hi,
> 
> On Thursday 23 February 2012 20:22:57 Stefan Bethke wrote:
> > On 22.02.2012 at 07:34, Erich Dollansky wrote:
> > 
> > > 
> > > tunefs -L NewDeviceName /dev/da0a
> > > 
> > > Either this call or the mount command randomly fails.
> > > 
> > > When I then try to mount /dev/da0a, it does not always work.
> > > 
> > > I do not know what causes this; I can only reproduce it randomly.
> > > 
> > > It might be affected by removing the device or keeping it plugged in.
> > 
> > You need to be more specific: what does "does not work" mean? Output, results?
> > 
> it seems that I forgot to copy the console output for this.
> 
> Ok, as far as I remember, tunefs said something like it does not recognise 
> the slice.
> 
> Mount gave two different messages. One also said that it could not
> find/recognise the slice. The other one said that the file system was unknown
> despite my having just run newfs on it.
> 
> I am very much aware that these kinds of errors are very hard to find,
> especially if they are not reproducible.
> 
> Erich
> 
> > 
> > Stefan
> > 
> > -- 
> > Stefan Bethke    Fon +49 151 14070811

I've been putting up with problems like this since first upgrading to
8.2.  I guess I haven't dug deeper into them because it's actually a
huge improvement from what I was used to in 6.x and 7.x where complete
system lockups were more common with removable usb drives.  Here's an
example sequence that just happened to me with a compact flash card in a
usb multi-card reader...

revolution # ll /dev/da*
crw-r-----  1 root  operator    0, 246 Feb 24 08:21 /dev/da0
crw-r-----  1 root  operator    1,  26 Feb 24 08:21 /dev/da0s1
crw-r-----  1 root  operator    1,  39 Feb 24 08:21 /dev/da0s1a
crw-r-----  1 root  operator    1,  40 Feb 24 08:21 /dev/da0s1e
crw-r-----  1 root  operator    1,  27 Feb 24 08:21 /dev/da0s2
crw-r-----  1 root  operator    1,  29 Feb 24 08:21 /dev/da0s2a
crw-r-----  1 root  operator    1,  30 Feb 24 08:21 /dev/da0s2e
crw-r-----  1 root  operator    1,  28 Feb 24 08:21 /dev/da0s3
crw-r-----  1 root  operator    1,  32 Feb 24 08:21 /dev/da0s3a
crw-r-----  1 root  operator    0, 248 Feb 23 12:01 /dev/da1
crw-r-----  1 root  operator    0, 249 Feb 23 12:01 /dev/da2
crw-r-----  1 root  operator    0, 250 Feb 23 12:01 /dev/da3
crw-r-----  1 root  operator    1,  44 Feb 24 08:54 /dev/da4
revolution # mount /dev/da0s1a /mnt
mount: /dev/da0s1a : Invalid argument
revolution # fsck -y /dev/da0s1a
fsck: Could not determine filesystem type
revolution # fsck -t ufs -y /dev/da0s1a
** /dev/da0s1a
Cannot find file system superblock
ioctl (GCINFO): Inappropriate ioctl for device
fsck_ufs: /dev/da0s1a: can't read disk label

At this point I unplug the multi-card reader and plug it back in.

revolution # fsck -y /dev/da0s1a
fsck: Could not determine filesystem type
revolution # fsck -t ufs -y /dev/da0s1a
** /dev/da0s1a
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
1932 files, 45569 used, 385214 free (54 frags, 48145 blocks,
0.0% fragmentation)

* FILE SYSTEM IS CLEAN *
revolution # mount /dev/da0s1a /mnt

At this point everything is fine and I can access the card.  Sometimes I
have to do the unplug/replug dance and sometimes I don't.  I've always
suspected something in the geom layer isn't noticing that a CF or SD
card in the reader got removed/inserted/reformatted, and un-/re-plugging
the whole reader (making the cam layer destroy and recreate the devices)
makes geom aware of the change.

Oh, a datapoint... notice how the timestamps on the /dev/da0* files
above are all 08:21?  I had just inserted that card at 08:57 when I ran
the command sequence above, but apparently the geom layer was still
reporting on a different card that was used and removed earlier this
morning.

I'm not sure whether or not this is related to the problem Erich
originally reported, but there are some similarities in symptoms such as
the inability to recognize the filesystem type, so I thought I'd mention
it.  This happens to me several times a week (often several times a day)
so if anyone has suggestions on information-gathering I'll probably have
lots of opportunities.
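One hedged idea for information-gathering: record whether the device node currently returns a UFS superblock magic number, once in the failing state and again after the unplug/replug. The sketch below is hypothetical (`ufs_probe()` is not a FreeBSD utility). It restates the usual superblock locations and magic numbers from <ufs/ffs/fs.h> (UFS2 at byte 65536, UFS1 at byte 8192, fs_magic at offset 1372 within the superblock), assumes a little-endian filesystem as written on amd64/i386, and ignores the less common superblock locations that fsck may also try. It reads whole 8192-byte blocks at aligned offsets because raw disk devices generally reject unaligned I/O.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* UFS on-disk constants, restated here (assumed to match <ufs/ffs/fs.h>). */
#define SBLOCK_UFS1      8192        /* usual UFS1 superblock location */
#define SBLOCK_UFS2      65536       /* usual UFS2 superblock location */
#define FS_MAGIC_OFFSET  1372        /* offset of fs_magic inside struct fs */
#define FS_UFS1_MAGIC    0x00011954u
#define FS_UFS2_MAGIC    0x19540119u
#define PROBE_READ_SIZE  8192        /* multiple of 512 and 4096: aligned reads */

/* Read one aligned block at `sblock` and compare its little-endian fs_magic. */
static int has_magic(int fd, off_t sblock, uint32_t want)
{
    unsigned char buf[PROBE_READ_SIZE];
    const unsigned char *p = buf + FS_MAGIC_OFFSET;

    if (pread(fd, buf, sizeof buf, sblock) != (ssize_t)sizeof buf)
        return 0;                    /* short read or error: no magic */
    uint32_t magic = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    return magic == want;
}

/* Hypothetical probe: 2 = UFS2 magic, 1 = UFS1 magic, 0 = neither, -1 = open failed. */
int ufs_probe(const char *path)
{
    int fd = open(path, O_RDONLY);
    int result = 0;

    if (fd < 0)
        return -1;
    if (has_magic(fd, SBLOCK_UFS2, FS_UFS2_MAGIC))
        result = 2;
    else if (has_magic(fd, SBLOCK_UFS1, FS_UFS1_MAGIC))
        result = 1;
    close(fd);
    return result;
}
```

If the probe reported no magic on /dev/da0s1a before the replug and a UFS magic afterwards, that would be consistent with the stale-device-node theory, though it would not prove it.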

-- Ian


Re: Regression in 8.2-STABLE bge code (from 7.4-STABLE)

2012-02-24 Thread John Baldwin

On Tuesday, February 14, 2012 7:56:00 pm YongHyeon PYUN wrote:
> On Sat, Jan 28, 2012 at 09:24:53PM -0500, Michael L. Squires wrote:
> 
> Sorry for late reply.  Had been busy due to relocation.
> 
> > There is a bug in the Tyan S4881/S4882 PCI-X bridges that was fixed with a 
> > patch in 7.x (thank you very much).  This patch is not present in the 
> > 8.2-STABLE code and the symptoms (watchdog timeouts) have recurred.
> > 
> 
> Hmm, I thought the mailbox reordering bug was avoided by limiting the
> DMA address space to 32 bits, but it seems that was not the right
> workaround for the AMD-8131 PCI-X bridge.
> 
> > The watchdog timeouts do not appear to be present after I switched to an 
> > Intel gigabit PCI-X card.
> > 
> > I did a brute-force patch of the 8.2-STABLE bge code using the patches for
> > 7.4-STABLE; the resulting code compiled and, other than odd behavior at
> > startup, seems to be working normally.
> > 
> > This is using FreeBSD 8.2-STABLE amd64; I don't know what happens with 
> > i386.
> > 
> > Given the age of the boards it may be easier if I just continue using the
> > Intel gigabit card but am happy to test anything that comes my way.
> > 
> 
> Try attached patch and let me know how it goes.
> I didn't enable 64bit DMA addressing though. I think the AMD-8131
> PCI-X bridge needs both workarounds.

Eh, please don't do the thing where you walk all pcib devices.  Instead, walk 
up the tree like so:

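static int
bge_mbox_reorder(struct bge_softc *sc)
{
	devclass_t pci, pcib;
	device_t bus, dev;

	pci = devclass_find("pci");
	pcib = devclass_find("pcib");
	dev = sc->bge_dev;
	bus = device_get_parent(dev);
	for (;;) {
		/* Step up one level: the bridge above 'bus' and its parent bus. */
		dev = device_get_parent(bus);
		bus = device_get_parent(dev);
		/*
		 * Stop unless 'dev' is a PCI-PCI bridge (a pcib whose parent
		 * is a pci bus); Host-PCI bridges are not PCI devices.
		 */
		if (device_get_devclass(dev) != pcib ||
		    device_get_devclass(bus) != pci)
			break;
		/* Probe the device ID of 'dev' here; return 1 on a match. */
	}
	return (0);
}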

It is not safe to use pci_get_vendor() with non-PCI devices (you may get
random junk, and Host-PCI bridges are not PCI devices).  Also, this will only
apply the quirk if a relevant bridge is in the bge device's path.
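To make the difference concrete, here is a userland model of the walk described above; it is not kernel code. `struct model_dev` is a hypothetical stand-in for device_t, and the quirk-table entry (AMD vendor 0x1022, device 0x7450 for the AMD-8131 PCI-X bridge) is an assumption for illustration. The model reads IDs only from pcib devices whose parent is a pci bus, so a Host-PCI bridge is never probed and a bridge outside the NIC's path is never considered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a newbus device_t. */
struct model_dev {
    const char *devclass;               /* "pci", "pcib", "acpi", ... */
    const struct model_dev *parent;
    uint16_t vendor, device;            /* only meaningful for PCI devices */
};

/* Assumed quirk table: AMD-8131 PCI-X bridge. */
static const struct { uint16_t vendor, device; } reorder_quirks[] = {
    { 0x1022, 0x7450 },
};

static int is_class(const struct model_dev *d, const char *cls)
{
    return d != NULL && strcmp(d->devclass, cls) == 0;
}

/* Return 1 if a quirky PCI-PCI bridge sits above `nic`, else 0. */
int mbox_reorder(const struct model_dev *nic)
{
    const struct model_dev *dev, *bus = nic->parent;

    for (;;) {
        dev = bus ? bus->parent : NULL;     /* bridge above this bus */
        bus = dev ? dev->parent : NULL;     /* bus that bridge sits on */
        if (!is_class(dev, "pcib") || !is_class(bus, "pci"))
            break;                          /* not a PCI-PCI bridge: stop */
        for (size_t i = 0; i < sizeof reorder_quirks / sizeof reorder_quirks[0]; i++)
            if (dev->vendor == reorder_quirks[i].vendor &&
                dev->device == reorder_quirks[i].device)
                return 1;
    }
    return 0;
}
```

In a tree like nexus, acpi, host pcib, pci0, AMD-8131 pcib, pci1, bge, the model returns 1 for a NIC on pci1 and 0 for a NIC directly on pci0.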

-- 
John Baldwin

Re: mpslsi0 : Trying sleep, but thread marked as sleeping prohibited

2012-02-24 Thread John Baldwin

On Thursday, February 23, 2012 8:22:07 am Desai, Kashyap wrote:
> 
> > -Original Message-
> > From: Konstantin Belousov [mailto:kostik...@gmail.com]
> > Sent: Thursday, February 23, 2012 2:55 PM
> > To: Desai, Kashyap
> > Cc: freebsd-s...@freebsd.org; freebsd-stable; Justin T. Gibbs; Kenneth
> > D. Merry; McConnell, Stephen
> > Subject: Re: mpslsi0 : Trying sleep, but thread marked as sleeping
> > prohibited
> > 
> > On Thu, Feb 23, 2012 at 05:52:12AM +0530, Desai, Kashyap wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Konstantin Belousov [mailto:kostik...@gmail.com]
> > > > Sent: Thursday, February 23, 2012 12:45 AM
> > > > To: Desai, Kashyap
> > > > Cc: freebsd-s...@freebsd.org; freebsd-stable; Justin T. Gibbs;
> > > > Kenneth D. Merry; McConnell, Stephen
> > > > Subject: Re: mpslsi0 : Trying sleep, but thread marked as sleeping
> > > > prohibited
> > > >
> > > > On Wed, Feb 22, 2012 at 07:36:42PM +0530, Desai, Kashyap wrote:
> > > > > Hi,
> > > > >
> > > > > I am doing some code changes in the mps driver. While working on
> > > > > those changes, I came to know about something which is new to me.
> > > > > Some expert help is required to clarify my doubt.
> > > > >
> > > > > 1. When any IRQ is registered with the FreeBSD OS, it sets the
> > > > > "TDP_NOSLEEPING" pflag. It means that though an IRQ in FreeBSD is
> > > > > treated as a thread, we cannot sleep in the IRQ because
> > > > > "TDP_NOSLEEPING" is set.
> > > > > 2. In the mps driver we have the code snippet below in the ISR routine.
> > > > >
> > > > >
> > > > > mps_dprint(sc, MPS_TRACE, "%s\n", __func__);
> > > > > mps_lock(sc);
> > > > > mps_intr_locked(data);
> > > > > mps_unlock(sc);
> > > > >
> > > > > I wonder why there is no issue with the above code. Theoretically we
> > > > > cannot sleep in an ISR (as explained in #1). Any thoughts?
> > > > >
> > > > >
> > > > > 3. I recently used msleep() instead of DELAY in a few places in ISR
> > > > > context, and I see "Trying sleep, but thread marked as sleeping
> > > > > prohibited".
> > > > >
> > > > FreeBSD has several basic ways to prevent a thread from executing on a
> > > > CPU. They mostly fall into two categories: bounded sleep, sometimes
> > > > called blocking, and unbounded sleep, usually abbreviated as sleep.
> > > > "Bounded" there refers to the amount of code executed by another thread
> > > > that holds the resource preventing the blocked thread from making progress.
> > > >
> > > > Examples of the blocking primitives are mutexes, rw locks and rm locks.
> > > > Blocking is not counted as sleeping, so interrupt threads, which
> > > > are designated as non-sleeping, can still lock mutexes.
> > > Thanks for the tech help.
> > >
> > > As per your comment, I now understand that "TDP_NOSLEEPING" only
> > > restricts unbounded sleep. Just curious: what is the reason that a
> > > thread can do a blocking sleep but can't do an unbounded sleep? The
> > > reason we introduced the sleeping restriction on interrupt threads is
> > > to avoid starvation, and that applies to either sleep type. Is this
> > > not true?
> > No, not to avoid starvation.
> > 
> > The intent of the blocking primitives is to acquire resources for a
> > limited amount of time. In other words, you never take a mutex for an
> > indefinitely long computation. On the other hand, an msleep() sleep
> > usually has no such limit.
> 
> I got the same reply from Ed Schouten. I agree and understood your note.
> Thanks for pouring knowledge on this area.
> _but_ my only query is: when a thread takes a mutex, we don't know when it
> will release it. So the holding time of a mutex is not really known.
> In the case of some bad code, where a thread took a mutex and did not release
> it within a short time, this could eventually match the msleep restriction
> as well. Do we have any checks for a thread holding a mutex for a long time?
> This would be similar to Linux, where a spinlock that has not been released
> within some specific time triggers a warning with a backtrace.

We don't allow code to do unbounded sleeps while holding mutexes either, and
WITNESS warns about doing so.  That ensures that barring an infinite loop-type
bug, mutexes should be held for a bounded amount of time.
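The rules discussed in this thread can be modelled in a few lines of userland C. This is a hypothetical sketch, not the kernel's implementation: `struct model_thread` stands in for struct thread, `no_sleeping` for the TDP_NOSLEEPING flag, and the checks mirror what the thread describes. Interrupt threads may block on mutexes but may not msleep(), and sleeping while holding a mutex other than the interlock passed to msleep() (which is dropped for the sleep) is the case WITNESS warns about.

```c
/* Hypothetical model of a kernel thread's sleep-related state. */
struct model_thread {
    int no_sleeping;     /* models TDP_NOSLEEPING being set */
    int mutexes_held;    /* mutexes currently owned by the thread */
};

enum sleep_verdict {
    SLEEP_OK,
    SLEEP_PROHIBITED,        /* "Trying sleep, but thread marked as sleeping prohibited" */
    SLEEP_WITH_MUTEX_HELD    /* the situation WITNESS warns about */
};

/* Blocking on a mutex is a bounded wait, so it is allowed either way. */
void model_mtx_lock(struct model_thread *td)   { td->mutexes_held++; }
void model_mtx_unlock(struct model_thread *td) { td->mutexes_held--; }

/* Classify an msleep() attempt; interlock_held is 1 if a held mutex is passed as the interlock. */
enum sleep_verdict model_msleep_check(const struct model_thread *td, int interlock_held)
{
    if (td->no_sleeping)
        return SLEEP_PROHIBITED;
    if (td->mutexes_held - (interlock_held ? 1 : 0) > 0)
        return SLEEP_WITH_MUTEX_HELD;
    return SLEEP_OK;
}
```

In this model an interrupt thread can take and release its softc mutex freely, but any msleep() attempt from it is classified as prohibited, which matches the panic message Kashyap saw.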

-- 
John Baldwin

Re: another panic in 8.3-PRERELEASE

2012-02-24 Thread Hiroki Sato

Konstantin Belousov  wrote
  in <20120224150259.gv55...@deviant.kiev.zoral.com.ua>:

ko> > > #19 0x000800abecfc in ?? ()
ko> > > Previous frame inner to this frame (corrupt stack?)
ko> > > (kgdb)
ko> > Can you, please, print out the content of *td, e.g. from the frame 16 ?
ko> 
ko> And *req from the frame 11, please.

 Here:

(kgdb) f 16
#16 0x80675e3a in __sysctl (td=0xff0396ec5460, 
uap=0xff86c6389bc0) at /usr/src/sys/kern/kern_sysctl.c:1491
1491		error = userland_sysctl(td, name, uap->namelen,
(kgdb) print *td
$2 = {td_lock = 0x80d7f540, td_proc = 0xff03969bf470, td_plist = {
tqe_next = 0x0, tqe_prev = 0xff03969bf480}, td_runq = {tqe_next = 0x0, 
tqe_prev = 0x80d7f788}, td_slpq = {tqe_next = 0x0, 
tqe_prev = 0xff0396ebe800}, td_lockq = {tqe_next = 0x0, 
tqe_prev = 0xff86c57b48a0}, td_cpuset = 0xff0005789dc8, 
  td_sel = 0xff01b5dd0500, td_sleepqueue = 0xff0396ebe800, 
  td_turnstile = 0xff01334cf600, td_umtxq = 0xff0396ec3a80, 
  td_tid = 100763, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, 
sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, 
  tqh_last = 0xff0396ec5500}, sq_proc = 0xff03969bf470, 
sq_flags = 1}, td_flags = 65540, td_inhibitors = 0, td_pflags = 0, 
  td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, 
  td_lastcpu = 4 '\004', td_oncpu = 4 '\004', td_owepreempt = 0 '\0', 
  td_tsqueue = 255 'ÿ', td_locks = 4, td_rw_rlocks = 0, td_lk_slocks = 0, 
  td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, 
  td_sleeplocks = 0x80ecebf0, td_intr_nesting_level = 0, 
  td_pinned = 0, td_ucred = 0xff007d537b00, td_estcpu = 0, td_slptick = 0, 
  td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {
  tv_sec = 0, tv_usec = 0}, ru_maxrss = 1864, ru_ixrss = 66288, 
ru_idrss = 1347856, ru_isrss = 176768, ru_minflt = 263901, ru_majflt = 10, 
ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, 
ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14937, ru_nivcsw = 3286}, 
  td_incruntime = 0, td_runtime = 15204044088, td_pticks = 15, td_sticks = 15, 
  td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 
  0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_generation = 18223, 
  td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, 
  td_profil_addr = 0, td_profil_ticks = 0, 
  td_name = "top", '\0' , td_fpop = 0x0, td_dbgflags = 0, 
  td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {
  si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, 
  si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, 
sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {
_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, 
_mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, 
  __spare2__ = {0, 0, 0, 0, 0, 0, 0, ksi_flags = 0, 
ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, 
osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, 
  td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', 
  td_pri_class = 3 '\003', td_user_pri = 129 '\201', 
  td_base_user_pri = 129 '\201', td_pcb = 0xff86c6389d10, 
  td_state = TDS_RUNNING, td_retval = {0, 34375032832}, td_slpcallout = {
c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, 
tqe_prev = 0xff800042ccd0}}, c_time = 51568077, 
c_arg = 0xff0396ec5460, c_func = 0x806a84c0 , 
c_lock = 0x0, c_flags = 18, c_cpu = 4}, td_frame = 0xff86c6389c50, 
  td_kstack_obj = 0xff03410b20d8, td_kstack = 18446743553049124864, 
  td_kstack_pages = 4, td_unused1 = 0x0, td_unused2 = 0, td_unused3 = 0, 
  td_critnest = 0, td_md = {md_spinlock_count = 0, md_saved_flags = 70}, 
  td_sched = 0xff0396ec5890, td_ar = 0x0, td_syscalls = 469926, 
  td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, 
  td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_rux = {
rux_runtime = 15204044088, rux_uticks = 226, rux_sticks = 1140, 
rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, 
  td_map_def_user = 0x0, td_dbg_forked = 0}
(kgdb) f 11
#11 0x8065f6a6 in sysctl_out_proc_copyout (ki=0xff86c6389470, 
req=0xff86c63899c0) at /usr/src/sys/kern/kern_proc.c:1085
1085		error = SYSCTL_OUT(req, ki, sizeof(struct kinfo_proc));
(kgdb) print *req
$3 = {td = 0xff0396ec5460, lock = 2, oldptr = 0x800e96000, oldlen = 68217, 
  oldidx = 1088, oldfunc = 0x80675e80 , newptr = 0x0, 
  newlen = 0, newidx = 0, newfunc = 0x80675d10 , 
  validlen = 68217, flags = 0}
(kgdb) quit

-- Hiroki



Re: The "New BSD Installer" thread has shown me that I am totally obsolete in disk partitioning.

2012-02-24 Thread Edwin L. Culp W.

2012/2/20 Peter Maloney 

> Am 17.02.2012 21:08, schrieb Edwin L. Culp W.:
> > If such a thing exists,  I need a howto in mixing and matching all the
> > different partitioning options and combinations, pro's and con's, for as
> > many modern situations as possible. Any suggestions appreciated.
> To create a full document from scratch explaining everything is likely
> beyond the capabilities of any individual. If you had more specific
> questions or topics, it would help us to help you more efficiently. What
> are you looking for? A gpart howto? Mirroring and other types of
> devices? mbr vs gpt? Whether or not to have separate partitions for /usr
> and /var, etc.?
>
> And then when you get your answers, submit a PR including the details of
> what you expected in the handbook, and what you learned elsewhere that
> should be added, and then in the PR, ask them to add what you wrote in
> the PR to the handbook.
>
> Personally, I can tell you a bunch about gpart, zfs, and explain or
> elaborate on technical jargon, but I don't know too much about
> sysinstall, bsdlabel, UFS, different boot options or FreeBSD software
> raid configuration files. All my FreeBSD machines are running pure ZFS,
> and I've only created temporary gpart, gstripe, etc. for tests so far.
> If I needed to put together a real software raid UFS system, I would
> need to look through documentation.
>
> > I did look at the handbook but it seems to have changed little and uses
> > sysinstall for the examples at:
> >
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/install-steps.html#SYSINSTALL-FDISK2
> >
> > Thanks for any suggestions.  I apologize for my ignorance.
> >
> > ed
> >
> > P.S.  I have wanted to understand and try things like the following comment
> > to the thread, but I have no idea where to begin or options for doing it.
> >
> > Sorry, I wasn't suggesting that you should always mirror
> > the individual partitions - just I happen to do that where
> > I am mixing ZFS and gmirror. Obviously you don't want to create
> > lots of little mirrors if you don't have to. But even with
> > one mirror, you can mirror a big partition covering the whole
> > drive, and then carve that up with bsdlabel. No need to ever
> > mirror the actual raw discs, and it works with GPT.
>
> I think he means something like this, but I don't know how to use
> bsdlabel, so here is gpart (one of the new things you should learn
> anyway). And also I don't know if the result of my code below would even
> boot, which bootcode to use, or what to put in /boot/loader.conf to make
> it boot.
>
> # create a gpt table (rather than MBR)
> gpart create -s gpt da0
> #(not sure... boot loader needed outside the mirror?)
> gpart add -s 64k -l boot1 -t freebsd-boot da0
> # add a slice for your mirror (aka. partition).
> # from quote:  "But even with one mirror,"
> gpart add -s 80g -l mirrorslice1 -t mbr da0
>
> # set up the second disk
> gpart create -s gpt da1
> #(not sure... boot loader needed outside the mirror?)
> gpart add -s 64k -l boot2 -t freebsd-boot da1
> # from quote: "you can mirror *a big partition* covering the whole drive"
> gpart add -s 80g -l mirrorslice2 -t mbr da1
>
> # Not sure if "mbr" is the right choice for the type above... I also
> # tried guessing "gpt", which didn't work and is not in the manual.
>
> # create the mirror device (I don't know the proper way to do this... I
> # just tried this and don't know if it persists on boot, etc.)
> # from quote: "*you can mirror* a big partition covering the whole drive"
> gmirror load
> gmirror label mymirror gpt/mirrorslice1 gpt/mirrorslice2
>
> # slice the mirror device as if it was a regular disk
> # from quote: "and then carve that up with bsdlabel" (but I used gpart
> # instead of bsdlabel)
> gpart create -s gpt mirror/mymirror
> gpart add -s 1g -l root -t freebsd-ufs mirror/mymirror
> gpart add -s 1g -l usr -t freebsd-ufs mirror/mymirror
>
> newfs gpt/root
> newfs gpt/usr
>
>
>
> Further explanation of other things he said that are not covered above:
>
> "wasn't suggesting that you should always mirror the individual
> partitions"
> gpart create -s gpt da0
> gpart create -s gpt da1
> gpart add -s 1g -l root1 -t freebsd-ufs da0
> gpart add -s 1g -l root2 -t freebsd-ufs da1
> gpart add -s 1g -l usr1 -t freebsd-ufs da0
> gpart add -s 1g -l usr2 -t freebsd-ufs da1
> ...
> gmirror label rootmirror gpt/root1 gpt/root2
> gmirror label mymirror gpt/usr1 gpt/usr2
> ...
>
> #and an example of what he said you don't need to do:
> # from quote: "No need to ever mirror actual raw discs"
> gmirror label mymirror da0 da1
>
> raw disk = da0
>
> gpt slice/partition = da0p1 where 1 is the index seen in "gpart show"
> or
> gpt slice/partition = gpt/root1 where root1 is the label seen in "gpart
> show -l" and set in "gpart add ... -l labelhere"
>
> mirror device = mirror/mymirror
>
> etc.
>
> Be sure to align properly

RE: mpslsi0 : Trying sleep, but thread marked as sleeping prohibited

2012-02-24 Thread Bruce Evans


On Fri, 24 Feb 2012, Desai, Kashyap wrote:

>> From: Alexander Kabaev [mailto:kab...@gmail.com]
>> ...
>> sleep locks are by definition unbound. There is no spinning, no priority
>> propagation. Holders are free to take, say, page faults and go on a long
>> journey to disk and back, etc.
>
> I understood your above lines.
>
>> Hardly the stuff _anyone_ would want to
>> do from interrupt handler, thread or otherwise.
>
> So the way the mps driver does it in the interrupt handler is as below.
>
> mps_lock(sc);
> mps_intr_locked(data);
> mps_unlock(sc);
>
> We hold the mtx lock in the interrupt handler and do a whole bunch of work
> (this is a bit lengthy work) under it.
> It looks like the mps driver is misusing mtx_lock. Are we?

No.  Most NIC drivers do this.

Lengthy work isn't as long as it used to be, and here the lock only locks
out other accesses to a single piece of hardware (provided sc is for a
single piece of hardware as it should be).  Worry instead about more
global locks, either in your driver or in upper layers.  You might need
one to lock your whole driver, and upper layers might need one to lock
things globally too.  Giant locking is an example of the latter.  I don't
trust the upper layers much, but for interrupt handling they can be trusted
to not have anything locked when the interrupt handler is called (except
for Giant locking when the driver requests this).  Also worry about your
interrupt handler taking too long -- although nothing except interrupt
thread priority prevents other code running, it is possible that other
code doesn't get enough (or any) cycles if an interrupt handler is too
hoggish.  This problem is smaller than when there was a single ~1 MHz
CPU doing PIO.  With multiple ~2GHz CPUs doing DMA, the interrupt handler
can often be 100 times sloppier without anyone noticing.  But not 1000
times, and not 100 times with certain hardware.

Bruce
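A userland analogue of the mps_lock()/mps_intr_locked()/mps_unlock() shape discussed above may help show why a per-instance lock is cheap: each softc carries its own mutex, so handlers for different devices never contend with each other. This is only a sketch; it uses POSIX pthread mutexes rather than the kernel's mtx(9), and struct softc, sc_init(), intr(), intr_locked() and INTR_ITERATIONS are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

#define INTR_ITERATIONS 100000  /* hypothetical amount of "work" per thread */

/* Hypothetical per-device state, standing in for a driver softc. */
struct softc {
	pthread_mutex_t mtx;    /* protects only this instance's fields */
	unsigned long events;   /* state touched while the lock is held */
};

static int
sc_init(struct softc *sc)
{
	sc->events = 0;
	return (pthread_mutex_init(&sc->mtx, NULL));
}

/* Work that must run with sc->mtx held, like mps_intr_locked(). */
static void
intr_locked(struct softc *sc)
{
	sc->events++;
}

/* Handler shape from the thread: lock, do the work, unlock. */
static void *
intr(void *arg)
{
	struct softc *sc = arg;

	assert(sc != NULL);
	for (int i = 0; i < INTR_ITERATIONS; i++) {
		pthread_mutex_lock(&sc->mtx);
		intr_locked(sc);
		pthread_mutex_unlock(&sc->mtx);
	}
	return (NULL);
}
```

Threads sharing one softc serialize on its mutex, while threads working on a second softc proceed independently. A single global lock would instead serialize all of them, which is the kind of wider lock worth worrying about.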

Re: Inconsistent utx.active?

2012-02-24 Thread Ed Schouten

Hi Vlad,

> Has anyone else noticed erratic bookkeeping by utmpx in RELENG_9?

Would you mind explaining to me what you're seeing? It's hard for me to
fix bugs if I don't get proper reports.

-- 
 Ed Schouten 
 WWW: http://80386.nl/



Re: random problem with 8.3 from yesterday

2012-02-24 Thread Andriy Gapon

on 24/02/2012 18:23 Ian Lepore said the following:
> I've always
> suspected something in the geom layer isn't noticing that a CF or SD
> card in the reader got removed/inserted/reformatted, and un-/re-plugging
> the whole reader (making the cam layer destroy and recreate the devices)
> makes geom aware of the change.

This is a fact, actually.  Nothing in GEOM layer (and below it) notices a silent
card change, since most hardware doesn't have any notification for the change
and FreeBSD disk stack doesn't do any polling for changes.

-- 
Andriy Gapon

Re: Inconsistent utx.active?

2012-02-24 Thread Ed Schouten

Hi Vlad,

* Vlad Galu , 20120224 23:35:
> Yes, sorry about that. I'm seeing stale (which sometimes turn into
> duplicate) entries when I log off and on again. The symptom seems to
> be exacerbated by unclean logouts (such as when my stateful corporate
> firewall kills my SSH sessions - I don't have keepalives active at
> either end).
> 
> In the example below, I'm actually logged on from IP address X.Y.Z.T,
> the first two entries belong to earlier sessions that have been long
> gone. The pts is the same, and the command displayed under WHAT is
> mirrored for all 3 entries.

Would you mind pasting the output of `getent utmpx active'?

Thanks,
-- 
 Ed Schouten 
 WWW: http://80386.nl/



Re: Inconsistent utx.active?

2012-02-24 Thread Vlad Galu

On Friday, February 24, 2012 at 10:40 PM, Ed Schouten wrote:
> Hi Vlad,
> 
> * Vlad Galu <d...@dudu.ro>, 20120224 23:35:
> > Yes, sorry about that. I'm seeing stale (which sometimes turn into
> > duplicate) entries when I log off and on again. The symptom seems to
> > be exacerbated by unclean logouts (such as when my stateful corporate
> > firewall kills my SSH sessions - I don't have keepalives active at
> > either end).
> > 
> > In the example below, I'm actually logged on from IP address X.Y.Z.T,
> > the first two entries belong to earlier sessions that have been long
> > gone. The pts is the same, and the command displayed under WHAT is
> > mirrored for all 3 entries.
> 
> 
> 
> Would you mind pasting the output of `getent utmpx active'?
> 
Not at all, here it is:

-- cut here --
[1330014380.652067 -- Thu Feb 23 17:26:20 2012] user process: 
id="4f86d023f250d3c9" pid="39012" user="dudu" line="pts/0" host="A.B.C.D"
[1330014398.177818 -- Thu Feb 23 17:26:38 2012] user process: 
id="269d75b37f295346" pid="39221" user="dudu" line="pts/1" host="A.B.C.D"
[1330085459.796787 -- Fri Feb 24 13:10:59 2012] user process: 
id="d026e8e5c0648ec2" pid="38093" user="dudu" line="pts/0" host="A.B.C.D"
[1330122640.813570 -- Fri Feb 24 23:30:40 2012] user process: 
id="dd8d3dff2f3002a0" pid="82959" user="dudu" line="pts/0" host="X.Y.Z.T"
[1330122493.638088 -- Fri Feb 24 23:28:13 2012] user process: 
id="92b73279a543d99f" pid="73085" user="dudu" line="pts/1" host="X.Y.Z.T"
[1330122498.444614 -- Fri Feb 24 23:28:18 2012] user process: 
id="c0f3c404a3ca8565" pid="73573" user="dudu" line="pts/2" host="X.Y.Z.T"

[1330122634.538515 -- Fri Feb 24 23:30:34 2012] dead process: 
id="fea56df5dde26e4d" pid="76338"
-- and here -- 

The local time is UTC+1. The current (and only) bash PID (82986) is not even on 
that list. 
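The getent output pairs raw epoch seconds with a rendered local time. Since the local zone is UTC+1, the two columns can be cross-checked with a small C sketch. It assumes a fixed +3600 s offset (no daylight saving in February) and applies the offset by hand before calling gmtime(), instead of relying on the TZ setting; format_utc_offset() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Render epoch seconds as wall-clock time at a fixed offset from UTC,
 * in the "Thu Feb 23 17:26:20 2012" layout seen in the getent output.
 * Returns 0 on success, -1 if the time cannot be converted or buf is too small.
 */
static int
format_utc_offset(time_t epoch, long offset_seconds, char *buf, size_t len)
{
	time_t shifted;
	struct tm *tm;

	assert(buf != NULL && len > 0);
	shifted = epoch + offset_seconds;   /* apply the zone offset by hand */
	tm = gmtime(&shifted);              /* then break it down as if UTC */
	if (tm == NULL)
		return (-1);
	if (strftime(buf, len, "%a %b %d %H:%M:%S %Y", tm) == 0)
		return (-1);
	return (0);
}
```

For 1330014380 with an offset of 3600 this yields "Thu Feb 23 17:26:20 2012", matching the first entry above.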



Re: Inconsistent utx.active?

2012-02-24 Thread Vlad Galu

On Friday, February 24, 2012 at 10:15 PM, Ed Schouten wrote:

> Hi Vlad,
> 
> > Has anyone else noticed erratic bookkeeping by utmpx in RELENG_9?
> 
> Would you mind explaining to me what you're seeing? It's hard for me to
> fix bugs if I don't get proper reports.
> 
> -- 
> Ed Schouten <e...@80386.nl>
> WWW: http://80386.nl/

Hi Ed,

Yes, sorry about that. I'm seeing stale (which sometimes turn into duplicate) 
entries when I log off and on again. The symptom seems to be exacerbated by 
unclean logouts (such as when my stateful corporate firewall kills my SSH 
sessions - I don't have keepalives active at either end).

In the example below, I'm actually logged on from IP address X.Y.Z.T, the first 
two entries belong to earlier sessions that have been long gone. The pts is the 
same, and the command displayed under WHAT is mirrored for all 3 entries.

-- cut here --
dudu@joint ~ $ w
11:30PM up 2 days, 6:17, 3 users, load averages: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE WHAT
dudu pts/0 A.B.C.D Thu05PM - w
dudu pts/0 A.B.C.D 1:10PM - w
dudu pts/0 X.Y.Z.T  11:30PM - w
dudu@joint ~ $ ps ax
PID TT STAT TIME COMMAND
82986 0 SJ 0:00.00 -bash (bash)
83323 0 R+J 0:00.00 ps ax
dudu@joint ~ $ 

-- and here --


Re: Inconsistent utx.active?

2012-02-24 Thread Ed Schouten

Hello Vlad,

* Vlad Galu , 20120224 23:54:
> [1330014380.652067 -- Thu Feb 23 17:26:20 2012] user process: 
> id="4f86d023f250d3c9" pid="39012" user="dudu" line="pts/0" host="A.B.C.D"
> [1330014398.177818 -- Thu Feb 23 17:26:38 2012] user process: 
> id="269d75b37f295346" pid="39221" user="dudu" line="pts/1" host="A.B.C.D"
> [1330085459.796787 -- Fri Feb 24 13:10:59 2012] user process: 
> id="d026e8e5c0648ec2" pid="38093" user="dudu" line="pts/0" host="A.B.C.D"
> [1330122640.813570 -- Fri Feb 24 23:30:40 2012] user process: 
> id="dd8d3dff2f3002a0" pid="82959" user="dudu" line="pts/0" host="X.Y.Z.T"
> [1330122493.638088 -- Fri Feb 24 23:28:13 2012] user process: 
> id="92b73279a543d99f" pid="73085" user="dudu" line="pts/1" host="X.Y.Z.T"
> [1330122498.444614 -- Fri Feb 24 23:28:18 2012] user process: 
> id="c0f3c404a3ca8565" pid="73573" user="dudu" line="pts/2" host="X.Y.Z.T"
> [1330122634.538515 -- Fri Feb 24 23:30:34 2012] dead process: 
> id="fea56df5dde26e4d" pid="76338"

You mentioned in a previous email that these entries belong to SSH
sessions. Are you sure about this? The identifiers seem to contain
randomly generated data, just like pam_lastlog(8) does. OpenSSH uses
identifiers based on the TTY name, like so:

> [1330124273.955165 -- Fri Feb 24 23:57:53 2012] user process: 
> id="7074732f3000" pid="15880" user="ed" line="pts/0" host="m.fxq.nl"

0x7074732f30 is equal to "pts/0".
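This check can be done mechanically: the id field is printed as hex bytes, so decoding it shows whether it spells a TTY name. A minimal C sketch; hex_id_to_ascii() is a hypothetical helper, and treating the first zero byte as the end of the name is an assumption based on the trailing "00" in the OpenSSH id above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Decode a hex-printed utmpx id (e.g. "7074732f3000") into a C string.
 * Stops at the first zero byte, which is treated as padding.
 * Returns the number of characters decoded, or -1 if the input is not
 * pairs of hex digits or does not fit in out (out is then unspecified).
 */
static int
hex_id_to_ascii(const char *hex, char *out, size_t out_len)
{
	size_t n = 0;

	assert(hex != NULL && out != NULL && out_len > 0);
	while (*hex != '\0') {
		unsigned int byte;

		/* hex[1] is at worst the terminating NUL, so this read is safe. */
		if (!isxdigit((unsigned char)hex[0]) ||
		    !isxdigit((unsigned char)hex[1]))
			return (-1);
		if (sscanf(hex, "%2x", &byte) != 1)
			return (-1);
		if (byte == 0)
			break;          /* zero padding: end of the name */
		if (n + 1 >= out_len)
			return (-1);    /* no room left for the terminator */
		out[n++] = (char)byte;
		hex += 2;
	}
	out[n] = '\0';
	return ((int)n);
}
```

Decoding "7074732f3000" yields "pts/0", while every id in the listing quoted above contains non-ASCII bytes (0x86, 0x9d, 0xd0 and so on), so none of them is a TTY name.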

Maybe they're generated by some different login service or you've
configured PAM/OpenSSH/etc. in a non-default way?

Thanks so far,
-- 
 Ed Schouten 
 WWW: http://80386.nl/



Re: Inconsistent utx.active?

2012-02-24 Thread Vlad Galu

On Friday, February 24, 2012 at 11:00 PM, Ed Schouten wrote:
> Hello Vlad,
> 
> * Vlad Galu <d...@dudu.ro>, 20120224 23:54:
> > [1330014380.652067 -- Thu Feb 23 17:26:20 2012] user process: 
> > id="4f86d023f250d3c9" pid="39012" user="dudu" line="pts/0" host="A.B.C.D"
> > [1330014398.177818 -- Thu Feb 23 17:26:38 2012] user process: 
> > id="269d75b37f295346" pid="39221" user="dudu" line="pts/1" host="A.B.C.D"
> > [1330085459.796787 -- Fri Feb 24 13:10:59 2012] user process: 
> > id="d026e8e5c0648ec2" pid="38093" user="dudu" line="pts/0" host="A.B.C.D"
> > [1330122640.813570 -- Fri Feb 24 23:30:40 2012] user process: 
> > id="dd8d3dff2f3002a0" pid="82959" user="dudu" line="pts/0" host="X.Y.Z.T"
> > [1330122493.638088 -- Fri Feb 24 23:28:13 2012] user process: 
> > id="92b73279a543d99f" pid="73085" user="dudu" line="pts/1" host="X.Y.Z.T"
> > [1330122498.444614 -- Fri Feb 24 23:28:18 2012] user process: 
> > id="c0f3c404a3ca8565" pid="73573" user="dudu" line="pts/2" host="X.Y.Z.T"
> > [1330122634.538515 -- Fri Feb 24 23:30:34 2012] dead process: 
> > id="fea56df5dde26e4d" pid="76338"
> 
> 
> 
> You mentioned in a previous email that these entries belong to SSH
> sessions. Are you sure about this? The identifiers seem to contain
> randomly generated data, just like pam_lastlog(8) does. OpenSSH uses
> identifiers based on the TTY name, like so:
> 
> > [1330124273.955165 -- Fri Feb 24 23:57:53 2012] user process: 
> > id="7074732f3000" pid="15880" user="ed" line="pts/0" host="m.fxq.nl"
> 
> 0x7074732f30 is equal to "pts/0".
> 
> Maybe they're generated by some different login service or you've
> configured PAM/OpenSSH/etc. in a non-default way?
> 

Sigh, you are right. I had UseLogin set to yes in sshd_config. Sorry for the 
noise and thanks!

Re: random problem with 8.3 from yesterday

2012-02-24 Thread Erich Dollansky

On Friday 24 February 2012 23:23:53 Ian Lepore wrote:
> On Fri, 2012-02-24 at 13:50 +0700, Erich Dollansky wrote:
> > Hi,
> > 
Hi,

> > On Thursday 23 February 2012 20:22:57 Stefan Bethke wrote:
> > > Am 22.02.2012 um 07:34 schrieb Erich Dollansky:

> > > > tunefs -L NewDeviceName /dev/da0a
> > > > 
> > > > Either this call or the mount command does not work randomly.
> > > > 
> > > > When I then try to mount the device on /dev/da0a it does not work always.
> > > > 
> > > -- 
> > > Stefan Bethke    Fon +49 151 14070811
> 
> I've been putting up with problems like this since first upgrading to
> 8.2.  I guess I haven't dug deeper into them because it's actually a
> huge improvement from what I was used to in 6.x and 7.x where complete
> system lockups were more common with removable usb drives.  Here's an
> example sequence that just happened to me with a compact flash card in a
> usb multi-card reader...
> 
I was lucky then. Since 8.0, these problems disappeared for me and came back 
only with 8.3.

> revolution # mount /dev/da0s1a /mnt
> mount: /dev/da0s1a : Invalid argument
> revolution # fsck -y /dev/da0s1a
> fsck: Could not determine filesystem type
> revolution # fsck -t ufs -y /dev/da0s1a
> ** /dev/da0s1a
> Cannot find file system superblock
> ioctl (GCINFO): Inappropriate ioctl for device
> fsck_ufs: /dev/da0s1a: can't read disk label
> 
> At this point I unplug the multi-card reader and plug it back in.
> 
> revolution # fsck -y /dev/da0s1a
> fsck: Could not determine filesystem type

Yes, these are some of the messages. They changed randomly for me.

> 
> I'm not sure whether or not this is related to the problem Erich
> originally reported, but there are some similarities in symptoms such as
> the inability to recognize the filesystem type, so I thought I'd mention

It is the same thing that happened to me.

Erich

Re: random problem with 8.3 from yesterday

2012-02-24 Thread Ian Lepore

On Sat, 2012-02-25 at 00:39 +0200, Andriy Gapon wrote:
> on 24/02/2012 18:23 Ian Lepore said the following:
> > I've always
> > suspected something in the geom layer isn't noticing that a CF or SD
> > card in the reader got removed/inserted/reformatted, and un-/re-plugging
> > the whole reader (making the cam layer destroy and recreate the devices)
> > makes geom aware of the change.
> 
> This is a fact, actually.  Nothing in GEOM layer (and below it) notices a
> silent card change, since most hardware doesn't have any notification for
> the change and FreeBSD disk stack doesn't do any polling for changes.
> 

If the hardware did have change notification, is there a mechanism that
would communicate that to geom?  That's a precursor question to my real
question:  is there a way to manually kick geom when necessary?  If the
api exists but there's no userland app to make the needed calls, I'll
write some code -- just point me at a manpage or header file.

-- Ian


Re: Regression in 8.2-STABLE bge code (from 7.4-STABLE)

2012-02-24 Thread YongHyeon PYUN

On Thu, Feb 23, 2012 at 09:46:20AM -0500, John Baldwin wrote:
> On Tuesday, February 14, 2012 7:56:00 pm YongHyeon PYUN wrote:
> > On Sat, Jan 28, 2012 at 09:24:53PM -0500, Michael L. Squires wrote:
> > 
> > Sorry for late reply.  Had been busy due to relocation.
> > 
> > > There is a bug in the Tyan S4881/S4882 PCI-X bridges that was fixed with a
> > > patch in 7.x (thank you very much).  This patch is not present in the
> > > 8.2-STABLE code and the symptoms (watchdog timeouts) have recurred.
> > > 
> > 
> > Hmm, I thought the mailbox reordering bug was avoided by limiting
> > DMA address space to 32bits but it seems it was not the right workaround
> > for the AMD 8131 PCI-X bridge.
> > 
> > > The watchdog timeouts do not appear to be present after I switched to an 
> > > Intel gigabit PCI-X card.
> > > 
> > > I did a brute-force patch of the 8.2-STABLE bge code using the patches for
> > > 7.4-STABLE; the resulting code compiled and, other than odd behavior at
> > > startup, seems to be working normally.
> > > 
> > > This is using FreeBSD 8.2-STABLE amd64; I don't know what happens with 
> > > i386.
> > > 
> > > Given the age of the boards it may be easier if I just continue using the
> > > Intel gigabit card but am happy to test anything that comes my way.
> > > 
> > 
> > Try attached patch and let me know how it goes.
> > I didn't enable 64bit DMA addressing though. I think the AMD-8131
> > PCI-X bridge needs both workarounds.
> 
> Eh, please don't do the thing where you walk all pcib devices.  Instead, walk 
> up the tree like so:
> 
> static int
> bge_mbox_reorder(struct bge_softc *sc)
> {
> 	devclass_t pcib, pci;
> 	device_t dev, bus;
> 
> 	pci = devclass_find("pci");
> 	pcib = devclass_find("pcib");
> 	dev = sc->dev;
> 	bus = device_get_parent(dev);
> 	for (;;) {
> 		dev = device_get_parent(bus);
> 		bus = device_get_parent(dev);
> 		if (device_get_devclass(dev) != pcib ||
> 		    device_get_devclass(bus) != pci)
> 			break;
> 		/* Probe device ID. */
> 	}
> 	return (0);
> }
> 
> It is not safe to use pci_get_vendor() with non-PCI devices (you may get
> random junk, and Host-PCI bridges are not PCI devices).  Also, this will only
> apply the quirk if a relevant bridge is in the bge device's path.
> 
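The walk suggested above can be modeled in userland to see where it stops. In this sketch, struct node is a hypothetical stand-in for a newbus device_t (the kernel code uses device_get_parent(), device_get_devclass() and the PCI ID accessors instead), and the quirk table holds the single AMD-8131 entry (0x1022/0x7450) from the patch in this message. Only a "pcib" whose parent is a "pci" bus is checked, so a Host-PCI bridge is never asked for PCI IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a newbus device_t: just what the walk needs. */
struct node {
	const char *devclass;        /* "pci", "pcib", "nexus", ... */
	uint16_t vendor, device;     /* PCI IDs; only read for PCI-PCI bridges */
	const struct node *parent;   /* NULL at the root */
};

struct bridge_id {
	uint16_t vendor, device;
};

/* Bridges known to reorder mailbox writes (entry taken from the patch). */
static const struct bridge_id reorder_bridges[] = {
	{ 0x1022, 0x7450 },          /* AMD-8131 PCI-X Bridge */
};

static int
is_class(const struct node *n, const char *cls)
{
	return (n != NULL && strcmp(n->devclass, cls) == 0);
}

/*
 * Walk from a device toward the root two levels at a time.  Each step must
 * land on a "pcib" whose parent is a "pci" bus, i.e. a PCI-PCI bridge that
 * is itself a PCI device; anything else (such as a Host-PCI bridge) ends
 * the walk before its IDs are read.  Returns 1 if a listed bridge is found.
 */
static int
needs_mbox_reorder(const struct node *dev)
{
	const struct node *bus;
	size_t i;

	assert(dev != NULL);
	bus = dev->parent;
	for (;;) {
		dev = (bus != NULL) ? bus->parent : NULL;
		bus = (dev != NULL) ? dev->parent : NULL;
		if (!is_class(dev, "pcib") || !is_class(bus, "pci"))
			return (0);
		for (i = 0; i < sizeof(reorder_bridges) / sizeof(reorder_bridges[0]); i++)
			if (dev->vendor == reorder_bridges[i].vendor &&
			    dev->device == reorder_bridges[i].device)
				return (1);
	}
}
```

With a bge behind an AMD-8131 bridge the function returns 1; with a bge sitting directly on the host bridge's bus it returns 0 even if the host bridge carried matching IDs, because the loop stops before inspecting it.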

Thanks for the review and the suggestion.
Would you review the updated one?
Index: sys/dev/bge/if_bgereg.h
===
--- sys/dev/bge/if_bgereg.h (revision 232144)
+++ sys/dev/bge/if_bgereg.h (working copy)
@@ -2828,6 +2828,7 @@
 #define	BGE_FLAG_RX_ALIGNBUG	0x0400
 #define	BGE_FLAG_SHORT_DMA_BUG	0x0800
 #define	BGE_FLAG_4K_RDMA_BUG	0x1000
+#define	BGE_FLAG_MBOX_REORDER	0x2000
 	uint32_t		bge_phy_flags;
 #define	BGE_PHY_NO_WIRESPEED	0x0001
 #define	BGE_PHY_ADC_BUG		0x0002
Index: sys/dev/bge/if_bge.c
===
--- sys/dev/bge/if_bge.c	(revision 232144)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -380,6 +380,8 @@
 static int bge_dma_ring_alloc(struct bge_softc *, bus_size_t, bus_size_t,
 bus_dma_tag_t *, uint8_t **, bus_dmamap_t *, bus_addr_t *, const char *);
 
+static int bge_mbox_reorder(struct bge_softc *);
+
 static int bge_get_eaddr_fw(struct bge_softc *sc, uint8_t ether_addr[]);
 static int bge_get_eaddr_mem(struct bge_softc *, uint8_t[]);
 static int bge_get_eaddr_nvram(struct bge_softc *, uint8_t[]);
@@ -635,6 +637,8 @@
off += BGE_LPMBX_IRQ0_HI - BGE_MBX_IRQ0_HI;
 
CSR_WRITE_4(sc, off, val);
+   if ((sc->bge_flags & BGE_FLAG_MBOX_REORDER) != 0)
+   CSR_READ_4(sc, off);
 }
 
 /*
@@ -2609,8 +2613,8 @@
 * XXX
 * watchdog timeout issue was observed on BCM5704 which
 * lives behind PCI-X bridge(e.g AMD 8131 PCI-X bridge).
-* Limiting DMA address space to 32bits seems to address
-* it.
+* Both limiting DMA address space to 32bits and flushing
+* mailbox write seem to address the issue.
 */
if (sc->bge_flags & BGE_FLAG_PCIX)
lowaddr = BUS_SPACE_MAXADDR_32BIT;
@@ -2775,6 +2779,47 @@
 }
 
 static int
+bge_mbox_reorder(struct bge_softc *sc)
+{
+   /* Lists of PCI bridges that are known to reorder mailbox writes. */
+   static const struct mbox_reorder {
+   const uint16_t vendor;
+   const uint16_t device;
+   const char *desc;
+   } const mbox_reorder_lists[] = {
+   { 0x1022, 0x7450, "AMD-8131 PCI-X Bridge" },
+   };
+   devclass_t pci, pcib;
+   device_t bus, dev;
+   int count, i;
+
+   count = sizeof(mbox_reorder_lists) / sizeof(mbox_reorder_list

geom vs. removable disks/cards (was: Re: random problem with 8.3 from yesterday)

2012-02-24 Thread Juergen Lock

In article <1330126840.7317.60.ca...@revolution.hippie.lan> you write:
>On Sat, 2012-02-25 at 00:39 +0200, Andriy Gapon wrote:
>> on 24/02/2012 18:23 Ian Lepore said the following:
>> > I've always
>> > suspected something in the geom layer isn't noticing that a CF or SD
>> > card in the reader got removed/inserted/reformatted, and un-/re-plugging
>> > the whole reader (making the cam layer destroy and recreate the devices)
>> > makes geom aware of the change.
>> 
>> This is a fact, actually.  Nothing in GEOM layer (and below it) notices a
>> silent card change, since most hardware doesn't have any notification for
>> the change and FreeBSD disk stack doesn't do any polling for changes.
>> 
>
>If the hardware did have change notification, is there a mechanism that
>would communicate that to geom?  That's a precursor question to my real
>question:  is there a way to manually kick geom when necessary?  If the
>api exists but there's no userland app to make the needed calls, I'll
>write some code -- just point me at a manpage or header file.

SCSI has a mechanism called unit attention to report things like
media changes; not sure USB devices use that, though, since the host can
only poll them...

 Anyway, the usual workaround is to force a geom retaste by opening
the device for writing without actually writing anything, e.g.:

# : >/dev/da0
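The same retaste trick can be done from C, which may be handy if it is to be part of a larger tool such as the one offered earlier in the thread. This sketch assumes, as described above, that closing a device that was opened for writing is what makes GEOM retaste it; force_retaste() is a hypothetical name. Like the shell idiom, it needs write permission on the device and may be refused while the device or one of its partitions is in use.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a disk device for writing and close it again without writing
 * anything, the C equivalent of ": >/dev/da0".  Plain O_WRONLY is assumed
 * to be enough; the shell also passes O_CREAT|O_TRUNC, which are not
 * expected to matter for a device node.  Returns 0 on success, -1 with
 * errno set on failure.
 */
static int
force_retaste(const char *devpath)
{
	int fd;

	assert(devpath != NULL);
	fd = open(devpath, O_WRONLY);
	if (fd == -1)
		return (-1);
	return (close(fd));
}
```

Usage would be force_retaste("/dev/da0") after swapping the card, run as a user allowed to open the device for writing.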

 Btw this can't be Erich's problem I'd say since he said he's
plugging in a thumbdrive not a card into a reader (and also writing
/dev/zero to it) so geom _should_ already taste it.  (Unless the
write fails since the thumbdrive is too slow initializing or something
like that...)

 HTH,
Juergen
