Re: nscd not caching

2014-08-18 Thread Eggert, Lars
Hi,

On 2014-8-17, at 18:10, Adam McDougall  wrote:
> We were using +: type entries in the local password and group
> tables and I believe we used an unmodified /etc/nsswitch.conf (excluding
> cache lines while testing nscd):

I tried that setup too, and it doesn't seem to be caching any NIS lookups 
either.

The current NIS server is 25ms away, which is a pain. I'm trying to get a local 
slave set up, which will make the need for nscd go away, but it would sure be 
nice if it worked in the meantime.

> At our site, we never had enough load to outright require nscd on
> FreeBSD, although there were some areas where caching had a usability
> benefit.

Load is not an issue, latency is (see above).

>  top was slow to open since it would load the whole passwd
> table first, but top -u was a workaround.

Right, I see that issue too.

> As a workaround until we retired NIS, I wrote a hack of a script to
> merge NIS groups into my local /etc/group files periodically from cron.
> Aside from bugs in my script, that worked well.

I may end up doing this, too.

Given all this, maybe it's time to retire nscd?

Lars






Re: nscd not caching

2014-08-18 Thread Stefan Esser
Am 17.08.2014 um 18:10 schrieb Adam McDougall:
> On 08/17/2014 09:09, Eggert, Lars wrote:
>> Nobody using nscd? Really?
> 
> I would test for you, but we retired our NIS infrastructure at least a
> year ago.  I did have it working on a test client at some point, but I
> didn't push it into production because I found a couple issues (below).
[...]
> The two main problems I recall were nscd making java crash, and nscd
> holding on to negative cache lookups too long, causing failures while
> installing ports that depend on adding users/groups for a following file
> permission change.  I can't remember if the latter issue was fixed at
> some point.  I also can't remember if I was receiving perfectly accurate
> results from the cache either.

I added the "negative-confidence-threshold" option to nscd a few
years ago.  If it is set to a number greater than 1 (the default),
that many failed lookups are required before a negative cache entry
is created.  Setting this value to 3 should allow for 2 probes for
the presence of a UID or username before the cache returns a failure
without bothering to re-check the source.  The value should still be
low enough to prevent flooding a remote source with requests if an
entry really does not exist.

The default was left unchanged - you need to increase the value to
see any effect of this threshold.  3 might be a reasonable default
for the user database.  But I never bothered to suggest and discuss
an increased default value on the mail-lists ...

[...]
> I dabbled with nscd a bit after we switched from NIS to LDAP.  I think I
> recall lookups being slightly slower WITH the cache, plus I would get
> some duplicated group entries returned on all but the first getent
> group.  The short version is we in no way seem to benefit or require a
> cache of LDAP with our site size, so I'm just not using nscd.  I didn't
> make bug reports for these issues, I had to prioritize towards more
> pressing issues.  I'm trying to do better about reporting bugs.

I also found that there were glitches, when I tested the extension
to cache only the nth negative reply. The code is not easy to read
and change (IMHO), and I did not succeed when I tried to reproduce
and debug these glitches.

Regards, STefan
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic: pmap active 0xfffff8002d2ae9f8

2014-08-18 Thread Konstantin Belousov
On Fri, Aug 15, 2014 at 10:38:25PM -0500, Bryan Drewery wrote:
> On 2014-08-13 10:38, Bryan Drewery wrote:
> > On 6/24/2014 4:28 PM, Craig Rodrigues wrote:
> >> Hi,
> >> 
> >> I have a system running CURRENT at r266925 from May 31.
> >> 
> >> While doing some software builds using poudriere, the system
> >> panicked.  Unfortunately this system was not configured with
> >> swap space, so I cannot do a kernel dump.
> >> 
> >> The system is currently at the ddb prompt.
> >> Here is the backtrace:
> >> 
> >> 
> >> Here is the backtrace from ddb:
> >> 
> >> panic: pmap active 0xf8002d2ae9f8
> >> cpuid = 5
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> >> 0xfe183958a7d0
> >> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe183958a880
> >> vpanic() at vpanic+0x126/frame 0xfe183958a8c0
> >> kassert_panic() at kassert_panic+0x139/frame 0xfe183958a930
> >> pmap_remove_pages() at pmap_remove_pages+0x8c/frame 0xfe183958aa20
> >> vmspace_exit() at vmspace_exit+0xa1/frame 0xfe183958aa60
> >> exit1() at exit1+0x541/frame 0xfe183958aad0
> >> sys_sys_exit() at sys_sys_exit+0xe/frame 0xfe183958aae0
> >> amd64_syscall() at amd64_syscall+0x25a/frame 0xfe183958abf0
> >> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe183958abf0
> >> --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip - 0x800b195aa, rsp -
> >> 0x7ffe3e8, rbp = 0x7e400
> >> KDB: enter: panic
> >> [ thread pid 94762 tid 101570 ]
> >> Stopped at   kdb_enter+0x3e: movq$0.kdb_why
> >> db>
> >> 
> >> 
> >> Is this a known problem?
> >> Are there other commands I should type at the ddb prompt?
> >> --
> >> Craig
> > 
> > I have run into this as well on r269147:
> > 
> >> panic: pmap active 0xf80035f422f8
> >> cpuid = 10
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> >> 0xfe124852b7d0
> >> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe124852b880
> >> vpanic() at vpanic+0x126/frame 0xfe124852b8c0
> >> kassert_panic() at kassert_panic+0x139/frame 0xfe124852b930
> >> pmap_remove_pages() at pmap_remove_pages+0x8c/frame 0xfe124852ba20
> >> vmspace_exit() at vmspace_exit+0x9c/frame 0xfe124852ba60
> >> exit1() at exit1+0x541/frame 0xfe124852bad0
> >> sys_sys_exit() at sys_sys_exit+0xe/frame 0xfe124852bae0
> >> ia32_syscall() at ia32_syscall+0x270/frame 0xfe124852bbf0
> >> Xint0x80_syscall() at Xint0x80_syscall+0x95/frame 0xfe124852bbf0
> >> --- syscall (1, FreeBSD ELF32, sys_sys_exit), rip = 0x297e386f, rsp = 
> >> 0xd7ac, rbp = 0xd7b8 ---
> >> KDB: enter: panic
> >> [ thread pid 85335 tid 101517 ]
> >> Stopped at  kdb_enter+0x3e: movq$0,kdb_why
> >> db> call doadump
> >> 
> >> Dump failed. Partition too small.
> >> = 0
> 
> Got it again on recent r269950 while building with poudriere:
> 
> panic: pmap active 0xf8113c3c6d78
> cpuid = 10
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> 0xfe1248acc7d0
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1248acc880
> vpanic() at vpanic+0x126/frame 0xfe1248acc8c0
> kassert_panic() at kassert_panic+0x139/frame 0xfe1248acc930
> pmap_remove_pages() at pmap_remove_pages+0x8c/frame 0xfe1248acca20
> vmspace_exit() at vmspace_exit+0x9c/frame 0xfe1248acca60
> exit1() at exit1+0x541/frame 0xfe1248accad0
> sys_sys_exit() at sys_sys_exit+0xe/frame 0xfe1248accae0
> amd64_syscall() at amd64_syscall+0x25a/frame 0xfe1248accbf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe1248accbf0
> --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x80387fadc, rsp = 
> 0x7fffd4e8, rbp = 0x7fffd5a0 ---
> KDB: enter: panic
> [ thread pid 84433 tid 101503 ]
> Stopped at  kdb_enter+0x3e: movq$0,kdb_why
> db> call doadump
> 
> Dump failed. Partition too small.
> = 0

The interesting information is pmap->pm_active for the pmap address
reported by the panic.  The easiest way to get the active mask is to use
kgdb on the vmcore.




DEADLKRES crash

2014-08-18 Thread Larry Rosenman
I got the following:

borg.lerctr.org dumped core - see /var/crash/vmcore.8

Mon Aug 18 07:30:42 CDT 2014

FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #63 r269784M: Sun Aug 
10 12:33:07 CDT 2014 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64

panic: deadlkres: possible deadlock detected for 0xf8002abeb000, blocked 
for 1800926 ticks

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: deadlkres: possible deadlock detected for 0xf8002abeb000, blocked 
for 1800926 ticks

cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe100bff1a10
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe100bff1ac0
vpanic() at vpanic+0x126/frame 0xfe100bff1b00
panic() at panic+0x43/frame 0xfe100bff1b60
deadlkres() at deadlkres+0x35c/frame 0xfe100bff1bb0
fork_exit() at fork_exit+0x84/frame 0xfe100bff1bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe100bff1bf0
--- trap 0, rip = 0, rsp = 0xfe100bff1cb0, rbp = 0 ---
Uptime: 7d14h14m38s
Dumping 9124 out of 64463 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/snd_envy24ht.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_envy24ht.ko.symbols
Reading symbols from /boot/kernel/snd_spicds.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_spicds.ko.symbols
Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
Loaded symbols for /boot/kernel/coretemp.ko.symbols
Reading symbols from /boot/kernel/ichsmb.ko.symbols...done.
Loaded symbols for /boot/kernel/ichsmb.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/ichwd.ko.symbols...done.
Loaded symbols for /boot/kernel/ichwd.ko.symbols
Reading symbols from /boot/kernel/cpuctl.ko.symbols...done.
Loaded symbols for /boot/kernel/cpuctl.ko.symbols
Reading symbols from /boot/kernel/crypto.ko.symbols...done.
Loaded symbols for /boot/kernel/crypto.ko.symbols
Reading symbols from /boot/kernel/cryptodev.ko.symbols...done.
Loaded symbols for /boot/kernel/cryptodev.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols
Reading symbols from /boot/kernel/cyclic.ko.symbols...done.
Loaded symbols for /boot/kernel/cyclic.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace_freebsd32.ko.symbols
Reading symbols from /boot/kernel/systrace.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace.ko.symbols
Reading symbols from /boot/kernel/sdt.ko.symbols...done.
Loaded symbols for /boot/kernel/sdt.ko.symbols
Reading symbols from /boot/kernel/lockstat.ko.symbols...done.
Loaded symbols for /boot/kernel/lockstat.ko.symbols
Reading symbols from /boot/kernel/fasttrap.ko.symbols...done.
Loaded symbols for /boot/kernel/fasttrap.ko.symbols
Reading symbols from /boot/kernel/fbt.ko.symbols...done.
Loaded symbols for /boot/kernel/fbt.ko.symbols
Reading symbols from /boot/kernel/dtnfscl.ko.symbols...done.
Loaded symbols for /boot/kernel/dtnfscl.ko.symbols
Reading symbols from /boot/kernel/dtmalloc.ko.symbols...done.
Loaded symbols for /boot/kernel/dtmalloc.ko.symbols
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/ipmi_linux.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi_linux.ko.symbols
Reading symbols from /boot/kernel/radeonkms.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkms.ko.symbols
Reading symbols from /boot/kernel/iicbb.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbb.ko.symbols
Reading symbols from /boot/kernel/iicbus.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbus.ko.symbols
Reading symbols from /boot/kernel/iic.ko.symbols...done.
Loaded symbols for /boot/kernel/iic.ko.symbols
Reading symbols from /boot/kernel/d

Re: DEADLKRES crash

2014-08-18 Thread Benjamin Kaduk
On Mon, 18 Aug 2014, Larry Rosenman wrote:

> I got the following:
>
> borg.lerctr.org dumped core - see /var/crash/vmcore.8
>
> Mon Aug 18 07:30:42 CDT 2014
>
> FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #63 r269784M: Sun 
> Aug 10 12:33:07 CDT 2014 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER 
>  amd64
>
> panic: deadlkres: possible deadlock detected for 0xf8002abeb000,
> blocked for 1800926 ticks
[...]
> Current language:  auto; currently minimal
> (kgdb)
>
> What info do folks need?

Most useful would be the "show alllocks" from DDB.  Getting the lock
information from kgdb is rather more annoying, though jhb has some gdb
scripts which help. (http://people.freebsd.org/~jhb/gdb/)  I guess the
'allchains' command from gdb6 is the one in question, but it's been a
while since I tried to use these scripts.

-Ben


Re: DEADLKRES crash

2014-08-18 Thread Larry Rosenman

On 2014-08-18 12:56, Benjamin Kaduk wrote:
> On Mon, 18 Aug 2014, Larry Rosenman wrote:
>
>> I got the following:
>>
>> borg.lerctr.org dumped core - see /var/crash/vmcore.8
>>
>> Mon Aug 18 07:30:42 CDT 2014
>>
>> FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #63
>> r269784M: Sun Aug 10 12:33:07 CDT 2014
>> r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64
>>
>> panic: deadlkres: possible deadlock detected for 0xf8002abeb000,
>> blocked for 1800926 ticks
> [...]
>> Current language:  auto; currently minimal
>> (kgdb)
>>
>> What info do folks need?
>
> Most useful would be the "show alllocks" from DDB.  Getting the lock
> information from kgdb is rather more annoying, though jhb has some gdb
> scripts which help. (http://people.freebsd.org/~jhb/gdb/)  I guess the
> 'allchains' command from gdb6 is the one in question, but it's been a
> while since I tried to use these scripts.
>
> -Ben

after source'ing the above, and typing allchains, it sits and spins, but
no output... :(


Ideas?

I **CAN** give SSH access


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 108 Turvey Cove, Hutto, TX 78634-5688


Re: nscd not caching

2014-08-18 Thread John-Mark Gurney
Eggert, Lars wrote this message on Mon, Aug 18, 2014 at 07:42 +:
> The current NIS server is 25ms away, which is a pain. I'm trying to get a 
> local slave set up, which will make the need for nscd go away, but it would 
> sure be nice if it worked in the meantime.

Why not run a local slave on your server?

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: Inconsistent behavior with dd(1)

2014-08-18 Thread William Orr
Reply inline.

On 08/16/2014 10:34 AM, John-Mark Gurney wrote:
> Alan Somers wrote this message on Fri, Aug 15, 2014 at 10:42 -0600:
>> On Thu, Aug 14, 2014 at 11:55 PM, William Orr  wrote:
>>> Hey,
>>>
>>> I found some inconsistent behavior with dd(1) when it comes to specifying 
>>> arguments in -CURRENT.
>>>
>>>  [ worr on terra ] ( ~ ) % dd if=/dev/zero of=/dev/null 
>>> count=18446744073709551616
>>> dd: count: Result too large
>>>  [ worr on terra ] ( ~ ) % dd if=/dev/zero of=/dev/null 
>>> count=18446744073709551617
>>> dd: count: Result too large
>>>  [ worr on terra ] ( ~ ) % dd if=/dev/zero of=/dev/null 
>>> count=18446744073709551615
>>> dd: count cannot be negative
>>>  [ worr on terra ] ( ~ ) % dd if=/dev/zero of=/dev/null 
>>> count=-18446744073709551615
>>> 1+0 records in
>>> 1+0 records out
>>> 512 bytes transferred in 0.000373 secs (1373071 bytes/sec)
>>>  [ worr on terra ] ( ~ ) % dd if=/dev/zero of=/dev/null count=-1
>>> dd: count cannot be negative
>>>
>>> ???
>>>
>>> Any chance someone has the time and could take a look? 
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191263
>>>
>>> Thanks,
>>> William Orr
>>>
>>> ???
>>
>>
>> IMHO, this is a bug in strtouq(3), not in dd(1).  Why should it parse
>> negative numbers at all, when there is strtoq(3) for that purpose?  The
>> standard is clear that it must, though.  Oddly enough, strtoq would
>> probably not accept -18446744073709551615, even though strtouq does.
>> Specific comments on your patch below:
>>
>>
>>>
>>> Here's the patch:
>>>
>>> Index: bin/dd/args.c
>>> ===
>>> --- bin/dd/args.c   (revision 267712)
>>> +++ bin/dd/args.c   (working copy)
>>> @@ -186,46 +186,31 @@
>>>  static void
>>>  f_bs(char *arg)
>>>  {
>>> -   uintmax_t res;
>>> -
>>> -   res = get_num(arg);
>>> -   if (res < 1 || res > SSIZE_MAX)
>>> -   errx(1, "bs must be between 1 and %jd", 
>>> (intmax_t)SSIZE_MAX);
>>> -   in.dbsz = out.dbsz = (size_t)res;
>>> +   in.dbsz = out.dbsz = get_num(arg);
>>> +   if (in.dbsz < 1 || out.dbsz < 1)
>>
>> Why do you need to check both in and out?  Aren't they the same?
>> Also, you eliminated the check for overflowing SSIZE_MAX.  That's not
>> ok, because these values get passed to places that expect signed
>> numbers, for example in dd.c:303.
> 
> The type of dbsz is size_t, so really:
> 
>>> +   errx(1, "bs must be between 1 and %ju", (uintmax_t)-1);
> 
> This should be SIZE_MAX, except there isn't a define for this?  So maybe
> the code really should be:
>   (uintmax_t)(size_t)-1
> 
> to get the correct value for SIZE_MAX...
> 
> Otherwise on systems that uintmax_t is >32bits and size_t is 32bits,
> the error message will be wrong...

Yes, this should probably be SIZE_MAX rather than that cast. The same
goes for the others.

> 
>>>  }
>>>
>>>  static void
>>>  f_cbs(char *arg)
>>>  {
>>> -   uintmax_t res;
>>> -
>>> -   res = get_num(arg);
>>> -   if (res < 1 || res > SSIZE_MAX)
>>> -   errx(1, "cbs must be between 1 and %jd", 
>>> (intmax_t)SSIZE_MAX);
>>> -   cbsz = (size_t)res;
>>> +   cbsz = get_num(arg);
>>> +   if (cbsz < 1)
>>> +   errx(1, "cbs must be between 1 and %ju", (uintmax_t)-1);
>>>  }
>>
>> Again, you eliminated the check for SSIZE_MAX, but cbsz must be signed.
> 
> What do you mean by this?  cbsz is size_t which is unsigned...

I believe he's referring to this use of cbsz/in.dbsz/out.dbsz:

https://svnweb.freebsd.org/base/head/bin/dd/dd.c?revision=265698&view=markup#l171

Really, that code is the more serious problem, since there is arithmetic
inside of a malloc(3) call without any overflow handling. Capping these
values at the maximum of ssize_t just makes an overflow less likely.

This math should probably be done ahead of time with proper overflow
handling. I'll include that in my next patch, if there's no objection.

I don't see any other reason why in.dbsz, out.dbsz or cbsz should be
signed, but it's very possible that I didn't look hard enough.

> Again, the cast above is wrong...  Maybe we should add a SIZE_MAX
> define so we don't have to see the double cast...
> 
>>>  static void
>>>  f_count(char *arg)
>>>  {
>>> -   intmax_t res;
>>> -
>>> -   res = (intmax_t)get_num(arg);
>>> -   if (res < 0)
>>> -   errx(1, "count cannot be negative");
>>> -   if (res == 0)
>>> -   cpy_cnt = (uintmax_t)-1;
>>
>> This is a special case.  See dd_in().  I think that eliminating this
>> special case will have the unintended effect of breaking count=0.
>>
>>> -   else
>>> -   cpy_cnt = (uintmax_t)res;
>>> +   cpy_cnt = get_num(arg);
>>>  }
>>>
>>>  static void
>>>  f_files(char *arg)
>>>  {
>>> -
> 
> Don't eliminate these blank lines.. they are intentional per style(9):
>  /* Insert an empty line if the function has no local variables. */
> 
>>> files_cnt = get_num(arg);
>>>

Re: DEADLKRES crash

2014-08-18 Thread Ryan Stone
On Mon, Aug 18, 2014 at 11:21 AM, Larry Rosenman  wrote:
> I got the following:
>
> borg.lerctr.org dumped core - see /var/crash/vmcore.8
>
> Mon Aug 18 07:30:42 CDT 2014
>
> FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #63 r269784M: Sun 
> Aug 10 12:33:07 CDT 2014 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER 
>  amd64
>
> panic: deadlkres: possible deadlock detected for 0xf8002abeb000, blocked 
> for 1800926 ticks
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
> panic: deadlkres: possible deadlock detected for 0xf8002abeb000, blocked 
> for 1800926 ticks
>
> cpuid = 3
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe100bff1a10
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe100bff1ac0
> vpanic() at vpanic+0x126/frame 0xfe100bff1b00
> panic() at panic+0x43/frame 0xfe100bff1b60
> deadlkres() at deadlkres+0x35c/frame 0xfe100bff1bb0
> fork_exit() at fork_exit+0x84/frame 0xfe100bff1bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe100bff1bf0
> --- trap 0, rip = 0, rsp = 0xfe100bff1cb0, rbp = 0 ---
> Uptime: 7d14h14m38s

The first thing that I'd like to see is (in kgdb):

set $td=(struct thread)0xf8002abeb000
tid $td->td_tid
bt

That will show us the backtrace of the thread that was blocked for so long.


Re: DEADLKRES crash

2014-08-18 Thread Larry Rosenman

On 2014-08-18 15:45, Ryan Stone wrote:
> On Mon, Aug 18, 2014 at 11:21 AM, Larry Rosenman wrote:
>> I got the following:
>>
>> borg.lerctr.org dumped core - see /var/crash/vmcore.8
>>
>> Mon Aug 18 07:30:42 CDT 2014
>>
>> FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #63
>> r269784M: Sun Aug 10 12:33:07 CDT 2014
>> r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64
>>
>> panic: deadlkres: possible deadlock detected for 0xf8002abeb000,
>> blocked for 1800926 ticks
>>
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> panic: deadlkres: possible deadlock detected for 0xf8002abeb000,
>> blocked for 1800926 ticks
>>
>> cpuid = 3
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe100bff1a10
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe100bff1ac0
>> vpanic() at vpanic+0x126/frame 0xfe100bff1b00
>> panic() at panic+0x43/frame 0xfe100bff1b60
>> deadlkres() at deadlkres+0x35c/frame 0xfe100bff1bb0
>> fork_exit() at fork_exit+0x84/frame 0xfe100bff1bf0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe100bff1bf0
>> --- trap 0, rip = 0, rsp = 0xfe100bff1cb0, rbp = 0 ---
>> Uptime: 7d14h14m38s
>
> The first thing that I'd like to see is (in kgdb):
>
> set $td=(struct thread)0xf8002abeb000
> tid $td->td_tid
> bt
>
> That will show us the backtrace of the thread that was blocked for so
> long.

#0  doadump (textdump=1) at pcpu.h:219
219     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) set $td=(struct thread)0xf8002abeb000
Invalid cast.
(kgdb) set $td=(struct thread*)0xf8002abeb000
Current language:  auto; currently minimal
(kgdb) tid $td->td_tid
[Switching to thread 469 (Thread 100681)]#0  sched_switch (
    td=0xf8002abeb000, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1931
1931    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xf8002abeb000, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1931
#1  0x80a107d9 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:493
#2  0x80a4c442 in sleepq_switch (wchan=<value optimized out>,
    pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:552
#3  0x80a4c2a3 in sleepq_wait (wchan=0xf80070a4dd50, pri=96)
    at /usr/src/sys/kern/subr_sleepqueue.c:631
#4  0x809eb1fa in sleeplk (lk=<value optimized out>,
    flags=<value optimized out>, ilk=<value optimized out>,
    wmesg=<value optimized out>, pri=<value optimized out>,
    timo=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:225
#5  0x809eaa06 in __lockmgr_args (lk=0xf80070a4dd50,
    flags=<value optimized out>, ilk=0xf80070a4dd80,
    wmesg=<value optimized out>, pri=<value optimized out>,
    timo=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:931
#6  0x8092e092 in nfs_lock1 (ap=<value optimized out>) at lockmgr.h:97
#7  0x80f2d57c in VOP_LOCK1_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:2082
#8  0x80abd22a in _vn_lock (vp=0xf80070a4dce8,
    flags=<value optimized out>,
    file=0x8110db88 "/usr/src/sys/kern/vfs_subr.c", line=2137)
    at vnode_if.h:859
#9  0x80aad4e7 in vget (vp=0xf80070a4dce8, flags=524544,
    td=0xf8002abeb000) at /usr/src/sys/kern/vfs_subr.c:2137
#10 0x80aa1491 in vfs_hash_get (mp=0xf8002aa1e990,
    hash=1741450670, flags=<value optimized out>, td=0xf8002abeb000,
    vpp=0xfe100c75c670, fn=0x80935820 )
    at /usr/src/sys/kern/vfs_hash.c:88
#11 0x809314bd in ncl_nget (mntp=0xf8002aa1e990,
    fhp=0xf80070ccf4a4 "\001", fhsize=12, npp=0xfe100c75c6e0,
    lkflags=<value optimized out>)
    at /usr/src/sys/fs/nfsclient/nfs_clnode.c:114
#12 0x809340fd in nfs_statfs (mp=0xf8002aa1e990,
    sbp=0xf8002aa1ea48) at /usr/src/sys/fs/nfsclient/nfs_clvfsops.c:288
#13 0x80aa7ade in __vfs_statfs (mp=0x0, sbp=0xf8002aa1ea48)
    at /usr/src/sys/kern/vfs_mount.c:1706
#14 0x80ab4f5e in kern_getfsstat (td=0xf8002abeb000,
    buf=<value optimized out>, bufsize=<value optimized out>,
    bufseg=UIO_USERSPACE, flags=<value optimized out>)
    at /usr/src/sys/kern/vfs_syscalls.c:511
#15 0x80e1625a in amd64_syscall (td=0xf8002abeb000, traced=0)
    at subr_syscall.c:133
#16 0x80df760b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:390
#17 0x0008010fc83a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 108 Turvey Cove, Hutto, TX 78634-5688