Re: SIOCGIFADDR broken on 9.0-RC1?

2011-11-16 Thread GomoR
> From: "Jeremy Chadwick" 
> 
> I would recommend adding synchronous_dhclient="yes" to /etc/rc.conf.
> This will cause dhclient (the DHCP client) to wait until it gets an
> answer + IP back from the DHCP server before continuing with the rc.d
> scripts.  The default is "no".

Awesome, thank you. I should have searched for dhclient 
options in /etc/defaults/rc.conf ;)

-- 
  ^  ___  ___ http://www.GomoR.org/  <-+
  | / __ |__/Senior Security Engineer  |
  | \__/ |  \ ---[ zsh$ alias psed='perl -pe ' ]---|
  +-->  Net::Frame <=> http://search.cpan.org/~gomor/  <---+
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Possible to build 9-stable kernel on 8.2?

2011-11-16 Thread Maciej Milewski
On Tuesday, 15 November 2011 17:32:09, Jeremy Chadwick wrote:
> Not to mention, one should always do buildworld first.  The absolute
> correct procedure is outlined in /usr/src/Makefile:
And in /usr/src/UPDATING, which should be read before updating as it may
contain some important information. The same applies to /usr/ports/UPDATING.

less +/cross-install /usr/src/UPDATING

-- 
Maciej Milewski


Re: Possible to build 9-stable kernel on 8.2?

2011-11-16 Thread Dimitry Andric
On 2011-11-16 01:29, Glen Barber wrote:
> On Tue, Nov 15, 2011 at 11:45:02AM -0800, Chuck Tuffli wrote:
...
>> ld:/usr/home/ctuffli/dev/releng_9/src/sys/conf/ldscript.amd64:9: syntax error
>> *** Error code 1
> You'll need to do 'buildworld' first.

Actually, doing "make kernel-toolchain" is enough.  This builds just the
required tools, e.g. binutils, gcc and so on.


Re: ATA/Cdrom(?) panic

2011-11-16 Thread Ivan Voras
On 16/11/2011 07:43, Bjoern A. Zeeb wrote:
> Hey,
> 
> we have seen this or a very similar panic for about 1 year now once in
> a while and I think I reported it before; this is FreeBSD as guest on

Yes, IIRC I've also reported it before; it crashes randomly, when the
machine is not doing anything with the cdrom. As a workaround, I now
remove the cdrom device from vmware instances.


> vmware.   Seems it was a double panic this time.   Could someone please
> see what's going on there? It was on 8.x-STABLE in the past and this
> is 8.2-RELEASE-p4.
> 
> Thanks
> /bz
> 
> acd0: WARNING - READ_TOC taskqueue timeout - completing request directly
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 4; apic id = 04
> fault virtual address   = 0x1f4
> fault code  = supervisor read, page not present
> instruction pointer = 0x20:0xc08a1e9f
> 
> stack pointer   = 0x28:0xe6ad5b9c
> Fatal trap 12: page fault while in kernel mode
> frame pointer   = 0x28:0xe6ad5bb4
> cpuid = 2;
> code segment= base 0x0, limit 0xf, type 0x1bapic id = 02
> = DPL 0, pres 1, def32 1, gran 1
> fault virtual address   = 0x1f4
> processor eflags=
> fault code  = supervisor read, page not presentinterrupt
> enabled,
> instruction pointer = 0x20:0xc08a1e9fresume,
> stack pointer   = 0x28:0xe8e9e808IOPL = 0
> frame pointer   = 0x28:0xe8e9e820
> current process =
> code segment= base 0x0, limit 0xf, type 0x1b12 (swi6:
> task queue)
> = DPL 0, pres 1, def32 1, gran 1
> trap number = 12
> processor eflags= interrupt enabled,
> panic: page faultresume,
> cpuid = 4IOPL = 0
> current process =
> KDB: stack backtrace:25162 (bsnmpd)
> 
> trap number = 12#0 0xc08e0d07 at kdb_backtrace+0x47
> 
> #1 0xc08b1dc7 at panic+0x117
> #2 0xc0be4b53 at trap_fatal+0x323
> #3 0xc0be4dd0 at trap_pfault+0x270
> #4 0xc0be5315 at trap+0x465
> #5 0xc0bcbecc at calltrap+0x6
> #6 0xc08b0d86 at _sema_post+0x46
> #7 0xc056fa47 at ata_completed+0x727
> #8 0xc08eb97a at taskqueue_run_locked+0xca
> #9 0xc08ebc8a at taskqueue_run+0xaa
> #10 0xc08ebd53 at taskqueue_swi_run+0x13
> #11 0xc088903b at intr_event_execute_handlers+0x13b
> #12 0xc088a75b at ithread_loop+0x6b
> #13 0xc0886d51 at fork_exit+0x91
> #14 0xc0bcbf44 at fork_trampoline+0x8
> Uptime: 5d20h1m56s
> 
> 
> (gdb) l *ata_completed+0x727
> 489 (request->callback)(request);
> 490 else
> 491 sema_post(&request->done);
> 492
> 493 /* only call ata_start if channel is present */
> 494 if (ch)
> 495 ata_start(ch->dev);
> 496 }
> 497
> 498 void
> 
> 





Re: mfi timeouts

2011-11-16 Thread Vincent Hoffman
On 14/11/2011 19:42, John Baldwin wrote:
> On Thursday, November 10, 2011 5:59:28 am Vincent Hoffman wrote:
>> Well the dell has been up for about 19 hours now using MSI, I ran 
>> bonnie++ a few times on it and have now stuck it in a permanent loop 
>> (will look in from time to time.) Are there any tests you'd like 
>> run/info you'd like?
> Actually, can you please test www.freebsd.org/~jhb/patches/mfi_msi.patch?
> You will have to set the hw.mfi.msi=1 tunable to enable MSI support.  This
> is a commit candidate if it works.  Thanks.
>
Applied and running with bonnie++ overnight. All good for me at least.

Vince


Re: ATA/Cdrom(?) panic

2011-11-16 Thread Alexander Motin
Hi.

On 11/16/11 08:43, Bjoern A. Zeeb wrote:
> we have seen this or a very similar panic for about 1 year now once in
> a while and I think I reported it before; this is FreeBSD as guest on
> vmware.   Seems it was a double panic this time.   Could someone please
> see what's going on there? It was on 8.x-STABLE in the past and this
> is 8.2-RELEASE-p4.

The part of the code reporting "completing request directly" is IMHO
broken by design. It reports the request as completed before the lower
levels have actually completed it, without any knowledge of what's going
on there. There is some kind of protection against double request
completion, but it looks like it doesn't always work. Maybe that's
because that part of the code is not locked, and nothing prevents the
semaphore timeout and the normal request timeout/completion from
happening simultaneously. It is surprising to see even two traps at the
same time; I'm not sure what synchronized them so precisely.

Simply removing that semaphore timeout is not an option, because it
would cause a deadlock when this wait happens within the taskqueue
thread that is used to handle request completion and abort that wait.
Avoiding the wait inside the taskqueue is also impossible without a
major rewrite. That's why ATA_CAM drops that code completely.
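Alexander's description points at a missing guard: both the semaphore-timeout path and the normal completion path can complete the same request. The sketch below is not the ata(4) code; `struct request`, `request_init()` and `request_complete_once()` are hypothetical names. Under that assumption it shows the usual pattern: an atomic flag that only the first completer can flip, so a late second completion becomes a no-op. It does not address the deeper problem he describes, namely that the request is reported complete while lower layers may still be working on it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request; not struct ata_request from ata(4). */
struct request {
    atomic_bool completed;  /* flipped once by whichever path finishes first */
    int completions;        /* how many times the completion work really ran */
};

static void request_init(struct request *r)
{
    atomic_init(&r->completed, false);
    r->completions = 0;
}

/*
 * Called from both the timeout path and the normal completion path.
 * atomic_exchange() returns the previous value, so exactly one caller
 * sees "false" and performs the completion; every later caller backs off.
 */
static bool request_complete_once(struct request *r)
{
    if (atomic_exchange(&r->completed, true))
        return false;       /* already completed by the other path */
    r->completions++;       /* stands in for the callback / sema_post() */
    return true;
}
```

C11 atomics are used only to keep the sketch self-contained; in the kernel the same idea would more likely be expressed with atomic(9) operations such as atomic_cmpset_int(), or by holding the channel lock on both paths.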

-- 
Alexander Motin


Re: ATA/Cdrom(?) panic

2011-11-16 Thread Bjoern A. Zeeb

On Wed, 16 Nov 2011, Alexander Motin wrote:

> Hi.
>
> On 11/16/11 08:43, Bjoern A. Zeeb wrote:
>> we have seen this or a very similar panic for about 1 year now once in
>> a while and I think I reported it before; this is FreeBSD as guest on
>> vmware.   Seems it was a double panic this time.   Could someone please
>> see what's going on there? It was on 8.x-STABLE in the past and this
>> is 8.2-RELEASE-p4.
>
> The part of the code reporting "completing request directly" is IMHO
> broken by design. It reports the request as completed before the lower
> levels have actually completed it, without any knowledge of what's going
> on there. There is some kind of protection against double request
> completion, but it looks like it doesn't always work. Maybe that's
> because that part of the code is not locked, and nothing prevents the
> semaphore timeout and the normal request timeout/completion from
> happening simultaneously. It is surprising to see even two traps at the
> same time; I'm not sure what synchronized them so precisely.
>
> Simply removing that semaphore timeout is not an option, because it
> would cause a deadlock when this wait happens within the taskqueue
> thread that is used to handle request completion and abort that wait.
> Avoiding the wait inside the taskqueue is also impossible without a
> major rewrite. That's why ATA_CAM drops that code completely.


So the bottom line of what you are saying is:
1) it's hard to fix right in 8
2) it's not an issue in 9 anymore at all?

--
Bjoern A. Zeeb You have to have visions!
 Stop bit received. Insert coin for new address family.


Re: ATA/Cdrom(?) panic

2011-11-16 Thread Alexander Motin
On 11/16/11 16:14, Bjoern A. Zeeb wrote:
> On Wed, 16 Nov 2011, Alexander Motin wrote:
> 
>> Hi.
>>
>> On 11/16/11 08:43, Bjoern A. Zeeb wrote:
>>> we have seen this or a very similar panic for about 1 year now once in
>>> a while and I think I reported it before; this is FreeBSD as guest on
>>> vmware.   Seems it was a double panic this time.   Could someone please
>>> see what's going on there? It was on 8.x-STABLE in the past and this
>>> is 8.2-RELEASE-p4.
>>
>> The part of the code reporting "completing request directly" is IMHO
>> broken by design. It reports the request as completed before the lower
>> levels have actually completed it, without any knowledge of what's going
>> on there. There is some kind of protection against double request
>> completion, but it looks like it doesn't always work. Maybe that's
>> because that part of the code is not locked, and nothing prevents the
>> semaphore timeout and the normal request timeout/completion from
>> happening simultaneously. It is surprising to see even two traps at the
>> same time; I'm not sure what synchronized them so precisely.
>>
>> Simply removing that semaphore timeout is not an option, because it
>> would cause a deadlock when this wait happens within the taskqueue
>> thread that is used to handle request completion and abort that wait.
>> Avoiding the wait inside the taskqueue is also impossible without a
>> major rewrite. That's why ATA_CAM drops that code completely.
> 
> So the bottom line of what you are saying is:
> 1) it's hard to fix right in 8
> 2) it's not an issue in 9 anymore at all?

Right.

-- 
Alexander Motin


Re: ATA/Cdrom(?) panic

2011-11-16 Thread Joel Dahl
On 16-11-2011 16:33, Alexander Motin wrote:
> On 11/16/11 16:14, Bjoern A. Zeeb wrote:
> > On Wed, 16 Nov 2011, Alexander Motin wrote:
> > 
> >> Hi.
> >>
> >> On 11/16/11 08:43, Bjoern A. Zeeb wrote:
> >>> we have seen this or a very similar panic for about 1 year now once in
> >>> a while and I think I reported it before; this is FreeBSD as guest on
> >>> vmware.   Seems it was a double panic this time.   Could someone please
> >>> see what's going on there? It was on 8.x-STABLE in the past and this
> >>> is 8.2-RELEASE-p4.
> >>
> >> The part of the code reporting "completing request directly" is IMHO
> >> broken by design. It reports the request as completed before the lower
> >> levels have actually completed it, without any knowledge of what's going
> >> on there. There is some kind of protection against double request
> >> completion, but it looks like it doesn't always work. Maybe that's
> >> because that part of the code is not locked, and nothing prevents the
> >> semaphore timeout and the normal request timeout/completion from
> >> happening simultaneously. It is surprising to see even two traps at the
> >> same time; I'm not sure what synchronized them so precisely.
> >>
> >> Simply removing that semaphore timeout is not an option, because it
> >> would cause a deadlock when this wait happens within the taskqueue
> >> thread that is used to handle request completion and abort that wait.
> >> Avoiding the wait inside the taskqueue is also impossible without a
> >> major rewrite. That's why ATA_CAM drops that code completely.
> > 
> > So the bottom line of what you are saying is:
> > 1) it's hard to fix right in 8
> > 2) it's not an issue in 9 anymore at all?
> 
> Right.

Hmm. We're running many FreeBSD 8.2 machines as guests in VMware but have
never encountered the panic described above. Should I be worried?  :-)

-- 
Joel


Trouble with SSD on SATA

2011-11-16 Thread Willem Jan Withagen

Hi,

I'm getting these:

Nov 16 16:40:49 zfs kernel: ata6: port is not ready (timeout 15000ms) tfd = 0080
Nov 16 16:40:49 zfs kernel: ata6: hardware reset timeout
Nov 16 16:41:50 zfs kernel: ata6: port is not ready (timeout 15000ms) tfd = 0080
Nov 16 16:41:50 zfs kernel: ata6: hardware reset timeout

This happens when inserting the tray with an SSD connected to that
controller.

This is probably due to a BIOS upgrade; at least, it started after
upgrading the BIOS. So I'm asking SuperMicro for an older version.

When this happens, the system sometimes panics; I haven't written down
the details yet. It's somewhere in get_devices...

After the panic I really need to power down the machine; otherwise it
boots but stalls while looking for disks. It doesn't just find no
disks, it "freezes" at the point where it should report the found
disks during the BIOS boot.

So apparently the ATA controller is left in a very confused state.

Why is the controller found at boot and works as it should, but later
just starts generating these hardware resets??

--WjW


Re: ATA/Cdrom(?) panic

2011-11-16 Thread Ivan Voras
On 16/11/2011 15:45, Joel Dahl wrote:

> 
> Hmm. We're running many FreeBSD 8.2 machines as guests in VMware but have
> never encountered the panic described above. Should I be worried?  :-)
> 

I've encountered them often enough that I started removing cdrom devices
from the VMs.





Re: Trouble with SSD on SATA

2011-11-16 Thread Peter Maloney
Willem,

I can only guess, but...

Is AHCI enabled in the BIOS? If you are not using 'fake-raid' for any
disks, you should probably [depending on FreeBSD version, HBA, etc.]
enable AHCI. Some servers actually come with SATA set to IDE mode. And
if you are using ZFS, the controller should ideally not be in RAID mode
at all. If you have AHCI enabled already, try disabling it (losing
hot-swapping ability and some performance).

What version of FreeBSD are you using? I had a terrible experience with
ZFS on FreeBSD 8.2-RELEASE and on 8.2-STABLE from April 2011. I would
recommend upgrading to the latest 8-STABLE with cvsup.

This thread seems related:
http://forums.freebsd.org/showthread.php?t=24189

The poster there was using 8.2-RELEASE and downgraded to an older version
of the driver to fix it, saying that a patch that fixes the problem also
existed in 8-STABLE.

Are you using an expander?

What HBA / hard disk controller are you using?


Peter

On 16.11.2011 17:12, Willem Jan Withagen wrote:
> Hi,
>
> I'm getting these:
>
> Nov 16 16:40:49 zfs kernel: ata6: port is not ready (timeout 15000ms) tfd = 0080
> Nov 16 16:40:49 zfs kernel: ata6: hardware reset timeout
> Nov 16 16:41:50 zfs kernel: ata6: port is not ready (timeout 15000ms) tfd = 0080
> Nov 16 16:41:50 zfs kernel: ata6: hardware reset timeout
>
> This happens when inserting the tray with an SSD connected to that
> controller.
>
> This is probably due to a BIOS upgrade; at least, it started after
> upgrading the BIOS. So I'm asking SuperMicro for an older version.
>
> When this happens, the system sometimes panics; I haven't written down
> the details yet. It's somewhere in get_devices...
>
> After the panic I really need to power down the machine; otherwise it
> boots but stalls while looking for disks. It doesn't just find no
> disks, it "freezes" at the point where it should report the found
> disks during the BIOS boot.
> So apparently the ATA controller is left in a very confused state.
>
> Why is the controller found at boot and works as it should, but later
> just starts generating these hardware resets??
>
> --WjW




Re: Trouble with SSD on SATA

2011-11-16 Thread Alexander Motin

Hi.

On 16.11.2011 18:12, Willem Jan Withagen wrote:
> I'm getting these:
>
> Nov 16 16:40:49 zfs kernel: ata6: port is not ready (timeout 15000ms) tfd = 0080
> Nov 16 16:40:49 zfs kernel: ata6: hardware reset timeout
> Nov 16 16:41:50 zfs kernel: ata6: port is not ready (timeout 15000ms) tfd = 0080
> Nov 16 16:41:50 zfs kernel: ata6: hardware reset timeout
>
> This happens when inserting the tray with an SSD connected to that
> controller.
>
> This is probably due to a BIOS upgrade; at least, it started after
> upgrading the BIOS. So I'm asking SuperMicro for an older version.
>
> When this happens, the system sometimes panics; I haven't written down
> the details yet. It's somewhere in get_devices...
>
> After the panic I really need to power down the machine; otherwise it
> boots but stalls while looking for disks. It doesn't just find no
> disks, it "freezes" at the point where it should report the found
> disks during the BIOS boot.
> So apparently the ATA controller is left in a very confused state.
>
> Why is the controller found at boot and works as it should, but later
> just starts generating these hardware resets??


Looking at the messages, I would say that you are using an AHCI controller
with the old ata(4) driver. I would recommend trying the new ahci(4)
driver. It has better hot-plug support and also supports NCQ and some
other features. Note that disks connected to it will be reported as adaX
instead of adY.


--
Alexander Motin


Re: Possible to build 9-stable kernel on 8.2?

2011-11-16 Thread Chuck Tuffli
On Wed, Nov 16, 2011 at 1:34 AM, Dimitry Andric  wrote:
> On 2011-11-16 01:29, Glen Barber wrote:
>> On Tue, Nov 15, 2011 at 11:45:02AM -0800, Chuck Tuffli wrote:
> ...
>>> ld:/usr/home/ctuffli/dev/releng_9/src/sys/conf/ldscript.amd64:9: syntax error
>>> *** Error code 1
>> You'll need to do 'buildworld' first.
>
> Actually, doing "make kernel-toolchain" is enough.  This builds just the
> required tools, e.g. binutils, gcc and so on.
>

Perfect! This is the gem I needed. Thanks to all for the help.

---chuck


Re: mfi timeouts

2011-11-16 Thread Jan Mikkelsen

On 16/11/2011, at 9:43 PM, Vincent Hoffman wrote:

> On 14/11/2011 19:42, John Baldwin wrote:
>> On Thursday, November 10, 2011 5:59:28 am Vincent Hoffman wrote:
>>> Well the dell has been up for about 19 hours now using MSI, I ran 
>>> bonnie++ a few times on it and have now stuck it in a permanent loop 
>>> (will look in from time to time.) Are there any tests you'd like 
>>> run/info you'd like?
>> Actually, can you please test www.freebsd.org/~jhb/patches/mfi_msi.patch?
>> You will have to set the hw.mfi.msi=1 tunable to enable MSI support.  This
>> is a commit candidate if it works.  Thanks.
>> 
> Applied and running with bonnie++ overnight. All good for me at least.


Boots for me with hw.mfi.msi=1; fails to boot with hw.mfi.msi=0, giving
repeated timeout messages, pretty much as expected. I won't be able to put
load on it until later tomorrow or next week.

Regards,

Jan.



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-16 Thread Doug Barton
On 11/15/2011 02:09, Jeremy Chadwick wrote:
> On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote:
>> On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote:
>>> On 11/14/2011 12:31, Doug Barton wrote:
>>>> Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386
>>>> in a busy web hosting environment I came across the following post:
>>>>
>>>> http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html
>>>>
>>>> That basically describes what we're seeing as well, including the
>>>> "doesn't happen on Linux" part.
>>>>
>>>> Does anyone have any ideas about this?
>>>>
>>>> With incredibly similar stuff running on 7.x we didn't see this problem,
>>>> so it seems to be something new in 8.
>>>
>>> Just took a closer look at our ktrace, and actually our pattern is
>>> slightly different than the one in that post. In ours the second option
>>> is null, but the third is set:
>>>
>>> 74195 httpd0.17 RET   sigprocmask 0
>>> 74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
>>> 74195 httpd0.09 RET   sigprocmask 0
>>> 74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
>>> 74195 httpd0.09 RET   sigprocmask 0
>>> 74195 httpd0.12 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
>>>
>>> But repeated hundreds of times in a row.
>>
>> The calls cannot come from rtld, they are generated by some setjmp()
>> invocation. If signal-safety is not needed, sigsetjmp() should be used
>> instead.
>>
>> Quick grep of the apache httpd source shows a single setjmp() in their
>> copy of pcre. No idea whether it is safe to change setjmp() into sigsetjmp(?, 0).
> 
> I hate cross-posting, but: adding freebsd-apache@ to the list.  Some of
> the Apache folks (not just port committers) may have some insight to
> Kostik's findings.

Thanks to everyone for the responses. We tried Kostik's suggestion and
unfortunately it didn't reduce the number of sigprocmask() calls to a
statistically significant degree.

Does anyone have any other ideas on ways to debug this? We're sort of
running out of things to test. :-/

Given how important (and prevalent) the Apache + FreeBSD combination is,
I'm kind of disturbed that we're seeing this performance problem, and if
it's something in 8.x that's also in 9.x, it would be better to fix it
prior to 9.0-RELEASE.


Doug

-- 

"We could put the whole Internet into a book."
"Too practical."

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



Re: 8.2 + apache == a LOT of sigprocmask

2011-11-16 Thread Kostik Belousov
On Wed, Nov 16, 2011 at 10:46:27PM -0800, Doug Barton wrote:
> On 11/15/2011 02:09, Jeremy Chadwick wrote:
> > On Tue, Nov 15, 2011 at 11:07:45AM +0200, Kostik Belousov wrote:
> >> On Mon, Nov 14, 2011 at 12:51:35PM -0800, Doug Barton wrote:
> >>> On 11/14/2011 12:31, Doug Barton wrote:
> >>>> Trying to track down a load problem we're seeing on 8.2-RELEASE-p4 i386
> >>>> in a busy web hosting environment I came across the following post:
> >>>>
> >>>> http://lists.freebsd.org/pipermail/freebsd-questions/2011-October/234520.html
> >>>>
> >>>> That basically describes what we're seeing as well, including the
> >>>> "doesn't happen on Linux" part.
> >>>>
> >>>> Does anyone have any ideas about this?
> >>>>
> >>>> With incredibly similar stuff running on 7.x we didn't see this problem,
> >>>> so it seems to be something new in 8.
> >>>
> >>> Just took a closer look at our ktrace, and actually our pattern is
> >>> slightly different than the one in that post. In ours the second option
> >>> is null, but the third is set:
> >>>
> >>> 74195 httpd0.17 RET   sigprocmask 0
> >>> 74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
> >>> 74195 httpd0.09 RET   sigprocmask 0
> >>> 74195 httpd0.13 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
> >>> 74195 httpd0.09 RET   sigprocmask 0
> >>> 74195 httpd0.12 CALL  sigprocmask(SIG_BLOCK,0,0xbfbf89d4)
> >>>
> >>> But repeated hundreds of times in a row.
> >>
> >> The calls cannot come from rtld, they are generated by some setjmp()
> >> invocation. If signal-safety is not needed, sigsetjmp() should be used
> >> instead.
> >>
> >> Quick grep of the apache httpd source shows a single setjmp() in their
> >> copy of pcre. No idea whether it is safe to change setjmp() into
> >> sigsetjmp(?, 0).
> > 
> > I hate cross-posting, but: adding freebsd-apache@ to the list.  Some of
> > the Apache folks (not just port committers) may have some insight to
> > Kostik's findings.
> 
> Thanks to everyone for the responses. We tried Kostik's suggestion and
> unfortunately it didn't reduce the number of sigprocmask() calls to a
> statistically significant degree.
> 
> Does anyone have any other ideas on ways to debug this? We're sort of
> running out of things to test. :-/
> 
> Given how important (and prevalent) the Apache + FreeBSD combination is,
> I'm kind of disturbed that we're seeing this performance problem, and if
> it's something in 8.x that's also in 9.x, it would be better to fix it
> prior to 9.0-RELEASE.

Since my guess turned out not to be useful, the way forward is to identify
the location of the call(s) that cause the issue. I suggest compiling
at least apache itself, libc, rtld and libthr (if used) with debugging
information. Then attach gdb to a running apache worker and set a
breakpoint on sigprocmask. Several backtraces from the breakpoint hits
should give enough data.

A high-tech solution is to link with libunwind and add code to
sigprocmask() to gather the stacks, but I expect that attaching gdb is
enough.

