Re: Panic on boot after upgrade from r320827 -> r320869

2017-07-15 Thread Michael Butler

On 07/11/17 19:53, Michael Butler wrote:

> On 07/11/17 13:13, I wrote:
>> Take sdhci out of the kernel and try again. If that works, it tells us one
>> thing (need to troubleshoot sdhci stuff more). If not it tells us another
>> (need to troubleshoot CAM more), do we get errors with the ATA_IDENTIFY
>> command? Does it try multiple times per AHCI port? What AHCI device do you
>> have? You may need to scroll back with the screen-lock / pageup keys to see
>> these messages.
>
>   [ .. snip .. ]
>
> I'll try this tonight when I'm back at home. The laptop concerned uses
> the ICH-7M part in "legacy mode" so it doesn't do AHCI at all :-(
>
> Without sdhci and mmc, it actually boots but everything KDE aborts with
> signal 6 :-(
>
> I'm not prepared to rebuild the ~1900 ports on this box to pursue this
> further,


Something about SVN r320844 causes almost all KDE applications to fail 
on a signal 6.


I've recompiled KDE and other components obviously dependent on kernel 
structures (e.g. everything dbus-related). I still get core-files with a 
back-trace that looks like:


(gdb) bt
#0  0x000804232f6a in thr_kill () from /lib/libc.so.7
#1  0x000804232f34 in raise () from /lib/libc.so.7
#2  0x000804232ea9 in abort () from /lib/libc.so.7
#3  0x0008188597af in ?? () from /usr/local/lib/libdbus-1.so.3
#4  0x00081884ef2c in _dbus_warn_check_failed () from /usr/local/lib/libdbus-1.so.3
#5  0x00081883f539 in dbus_message_new_method_call () from /usr/local/lib/libdbus-1.so.3
#6  0x000801bddfe8 in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
#7  0x000801bd591e in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
#8  0x000801bd9af6 in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
#9  0x000801be656d in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
#10 0x000801be6807 in QDBusInterface::QDBusInterface(QString const&, QString const&, QString const&, QDBusConnection const&, QObject*) () from /usr/local/lib/qt4/libQtDBus.so.4
#11 0x00080d12728e in ?? () from /usr/local/lib/libsolid.so.4
#12 0x00080d11e68c in ?? () from /usr/local/lib/libsolid.so.4
#13 0x00080d12a525 in ?? () from /usr/local/lib/libsolid.so.4
#14 0x00080d0e7aac in Solid::Device::listFromType(Solid::DeviceInterface::Type const&, QString const&) () from /usr/local/lib/libsolid.so.4
#15 0x00080e7e889a in ?? () from /usr/local/lib/libplasma.so.3
#16 0x00080e7e6094 in Plasma::RunnerManager::RunnerManager(QObject*) () from /usr/local/lib/libplasma.so.3
#17 0x0008172fab42 in ?? () from /usr/local/lib/libkdeinit4_krunner.so
#18 0x0008172fa9b4 in ?? () from /usr/local/lib/libkdeinit4_krunner.so
#19 0x0008172fd303 in kdemain () from /usr/local/lib/libkdeinit4_krunner.so
#20 0x0040a015 in ?? ()
#21 0x0040aec0 in ?? ()




SVN r320843 works, r320844 doesn't :-(
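
Narrowing a regression to one commit like this amounts to a binary search over the revision range. A minimal sketch, assuming the endpoints from the subject line (r320827 good, r320869 bad) and a hypothetical is_good() callback that would build and boot-test a given revision; none of these names come from the FreeBSD tree:

```python
def first_bad_revision(good, bad, is_good):
    """Return the lowest failing revision in the range (good, bad].

    Assumes `good` passes, `bad` fails, and that every revision after
    the first failing one also fails.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # regression was introduced after mid
        else:
            bad = mid    # regression is at mid or earlier
    return bad


# Stand-in predicate for illustration only: pretend every revision up to
# r320843 boots and runs KDE correctly. A real check means a rebuild.
print(first_bad_revision(320827, 320869, lambda rev: rev <= 320843))
```

Each call to is_good() stands for a full kernel build and test, so the point of the search is that 42 candidate revisions need only about six builds.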

imb

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Panic on boot after upgrade from r320827 -> r320869

2017-07-15 Thread Warner Losh
On Sat, Jul 15, 2017 at 1:32 PM, Michael Butler wrote:

> On 07/11/17 19:53, Michael Butler wrote:
>
>> On 07/11/17 13:13, I wrote:
>>
>>> Take sdhci out of the kernel and try again. If that works, it tells us one
>>> thing (need to troubleshoot sdhci stuff more). If not it tells us another
>>> (need to troubleshoot CAM more), do we get errors with the ATA_IDENTIFY
>>> command? Does it try multiple times per AHCI port? What AHCI device do you
>>> have? You may need to scroll back with the screen-lock / pageup keys to see
>>> these messages.
>>>
>>   [ .. snip .. ]
>>
>> I'll try this tonight when I'm back at home. The laptop concerned uses
>> the ICH-7M part in "legacy mode" so it doesn't do AHCI at all :-(
>>
>> Without sdhci and mmc, it actually boots but everything KDE aborts with
>> signal 6 :-(
>>
>> I'm not prepared to rebuild the ~1900 ports on this box to pursue this
>> further,
>>
>
> Something about SVN r320844 causes almost all KDE applications to fail on
> a signal 6.
>

I don't think that's possible, unless (a) your build hit the 'not
everything in the kernel rebuilt' bug or (b) KDE is issuing raw CAM
requests. Since I don't know KDE, don't run KDE or have any clue about KDE,
I can't help you trace it down further.


> I've recompiled KDE and other components obviously dependent on kernel
> structures (e.g. everything dbus-related). I still get core-files with a
> back-trace that looks like:
>
> (gdb) bt
> #0  0x000804232f6a in thr_kill () from /lib/libc.so.7
> #1  0x000804232f34 in raise () from /lib/libc.so.7
> #2  0x000804232ea9 in abort () from /lib/libc.so.7
> #3  0x0008188597af in ?? () from /usr/local/lib/libdbus-1.so.3
> #4  0x00081884ef2c in _dbus_warn_check_failed () from /usr/local/lib/libdbus-1.so.3
> #5  0x00081883f539 in dbus_message_new_method_call () from /usr/local/lib/libdbus-1.so.3
> #6  0x000801bddfe8 in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
> #7  0x000801bd591e in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
> #8  0x000801bd9af6 in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
> #9  0x000801be656d in ?? () from /usr/local/lib/qt4/libQtDBus.so.4
> #10 0x000801be6807 in QDBusInterface::QDBusInterface(QString const&, QString const&, QString const&, QDBusConnection const&, QObject*) () from /usr/local/lib/qt4/libQtDBus.so.4
> #11 0x00080d12728e in ?? () from /usr/local/lib/libsolid.so.4
> #12 0x00080d11e68c in ?? () from /usr/local/lib/libsolid.so.4
> #13 0x00080d12a525 in ?? () from /usr/local/lib/libsolid.so.4
> #14 0x00080d0e7aac in Solid::Device::listFromType(Solid::DeviceInterface::Type const&, QString const&) () from /usr/local/lib/libsolid.so.4
> #15 0x00080e7e889a in ?? () from /usr/local/lib/libplasma.so.3
> #16 0x00080e7e6094 in Plasma::RunnerManager::RunnerManager(QObject*) () from /usr/local/lib/libplasma.so.3
> #17 0x0008172fab42 in ?? () from /usr/local/lib/libkdeinit4_krunner.so
> #18 0x0008172fa9b4 in ?? () from /usr/local/lib/libkdeinit4_krunner.so
> #19 0x0008172fd303 in kdemain () from /usr/local/lib/libkdeinit4_krunner.so
> #20 0x0040a015 in ?? ()
> #21 0x0040aec0 in ?? ()
>
>
> SVN r320843 works, r320844 doesn't :-(
>

I'd look to see if any of that software uses a CAM CCB for any reason.
That's the only thing I can think of that might have been affected. Perhaps
it's doing an identify? There was one CCB that changed size (and did so in
an incompatible way between rev 320844 and 320878), but I didn't think it
was user visible.
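
To illustrate why a structure that changes size is a problem for anything compiled against the old headers, here is a minimal sketch using ctypes. OldRequest and NewRequest are hypothetical layouts invented for this example; they are not the real CAM CCB definitions, and the helper names are assumptions too:

```python
import ctypes


# Hypothetical request layouts; NOT the real CAM CCB definitions.
class OldRequest(ctypes.Structure):
    _fields_ = [("opcode", ctypes.c_uint32),
                ("payload", ctypes.c_uint8 * 16)]


class NewRequest(ctypes.Structure):
    _fields_ = [("opcode", ctypes.c_uint32),
                ("payload", ctypes.c_uint8 * 32),  # existing member grew
                ("flags", ctypes.c_uint32)]        # new member appended


def layout(struct_type):
    """Map each field name to its (offset, size) inside the structure."""
    return {name: (getattr(struct_type, name).offset,
                   getattr(struct_type, name).size)
            for name, *_ in struct_type._fields_}


def abi_compatible(a, b):
    """True only if both structures have the same size and field layout."""
    return ctypes.sizeof(a) == ctypes.sizeof(b) and layout(a) == layout(b)


print(ctypes.sizeof(OldRequest), ctypes.sizeof(NewRequest))
print(abi_compatible(OldRequest, NewRequest))
```

A program built against the old layout would hand the kernel a buffer of the old size while the kernel expects the new one, which is the kind of mismatch that rebuilding userland against matching headers removes.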

Does camcontrol identify or camcontrol inquiry work?

Warner


Re: Panic on boot after upgrade from r320827 -> r320869

2017-07-15 Thread Michael Butler

On 07/15/17 20:39, Mark Millard wrote:

> FYI for Michael B.: the incomplete kernel rebuild problem has a fix:
> -r320919 .
> See the fix (to the building problem that was created in -r320220 ):
>
> https://lists.freebsd.org/pipermail/svn-src-head/2017-July/102622.html
>
> If the KDE problem persists based on a -r320919 or later build, it would
> be appropriate to report it again as a separate issue.
>
> Unfortunately various odd problems have shown up over -r320220 through
> -r320918 from incorrect rebuilds (and other oddities overlapping in the
> time frame).
>
> Of course if you built (or build) -r320844 based on an empty directory in
> the first place so that it was a full-build but the KDE problem persisted
> when using the rebuilt kernel then the above material does not apply. In
> such a case reporting that about the context for the KDE problem would be
> appropriate.
>
> You may well have other things to be doing instead of what the above
> suggests. If so, just take the above as background information.


Prior to testing this, I did 'rm -rf /usr/obj/*' so it is a clean build. 
I can run with user-land at SVN r321021 but any kernel at or after 
r320844 fails :-(


imb



Re: Panic on boot after upgrade from r320827 -> r320869

2017-07-15 Thread Warner Losh
On Sat, Jul 15, 2017 at 6:49 PM, Michael Butler wrote:

> On 07/15/17 20:39, Mark Millard wrote:
>
>> FYI for Michael B.: the incomplete kernel rebuild problem has a fix:
>> -r320919 .
>> See the fix (to the building problem that was created in -r320220 ):
>>
>> https://lists.freebsd.org/pipermail/svn-src-head/2017-July/102622.html
>>
>> If the KDE problem persists based on a -r320919 or later build, it would
>> be appropriate to report it again as a separate issue.
>>
>> Unfortunately various odd problems have shown up over -r320220 through
>> -r320918 from incorrect rebuilds (and other oddities overlapping in the
>> time frame).
>>
>> Of course if you built (or build) -r320844 based on an empty directory in
>> the first place so that it was a full-build but the KDE problem persisted
>> when using the rebuilt kernel then the above material does not apply. In
>> such a case reporting that about the context for the KDE problem would be
>> appropriate.
>>
>> You may well have other things to be doing instead of what the above
>> suggests. If so, just take the above as background information.
>>
>
> Prior to testing this, I did 'rm -rf /usr/obj/*' so it is a clean build. I
> can run with user-land at SVN r321021 but any kernel at or after r320844
> fails :-(
>

Right. We need to find out what, exactly, is failing to make progress. I
have exactly one guess as to what might be going on, and it's a long shot
at best. To gather more evidence, I need to know if the kde thing that's
aborting is accessing /dev/pass* or /dev/xpt*. If you can confirm that
it is, then we'll need to see how to fix that.
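
One way to answer the /dev/pass* or /dev/xpt* question is to filter the list of paths a process opens. A minimal sketch, assuming the paths have already been collected into a list (for example from a system-call trace; how that is done is left out here). cam_device_paths and the sample list are hypothetical:

```python
from fnmatch import fnmatch

# Device node patterns named in the message above.
CAM_DEVICE_PATTERNS = ("/dev/pass*", "/dev/xpt*")


def cam_device_paths(opened_paths):
    """Return the opened paths that match a CAM device node pattern."""
    return [path for path in opened_paths
            if any(fnmatch(path, pattern) for pattern in CAM_DEVICE_PATTERNS)]


# Made-up sample input, not taken from a real trace of krunner.
sample = ["/dev/null", "/dev/pass0", "/usr/local/lib/libsolid.so.4", "/dev/xpt0"]
print(cam_device_paths(sample))
```

An empty result for the failing KDE process would argue against the raw-CAM-access guess.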

Also, you'll need an installworld as well as an installkernel so the new
headers are installed prior to running kde. If that fixes it, then my guess
goes from a long shot to close to a sure thing.

Warner


Re: Panic on boot after upgrade from r320827 -> r320869

2017-07-15 Thread Mark Millard
Warner Losh imp at bsdimp.com wrote on Sat Jul 15 23:22:22 UTC 2017 :

> On Sat, Jul 15, 2017 at 1:32 PM, Michael Butler wrote:
> 
> > On 07/11/17 19:53, Michael Butler wrote:
> >
> . . .
> >
> > Something about SVN r320844 causes almost all KDE applications to fail on
> > a signal 6.
> >
> 
> I don't think that's possible, unless (a) your build hit the 'not
> everything in the kernel rebuilt' bug or (b) KDE is issuing raw CAM
> requests. Since I don't know KDE, don't run KDE or have any clue about KDE,
> I can't help you trace it down further.

FYI for Michael B.: the incomplete kernel rebuild problem has a fix: -r320919 .
See the fix (to the building problem that was created in -r320220 ):

https://lists.freebsd.org/pipermail/svn-src-head/2017-July/102622.html

If the KDE problem persists based on a -r320919 or later build, it would
be appropriate to report it again as a separate issue.

Unfortunately various odd problems have shown up over -r320220 through
-r320918 from incorrect rebuilds (and other oddities overlapping in the
time frame).

Of course if you built (or build) -r320844 based on an empty directory in
the first place so that it was a full-build but the KDE problem persisted
when using the rebuilt kernel then the above material does not apply. In
such a case reporting that about the context for the KDE problem would be
appropriate.

You may well have other things to be doing instead of what the above
suggests. If so, just take the above as background information.


===
Mark Millard
markmi at dsl-only.net
