On Fri, May 11, 2012 at 12:25:01PM -0300, gustavo panizzo <gfa> wrote:
> adding debian-boot
>
>
> i've installed unstable on the box (using debootstrap) and it boots
> 3.2.0-2-sparc64 sucessfully, networking works
>
> obp diags shows no errors
>
> but when i boot from network using
> http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
>
> i get the following error
>
> ┌───────────────┤ Detecting link on eth0; please wait... ├────────────────┐
> │ │
> │ 100% [
> 246.994391] Unable to handle kernel NULL pointer dereference
> 247.074490] tsk->{mm,active_mm}->context = 000000000000019f │
> 14;10H[ 247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000 │
> [ 247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler! │
> [ 247.328648] Call Trace: │
> [ 247.360793] [000000000045dcd4] do_exit+0x94/0x708 │
> [ 247.423821] [0000000000427550] die_if_kernel+0x2a0/0x2c8────────────────┘
> [ 247.494864] [0000000000768c84] unhandled_fault+0x8c/0x98
> [ 247.565915] [000000000076936c] do_sparc64_fault+0x6dc/0x780
> [ 247.640377] [0000000000407880] sparc64_realfault_common+0x10/0x20
> [ 247.721722] [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
> [ 247.798478] [0000000000697110] net_rx_action+0x9c/0x234
> [ 247.868369] [00000000004607f0] __do_softirq+0xdc/0x1c4
> [ 247.937125] [000000000042a76c] do_softirq+0x54/0x80
> [ 248.002442] [0000000000460a6c] irq_exit+0x38/0x94
> [ 248.065474] [000000000042df38] timer_interrupt+0x90/0xa8
> [ 248.136516] [00000000004209d4] tl0_irq14+0x14/0x20
> [ 248.200692] [000000000049e764] touch_softlockup_watchdog+0x4/0xc
> [ 248.280888] [00000000008f07e4] start_kernel+0x390/0x3a0
> [ 248.350783] [0000000000750b88] tlb_fixup_done+0x80/0x88
> [ 248.420672] [0000000000000000] (null)
> [ 248.481416] Press Stop-A (L1-A) to return to the boot prom

Interesting, so we are doing something funky during link detection to
trip this bug. The code which does it is in netcfg:

http://anonscm.debian.org/gitweb/?p=d-i/netcfg.git;a=tree

Here's the relevant code from netcfg-common.c:

    debconf_capb(client, "progresscancel");
    debconf_subst(client, "netcfg/link_detect_progress", "interface", if_name);
    debconf_progress_start(client, 0, 100, "netcfg/link_detect_progress");
    for (count = 0; count < link_waits; count++) {
        usleep(250000);
        if (debconf_progress_set(client, 50 * count / link_waits) == 30) {
            /* User cancelled on us... bugger */
            rv = 0;
            break;
        }
        if (ethtool_lite(if_name) == 1) /* ethtool-lite's CONNECTED */ {
            if (gateway.s_addr && !is_wireless_iface(if_name)) {
                for (count = 0; count < gw_tries; count++) {
                    if (di_exec_shell_log(arping) == 0)
                        break;
                    if (debconf_progress_set(client, 50 + 50 * count / gw_tries) == 30)
                        break;
                }
            }
            rv = 1;
            break;
        }
        debconf_progress_set(client, 100);
    }

Only two non-trivial things happen here: the call to
ethtool_lite(if_name) and the invocation of arping. I would put my
money on the former (defined in ethtool-lite.c), because it uses
low-level ioctls to query the interface state. You can test whether
running it triggers the failure on your machine by downloading
ethtool-lite.c and building it as a standalone binary. The following
commands appear to do the trick:

    $ sudo apt-get build-dep netcfg
    [...]
    $ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
    $ sudo ./ethtool-lite eth0
    ethtool-lite: eth0 is connected.
    $

If that triggers the NULL pointer dereference on your machine (try it
both with and without the network brought up, and check dmesg
afterwards), we will be in a very good position to report it upstream
for fixing.

Best regards,
-- 
Jurij Smakov                                           ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC