On Thu, 13 Jul 2017, Arthur Marsh wrote: > Thanks for the advice, I've enabled CONFIG_FRAME_POINTER and managed to > catch 1 more log triggered by issuing a > > blkid > > command after inserting a USB stick. > > The problem is that I can't be certain of triggering the problem when > testing kernels using git-bisect, so my git-bisect efforts are not > guaranteed to get a result.
> Jul 12 18:51:00 localhost kernel: [ 1206.802306] NMI watchdog: Watchdog > detected hard LOCKUP on cpu 1 > Jul 12 18:51:00 localhost kernel: [ 1206.802308] Modules linked in: cmac arc4 > ecb md4 nls_utf8 cifs ccm dns_resolver fscache bnep bluetooth hmac drbg > ansi_cprng ecdh_generic nfc rfkill cpufreq_userspace cpufreq_conservative > snd_hrtimer cpufreq_powersave binfmt_misc fuse snd_emu10k1_synth > snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi_event snd_seq > max6650 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp > libiscsi scsi_transport_iscsi parport_pc ppdev lp parport ir_lirc_codec > lirc_dev rtl2832_sdr videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 > videobuf2_core videodev media fc0012 rtl2832 i2c_mux dvb_usb_rtl28xxu > dvb_usb_v2 amdkfd dvb_core rc_core edac_mce_amd radeon kvm_amd kvm irqbypass > wmi_bmof snd_hda_codec_hdmi snd_emu10k1 snd_hda_intel ttm sg drm_kms_helper > snd_hda_codec snd_util_mem snd_hda_core > Jul 12 18:51:00 localhost kernel: [ 1206.802336] snd_ac97_codec snd_rawmidi > snd_seq_device snd_hwdep snd_pcm_oss drm snd_mixer_oss ac97_bus snd_pcm > pcspkr evdev k10temp i2c_algo_bit serio_raw snd_timer emu10k1_gp snd gameport > sp5100_tco soundcore button acpi_cpufreq asus_atk0110 wmi ext4 mbcache crc16 > jbd2 crc32c_generic btrfs raid6_pq xor uas usb_storage sr_mod cdrom sd_mod > hid_generic usbhid hid ata_generic ohci_pci ahci firewire_ohci libahci > pata_atiixp firewire_core crc_itu_t i2c_piix4 ehci_pci ohci_hcd libata > ehci_hcd usbcore scsi_mod r8169 mii > Jul 12 18:51:00 localhost kernel: [ 1206.802359] CPU: 1 PID: 142 Comm: > usb-storage Not tainted 4.12.0+ #2751 > Jul 12 18:51:00 localhost kernel: [ 1206.802359] Hardware name: System > manufacturer System Product Name/M3A78 PRO, BIOS 1701 01/27/2011 > Jul 12 18:51:00 localhost kernel: [ 1206.802361] task: ffff927c610eef80 > task.stack: ffffba9501b78000 > Jul 12 18:51:00 localhost kernel: [ 1206.802366] RIP: > 0010:native_queued_spin_lock_slowpath+0x159/0x1c0 > Jul 12 18:51:00 localhost kernel: [ 1206.802366] RSP: 0018:ffffba9501b7bdc8 > EFLAGS: 00000002 > Jul 12 18:51:00 localhost kernel: [ 1206.802368] RAX: 0000000000000101 RBX: > ffff927c662f3030 RCX: 0000000000000001 > Jul 12 18:51:00 localhost kernel: [ 1206.802368] RDX: 0000000000000101 RSI: > 0000000000000001 RDI: ffff927c662f3030 > Jul 12 18:51:00 localhost kernel: [ 1206.802369] RBP: ffffba9501b7bde0 R08: > 0000000000000101 R09: 0000000000000100 > Jul 12 18:51:00 localhost kernel: [ 1206.802370] R10: 0000000000000000 R11: > 7fffffffffffffff R12: 0000000000000082 > Jul 12 18:51:00 localhost kernel: [ 1206.802370] R13: 0000000000002003 R14: > 0000000000001d4c R15: ffff927c610eef80 > Jul 12 18:51:00 localhost kernel: [ 1206.802371] FS: 0000000000000000(0000) > GS:ffff927c6fc40000(0000) knlGS:0000000000000000 > Jul 12 18:51:00 localhost kernel: [ 1206.802372] CS: 0010 DS: 0000 ES: 0000 > CR0: 0000000080050033 > Jul 12 18:51:00 localhost kernel: [ 1206.802373] CR2: 00000030007995d8 CR3: > 000000017b144000 CR4: 00000000000006e0 > Jul 12 18:51:00 localhost kernel: [ 1206.802374] Call Trace: > Jul 12 18:51:00 localhost kernel: [ 1206.802378] ? > _raw_spin_lock_irqsave+0x59/0x68 > Jul 12 18:51:00 localhost kernel: [ 1206.802389] scsi_eh_scmd_add+0x3f/0x150 > [scsi_mod] > Jul 12 18:51:00 localhost kernel: [ 1206.802393] > scsi_softirq_done+0xb5/0x150 [scsi_mod] > Jul 12 18:51:00 localhost kernel: [ 1206.802396] > __blk_mq_complete_request+0xd4/0x160 > Jul 12 18:51:00 localhost kernel: [ 1206.802397] > blk_mq_complete_request+0x18/0x20 > Jul 12 18:51:00 localhost kernel: [ 1206.802402] scsi_mq_done+0x21/0x80 > [scsi_mod] > Jul 12 18:51:00 localhost kernel: [ 1206.802405] > usb_stor_control_thread+0xf4/0x250 [usb_storage] > Jul 12 18:51:00 localhost kernel: [ 1206.802407] kthread+0x125/0x140 > Jul 12 18:51:00 localhost kernel: [ 1206.802409] ? > fill_inquiry_response+0x20/0x20 [usb_storage] > Jul 12 18:51:00 localhost kernel: [ 1206.802410] ? > kthread_create_on_node+0x70/0x70 > Jul 12 18:51:00 localhost kernel: [ 1206.802412] ret_from_fork+0x25/0x30 > Jul 12 18:51:00 localhost kernel: [ 1206.802413] Code: c2 74 a2 89 c2 89 d0 > 66 31 c0 39 c6 74 ea 4d 85 c9 c6 07 01 74 21 41 c7 41 08 01 00 00 00 eb 85 41 > ff c9 75 04 f3 c3 f3 90 8b 07 <84> c0 75 f8 66 c7 07 01 00 c3 f3 90 4c 8b 09 > 4d 85 c9 74 f6 eb This is pretty conclusive. The problem comes about because usb_stor_control_thread() calls scsi_mq_done() while holding shost->host_lock, and then scsi_eh_scmd_add() tries to acquire that same lock. I don't know why this didn't show up in earlier kernels. I guess some element of the call chain listed above must be new in 4.12. Christoph, what's the best way to fix this? Should usb-storage release the host lock before issuing the ->scsi_done callback? If so, does that change need to be applied to any kernels before 4.12? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html