Hi Luke,

> -----Original Message-----
> From: Luke Jones <l...@ljones.dev>
> Sent: Tuesday, June 10, 2025 6:00 AM
> To: Borah, Chaitanya Kumar <chaitanya.kumar.bo...@intel.com>; Kurt Borja
> <kuu...@gmail.com>
> Cc: intel...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; Saarinen,
> Jani <jani.saari...@intel.com>; Kurmi, Suresh Kumar
> <suresh.kumar.ku...@intel.com>; De Marchi, Lucas
> <lucas.demar...@intel.com>; Nikula, Jani <jani.nik...@intel.com>;
> linux-in...@vger.kernel.org; platform-driver-...@vger.kernel.org
> Subject: Re: [REGRESSION] on linux-next (next-20250509)
>
> On Mon, 9 Jun 2025, at 11:06 PM, Borah, Chaitanya Kumar wrote:
> > Hi Luke,
> >
> >> -----Original Message-----
> >> From: Kurt Borja <kuu...@gmail.com>
> >> Sent: Wednesday, May 28, 2025 9:11 PM
> >> To: Luke Jones <l...@ljones.dev>; Borah, Chaitanya Kumar
> >> <chaitanya.kumar.bo...@intel.com>
> >> Cc: intel...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org;
> >> Saarinen, Jani <jani.saari...@intel.com>; Kurmi, Suresh Kumar
> >> <suresh.kumar.ku...@intel.com>; De Marchi, Lucas
> >> <lucas.demar...@intel.com>; Nikula, Jani <jani.nik...@intel.com>;
> >> linux-in...@vger.kernel.org; platform-driver-...@vger.kernel.org
> >> Subject: Re: [REGRESSSION] on linux-next (next-20250509)
> >>
> >> Hi Luke,
> >>
> >> On Wed May 28, 2025 at 10:07 AM -03, Luke Jones wrote:
> >> > On Wed, 28 May 2025, at 12:08 PM, Borah, Chaitanya Kumar wrote:
> >> >> Hello Luke,
> >> >>
> >> >> Hope you are doing well. I am Chaitanya from the linux graphics
> >> >> team in Intel.
> >> >>
> >> >> This mail is regarding a regression we are seeing in our CI
> >> >> runs[1] on linux-next repository.
> >> >
> >> > Can you tell me if the fix here was included?
> >> > https://lkml.org/lkml/2025/5/24/152
> >> >
> >> > I could change to:
> >> > static void asus_s2idle_check_register(void) {
> >> >     // Only register for Ally devices
> >> >     if (dmi_check_system(asus_rog_ally_device)) {
> >> >         if (acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops))
> >> >             pr_warn("failed to register LPS0 sleep handler in asus-wmi\n");
> >> >     }
> >> > }
> >> >
> >> > but I don't really understand what is happening here. The inner lps0
> >> > functions won't run unless use_ally_mcu_hack is set.
> >>
> >> The RIP is caused by a "list_add double add" warning.
> >>
> >> After reading the log, I believe this is happening because
> >> asus_wmi_register_driver() is called a second time by eeepc_wmi after
> >> asus_nb_wmi, which implies
> >>
> >> asus_wmi_probe()
> >>     -> acpi_register_lps0_dev(&asus_ally_s2idle_dev_ops)
> >>
> >> is called twice and the warning is triggered.
> >>
> >> Line [1] makes me think this could be a race condition, as
> >> asus_wmi_register_driver() may be called concurrently.
> >>
> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/tree/drivers/platform/x86/asus-wmi.c?h=for-next#n5101
> >>
> >
> > Any update on this? It has now hit 6.16-rc1
> >
> > https://intel-gfx-ci.01.org/tree/drm-tip/igt@run...@aborted.html
>
> I will send a patch asap. Haven't been able to do so with work and 3 days of
> flights.
>
Gentle reminder.

> > Regards
> >
> > Chaitanya
> >
> >> >
> >> > I will do my best to fix but I need to understand what happened a bit
> >> > better.
> >> >
> >> > regards,
> >> > Luke.
> >> >
> >> >> Since the version next-20250509 [2], we are seeing the following
> >> >> regression
> >> >>
> >> >> `````````````````````````````````````````````````````````````````````````````````
> >> >> <4>[ 5.400826] ------------[ cut here ]------------
> >> >> <4>[ 5.400832] list_add double add: new=ffffffffa07c0ca0, prev=ffffffff837e9a60, next=ffffffffa07c0ca0.
> >> >> <4>[ 5.400845] WARNING: CPU: 0 PID: 379 at lib/list_debug.c:35 __list_add_valid_or_report+0xdc/0xf0
> >> >> <4>[ 5.400850] Modules linked in: cmdlinepart(+) eeepc_wmi(+) asus_nb_wmi(+) asus_wmi spi_nor(+) sparse_keymap mei_pxp mtd platform_profile kvm_intel(+) mei_hdcp wmi_bmof kvm irqbypass polyval_clmulni usbhid ghash_clmulni_intel snd_hda_intel hid sha1_ssse3 r8152(+) binfmt_misc aesni_intel snd_intel_dspcfg mii r8169 snd_hda_codec rapl video snd_hda_core intel_cstate snd_hwdep realtek snd_pcm snd_timer mei_me snd i2c_i801 i2c_mux spi_intel_pci idma64 soundcore spi_intel i2c_smbus mei intel_pmc_core nls_iso8859_1 pmt_telemetry pmt_class intel_pmc_ssram_telemetry pinctrl_alderlake intel_vsec acpi_tad wmi acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink ip_tables x_tables autofs4
> >> >> <4>[ 5.400904] CPU: 0 UID: 0 PID: 379 Comm: (udev-worker) Tainted: G S 6.15.0-rc7-next-20250526-next-20250526-g3be1a7a31fbd+ #1 PREEMPT(voluntary)
> >> >> <4>[ 5.400907] Tainted: [S]=CPU_OUT_OF_SPEC
> >> >> <4>[ 5.400908] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
> >> >> <4>[ 5.400909] RIP: 0010:__list_add_valid_or_report+0xdc/0xf0
> >> >> <4>[ 5.400912] Code: 16 48 89 f1 4c 89 e6 e8 a2 c5 5f ff 0f 0b 31 c0 e9 72 ff ff ff 48 89 f2 4c 89 e1 48 89 fe 48 c7 c7 68 ba 0f 83 e8 84 c5 5f ff <0f> 0b 31 c0 e9 54 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90
> >> >> <4>[ 5.400914] RSP: 0018:ffffc90002763588 EFLAGS: 00010246
> >> >> <4>[ 5.400916] RAX: 0000000000000000 RBX: ffffffffa07c0ca0 RCX: 0000000000000000
> >> >> <4>[ 5.400918] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >> >> <4>[ 5.400919] RBP: ffffc90002763598 R08: 0000000000000000 R09: 0000000000000000
> >> >> <4>[ 5.400920] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa07c0ca0
> >> >> <4>[ 5.400921] R13: ffffffffa07c0ca0 R14: 0000000000000000 R15: ffff8881212d6da0
> >> >> <4>[ 5.400923] FS: 0000778637b418c0(0000) GS:ffff8888dad0c000(0000) knlGS:0000000000000000
> >> >> <4>[ 5.400926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >> <4>[ 5.400928] CR2: 00007786373b80b2 CR3: 0000000116faa000 CR4: 0000000000f50ef0
> >> >> <4>[ 5.400931] PKRU: 55555554
> >> >> <4>[ 5.400933] Call Trace:
> >> >> <4>[ 5.400935] <TASK>
> >> >> <4>[ 5.400937] ? lock_system_sleep+0x2b/0x40
> >> >> <4>[ 5.400942] acpi_register_lps0_dev+0x58/0xb0
> >> >> <4>[ 5.400949] asus_wmi_probe+0x7f/0x1930 [asus_wmi]
> >> >> <4>[ 5.400956] ? kernfs_create_link+0x69/0xe0
> >> >> `````````````````````````````````````````````````````````````````````````````````
> >> >> Detailed log can be found in [3].
> >> >>
> >> >> After bisecting the tree, the following patch [4] seems to be the first
> >> >> "bad" commit
> >> >>
> >> >> `````````````````````````````````````````````````````````````````````````````````
> >> >> commit feea7bd6b02d43a794e3f065650d89cf8d8e8e59
> >> >> Author: Luke D. Jones mailto:l...@ljones.dev
> >> >> Date: Sun Mar 23 15:34:21 2025 +1300
> >> >>
> >> >> platform/x86: asus-wmi: Refactor Ally suspend/resume
> >> >> `````````````````````````````````````````````````````````````````````````````````
> >> >>
> >> >> We could not revert the patch because of merge conflict but
> >> >> resetting to the parent of the commit seems to fix the issue
> >> >>
> >> >> Could you please check why the patch causes this regression and
> >> >> provide a fix if necessary?
> >> >>
> >> >> Thank you.
> >> >>
> >> >> Regards
> >> >>
> >> >> Chaitanya
> >> >>
> >> >> [1] https://intel-gfx-ci.01.org/tree/linux-next/combined-alt.html?
> >> >> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509
> >> >> [3] https://intel-gfx-ci.01.org/tree/linux-next/next-20250526/bat-rpls-4/boot0.txt
> >> >> [4] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20250509&id=feea7bd6b02d43a794e3f065650d89cf8d8e8e59
> >>
> >>
> >> --
> >> ~ Kurt
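
For reference, the double registration Kurt describes can be modeled in userspace. This is only a minimal sketch under stated assumptions: it uses a simplified circular list rather than the kernel's list.h, and list_add_checked(), register_ops(), register_ops_once() and the "registered" flag are hypothetical names for illustration. It is not the kernel API and not necessarily the fix that will be sent. It shows why adding the same static node twice trips the "new == next" check seen in the log (new=ffffffffa07c0ca0, next=ffffffffa07c0ca0), and how a register-once guard would avoid it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a circular doubly linked list (not the kernel's list.h). */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    assert(head != NULL);
    head->prev = head->next = head;
}

/*
 * Insert node right after head. Returns false instead of inserting when the
 * node is already one of its would-be neighbours, which is the condition the
 * "list_add double add" warning reports.
 */
static bool list_add_checked(struct list_node *node, struct list_node *head)
{
    struct list_node *prev = head;
    struct list_node *next = head->next;

    if (node == prev || node == next)
        return false; /* double add detected */

    node->prev = prev;
    node->next = next;
    next->prev = node;
    prev->next = node;
    return true;
}

/* Hypothetical stand-in for a statically allocated ops structure. */
struct s2idle_ops {
    struct list_node node;
    bool registered; /* assumption: a flag added only for this sketch */
};

static struct list_node ops_head;

/* Unguarded registration: a second call with the same ops is a double add. */
static bool register_ops(struct s2idle_ops *ops)
{
    return list_add_checked(&ops->node, &ops_head);
}

/* Guarded registration: later calls with the same ops are a no-op. */
static bool register_ops_once(struct s2idle_ops *ops)
{
    if (ops->registered)
        return true;
    if (!list_add_checked(&ops->node, &ops_head))
        return false;
    ops->registered = true;
    return true;
}
```

Calling register_ops() twice on the same object fails the second time, which corresponds to two probe paths registering the same static ops. register_ops_once() succeeds both times and leaves a single entry on the list. The sketch ignores locking, so it says nothing about the possible race mentioned above.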