To summarize:
#1 Symptom:
On the AMD EPYC (Rome) server platform, SATA hot plug does not work on
Ubuntu 22.04 LTS.
#2 Root cause:
The Ubuntu kernel is compiled with CONFIG_SATA_MOBILE_LPM_POLICY=3. During
device scan (boot, PCI scan, ahci driver load), if it did not detect any valid
sata
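A minimal sketch for checking the build-time default named in the root cause. It assumes the kernel config is installed at /boot/config-$(uname -r), as is usual on Ubuntu; the function name kernel_lpm_policy is hypothetical.

```shell
# Print the build-time CONFIG_SATA_MOBILE_LPM_POLICY value from a kernel config file.
# $1: path to the config file. The default path is an assumption (the usual
#     Ubuntu location); pass another path if your system differs.
kernel_lpm_policy() {
    config="${1:-/boot/config-$(uname -r)}"
    # Print only the value after the "=" on the matching line.
    sed -n 's/^CONFIG_SATA_MOBILE_LPM_POLICY=//p' "$config"
}
```

Calling kernel_lpm_policy with no argument reads the running kernel's config. An empty result would mean the option is not set in that file.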
@Mario, may I ask two follow-up questions?
A. What is the result on an AMD client (like a laptop)? I thought hot plug
works out of the box on client; what causes this difference in the kernel code?
B. I was told that hot plug works out of the box on their Intel server.
What causes this
@KH:
Thanks for sharing that! I agree with you. I've sent this up to explicitly
document the new behavior.
https://lore.kernel.org/linux-ide/20220524170508.563-4-mario.limoncie...@amd.com/T/#u
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed t
@Mario, thanks for the deep and detailed analysis of the root cause.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1971576
Title:
SATA device hot plug regression on AMD EPYC (Asus) server
To manage
According to AHCI spec v1.3.1, "7.3 Native Hot Plug Support", once LPM
is enabled, hotplug needs to be disabled.
So I agree with 2). I think we should write documentation and let users know
how to change the LPM policy so that hotplug is detected.
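A minimal sketch of what such documentation might show for changing the policy at runtime. It assumes the hosts expose the libata sysfs attribute link_power_management_policy under /sys/class/scsi_host; the function name set_lpm_policy is hypothetical, and whether this alone restores hotplug on the affected platform is not confirmed in this thread.

```shell
# Write an LPM policy name to every SCSI host that exposes the sysfs attribute.
# $1: policy name, e.g. max_performance (link power management off)
# $2: sysfs base directory; defaults to /sys/class/scsi_host
set_lpm_policy() {
    policy="$1"
    base="${2:-/sys/class/scsi_host}"
    for attr in "$base"/host*/link_power_management_policy; do
        [ -e "$attr" ] || continue      # skip if the glob matched nothing
        printf '%s\n' "$policy" > "$attr"
    done
}
```

Run as root, for example set_lpm_policy max_performance. A value written this way does not persist across reboots.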
For 3) I don't think we need to change the way it works, bec
0-> ATA_LPM_UNKNOWN,
1-> ATA_LPM_MAX_POWER,
2-> ATA_LPM_MED_POWER,
3-> ATA_LPM_MED_POWER_WITH_DIPM, /* Med power + DIPM as win IRST does */
4-> ATA_LPM_MIN_POWER_WITH_PARTIAL, /* Min Power + partial and slumber */
5-> ATA_LPM_MIN_POWER, /* Min power + no partial (slumber onl
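For reference when reading test results, the numeric values listed above can be turned into a small lookup. This is only a sketch: the function name lpm_policy_name is hypothetical, and the names are copied from the list in this message.

```shell
# Map a numeric mobile LPM policy value to the enum name listed above.
lpm_policy_name() {
    case "$1" in
        0) echo ATA_LPM_UNKNOWN ;;
        1) echo ATA_LPM_MAX_POWER ;;
        2) echo ATA_LPM_MED_POWER ;;
        3) echo ATA_LPM_MED_POWER_WITH_DIPM ;;
        4) echo ATA_LPM_MIN_POWER_WITH_PARTIAL ;;
        5) echo ATA_LPM_MIN_POWER ;;
        *) echo "unrecognized policy: $1" >&2; return 1 ;;
    esac
}
```

For example, lpm_policy_name 3 prints ATA_LPM_MED_POWER_WITH_DIPM, the value the Ubuntu kernel config selects by default.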
Hi Mario,
The test results for 5.18.0-4 are below:
Kernel parameter            SATA hot plug works?
default                     No
ahci.mobile_lpm_policy=0    Yes
ahci.mobile_lpm_policy=1    Yes
ahci.mobile_lpm_policy=2    Yes
By the way, what is differen
Can you please collect the trace again with "scsi" prefix:
$ sudo trace-cmd record -p function -l "*sata*" -l "*ahci*" -l "*scsi*"
Thanks again!
For #40, here is the trace data for the case where hotplug is not working.
** Attachment added: "trace-no-hotplug.zip"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971576/+attachment/5591529/+files/trace-no-hotplug.zip
For #39 (Mario's comment), the partner said all cases fail (hotplug was not
working). I guess we expected it to work, so I will check with him whether he
set the kernel parameter correctly.
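One way to check whether the parameter reached the running kernel is to look at its command line. This is a minimal sketch; the function name cmdline_lpm_policy is hypothetical, and it assumes that when the parameter is given more than once the last occurrence is the one that takes effect.

```shell
# Extract the ahci.mobile_lpm_policy value from a kernel command line string.
# $1: command line text; defaults to the running kernel's /proc/cmdline.
# Prints nothing if the parameter is absent.
cmdline_lpm_policy() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Intentionally unquoted: split the command line into words.
    for word in $cmdline; do
        case "$word" in
            ahci.mobile_lpm_policy=*) echo "${word#ahci.mobile_lpm_policy=}" ;;
        esac
    done | tail -n 1   # report only the last occurrence
}
```

With no argument, cmdline_lpm_policy inspects the booted kernel. An empty result would suggest the parameter was misspelled or never added to the boot entry. If the ahci driver exports the parameter, its current value may also be readable from /sys/module/ahci/parameters/mobile_lpm_policy (an assumption, not confirmed in this thread).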
For #40, I asked them to collect trace data for both cases: hotplug working
and not working.
** Attachment added: "trace-hotplug.zip"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971576/+attachment/5591527/+files/trace-hotplug.zip
I guess the hotplug event is filtered out. Please collect an ftrace log so we
can investigate the issue:
$ sudo trace-cmd record -p function -l "*sata*" -l "*ahci*"
On any failing kernel, can you try to add "ahci.mobile_lpm_policy=0" to
the kernel command line and confirm if that fixes things?
If it does, could you also compare "ahci.mobile_lpm_policy=1" and
"ahci.mobile_lpm_policy=2"?
Hey Khaled, Mario,
Both 5.15 and 5.18 failed; the kernels were from Khaled's builds in #37.
The "uname -r" output from the screenshots:
5.15.0-32-generic
5.18.0-4-generic
thanks!
Hey Zhanglei.
We do not have a 5.16 Ubuntu kernel. We do have 5.15 kernels.
The current mainline version is 5.18 not 5.16.
I have built 2 kernels, one 5.18 and one 5.15 kernel for you.
5.15:
https://kernel.ubuntu.com/~kmously/kernel-kmously-45afdcc-gwrv/
5.18:
https://kernel.ubuntu.com/~kmou
No need to build a kernel, at least for a quick test you can pick up one from
the mainline PPA and try it.
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.18-rc7/
Hello Khaled, for Mario's comment #33, could you build a mainline
kernel (v5.16) for testing? //thanks.
Hello Khaled,
First, thanks for your effort in finding this root cause so quickly.
For your comment #31, the partner has confirmed it is the same for other types of disk.
I also asked them to raise this issue with AMD, and the AMD technical contact
replied as below.
"Yes, 1022:7901h is AMD SATA AHCI controlle
> but I believe you are using an AMD EPYC server, so I don't understand
why you would be affected at all.
It may be that this server silicon has the same HW IP as the client
chip. The change was tested on client chips before submitting. What it
is supposed to do is set the policy for the drives
Can you please give mainline kernel a try?
If mainline kernel still doesn't work, please run the following:
$ sudo trace-cmd record -p function -l "*ata*" -l "*ahci*"
... then plug the disk, Ctrl + C on trace-cmd, attach trace.dat here.
Thanks Zhanglei. Great. We have identified the problem patch, which is
this one:
380cd49e207ba4 ata: ahci: Add Green Sardine vendor ID as
board_ahci_mobile
But I am not really sure why this patch is causing a problem.
The patch only adds one new line as you can see here:
https://pastebin.ubunt
Hello Khaled, version 3007 failed. Thanks! //Mao
Hello Zhanglei. Thanks. This means:
3000: BUG
3002: NO BUG
3003: NO BUG
3004: NO BUG
3005: BUG
3006: NO BUG
There are only 5 patches between 3005 and 3006 so one of them is the problem.
You can see the list of patches here:
https://pastebin.canonical.com/p/TkvDGcfHWk/plain/
Only one of them
Hello Khaled, the 3006 kernel passes. Thanks //Mao
Hello Zhanglei. Thanks for the update. This means:
3000: BUG
3002: NO BUG
3003: NO BUG
3004: NO BUG
3005: BUG
We are getting very close now. Please try version 3006 from this link:
https://kernel.ubuntu.com/~kmously/kernel-kmously-c756bab-RGcD/
Please make sure you are running 3006. Thanks.
Hello Khaled, version 3005 hot plug failed. Thanks.
Hello again Zhanglei.
This means that so far:
3000: BUG
3002: NO BUG
3003: NO BUG
3004: NO BUG
You can find kernel 3005 here:
https://kernel.ubuntu.com/~kmously/kernel-kmously-039f206-aRqC/
Please ensure you are running version -3005. Thank you
Thanks Zhanglei. I am building 3005 now. It should be ready in about 30
minutes. I will update again soon
Hello Khaled, version 3004 hot plug passes. Thanks //Mao
Thanks Zhanglei. This means that so far:
3000: BUG
3002: NO BUG
3003: NO BUG
We are getting closer. This is the remaining set of patches:
00501b41aaf73f (tag: test-3000, tag: fail1) s390/pci: move pseudo-MMIO to
prevent MIO overlap
14914e943b0ca5 cpufreq: Fix get_cpu_device() failure in add_cp
Hello Khaled, version 3003 hot plug passed. //thanks
Hello Khaled, thanks for the quick response. Your understanding (3000: bug,
3002: no bug) is correct. I have asked them to verify 3003 now.
//thanks
Hello Zhanglei. Thanks for the update.
From my understanding, so far:
3000: BUG
3002: NO BUG
I have the next kernel, 3003, available here:
https://kernel.ubuntu.com/~kmously/kernel-kmously-8243717-Kutu
Once again, please ensure that you are testing with -3003 when testing.
Thank you
Hello Khaled, version 3002 hot plug passes. //thanks.
Hello Khaled, thanks for sharing the details of your patching and for
explaining why you cannot build multiple kernels at once.
The partner engineer did use the 3000 kernel for the SATA hotplug test; they
sent me a screenshot of the "uname -r" output.
I have asked them to test the new 3002 kernel now.
Hello Zhanglei,
Thanks for the update. I am a little surprised that this kernel failed.
There are 2 SATA related changes in kernel -100 which I suspected were
the root cause. However, the kernel that I provided (version 3000) did
NOT contain those patches, so I expected it to work.
The patches th
Hello Khaled, the hot plug test failed with this kernel version. Please build
the next kernel. By the way, is it possible to build multiple kernels so that
they can test them all in one shot? The partner engineer is working from home
this week and has to find someone in the office to help each time.
Hello Khaled, I have asked the partner engineer to test the kernel you built.
Thanks.
Hello Zhanglei, thanks for confirming the working/broken versions.
I am not sure if I will be able to reproduce the issue myself. There are
270 changes between -99 and -100. If you can help me bisect them, we
should be able to quickly identify the problem. Would you be able to
test the kernels I
** Summary changed:
- SATA device hot plug regression on AMD EYPC (Asus) server
+ SATA device hot plug regression on AMD EPYC (Asus) server