On 2023-08-11 17:27, Chen, Xiaogang wrote:
On 8/11/2023 4:22 PM, Felix Kuehling wrote:
On 2023-08-11 17:12, Chen, Xiaogang wrote:
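I know the original jira ticket. The system hit an RCU CPU stall, then
the kernel panicked, and after that there was no response and no ssh
access. This patch lets the prange list update task yield the CPU after
each range update. It can prevent the task from holding the mm lock too long.
Removing others from the thread; let's continue discussing internally.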
On 2023-08-11 17:12, Chen, Xiaogang wrote:
I know the original jira ticket. The system hit an RCU CPU stall, then
the kernel panicked, and after that there was no response and no ssh
access. This patch lets the prange list update task yield the CPU after
each range update. It can prevent the task from holding the mm lock too long.
One thing to check: I saw they use a serial port for the console, via the
kernel parameter console=ttyS0,115200n8:
* Booting Linux using a console connection that is too slow to keep up
  with the boot-time console-message rate. For example, a 115Kbaud
  serial console can be /way/ too slow to keep up with [...]
If you have a complete kernel log, it may be worth looking at backtraces
from other threads to better understand the interactions. I'd expect
there to be a thread that's in an RCU read-side critical section. It
may not be in our driver, though. If it's a customer system, it may also
help to [...]
On 2023-08-11 17:12, Chen, Xiaogang wrote:
I know the original jira ticket. The system hit an RCU CPU stall, then
the kernel panicked, and after that there was no response and no ssh
access. This patch lets the prange list update task yield the CPU after
each range update. It can prevent the task from holding the mm lock too long.
Calling schedul[...]
I don't understand why this loop is causing a stall. These stall
warnings indicate that there is an RCU grace period that's not making
progress. That means there must be an RCU read-side critical section
that is not completing. But there is no RCU read-side critical section
in the svm_range_set_attr function.
I know the original jira ticket. The system hit an RCU CPU stall, then
the kernel panicked, and after that there was no response and no ssh
access. This patch lets the prange list update task yield the CPU after
each range update. It can prevent the task from holding the mm lock too
long. The mm lock is an rw_semaphore, not an RCU mechanism.
Can you explain [...]
On 2023-08-11 16:06, Felix Kuehling wrote:
On 2023-08-11 15:11, James Zhu wrote:
update_list can be large in list_for_each_entry(prange, &update_list,
update_list), and mmap_read_lock(mm) is held the whole time; adding
schedule() can remove the RCU CPU stall in this case.
RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
update_list can be large in list_for_each_entry(prange, &update_list,
update_list), and mmap_read_lock(mm) is held the whole time; adding
schedule() can remove the RCU CPU stall in this case.
RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
Code: 00 00 00 bf 00 02 00 00 48 81 c2