jobs in mirror list, so we should not update the last sched
fences in TDR.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index
Make sure the main thread won't update the last_sched fence while the
entity is being cleaned up.
Fix a race caused by putting the last_sched fence twice. Running
vulkaninfo in a tight loop can reproduce the issue, which shows up as
a wild fence pointer.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/sche
Fix a potential memory leak: the scheduler main thread always
holds one last_sched fence.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/scheduler/gpu_scheduler.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm
path
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 26 ++
2 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
b/drivers/gpu/drm/amd/a
Both doorbell and polling memory work on Tonga VF. The SDMA issue
happens because the SDMA engine accepts doorbell writes even when it
is inactive, which conflicts with the world switch routine updating
wptr through polling memory. Use polling memory in the driver too.
Signed-off-by: Pixel Ding
---
dr
Retry at drm_dev_register instead of amdgpu_device_init.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 11 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 15 ++-
3 files changed, 13 insertions
KIQ ring submission is used for register access on SRIOV VF, which
can happen with IRQs either enabled or disabled. Lock inversion could
happen on adev->ring_lru_list_lock, and this operation is useless
here and only adds overhead in this use case.
Signed-off-by: Pixel Ding
---
driv
From: pding
This lock is used during register access in the SRIOV guest.
Register access can happen with IRQs either enabled or disabled.
Always use an irq-safe lock.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 ++
1 file changed, 6 insertions(+),
From: pding
This lock is used during register access in the SRIOV guest,
since KIQ uses general ring submission (amdgpu_ring_commit).
Register access can happen with IRQs either enabled or disabled.
Always use an irq-safe lock.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdg
From: pding
It introduces 900ms of latency in exclusive mode, which causes driver
loading to fail. The host can resize the BAR before the guest starts,
so the resizing is not necessary here.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4
1 file changed, 4 insertions(+)
Hi Felix,
Please review.
[PATCH 1/2] drm/amdkfd: initialise kfd inside amdgpu_device_init
As you suggested, move kfd init/fini inside amdgpu_device_init.
Other changes for KFD interfaces are dropped.
[PATCH 2/2] drm/amdgpu: release exclusive mode after hw_init
From: pding
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 3 ---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
i
From: pding
Also finalize kfd inside amdgpu_device_fini. kfd device_init needs
SRIOV exclusive access. Gather the exclusive accesses together to
reduce the time consumed.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 ---
From: pding
KFD device init requires exclusive mode. Driver can release
exclusive mode after hw_init if KFD is not enabled.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 +++--
2 files changed, 6 insertions(+), 2 del
From: pding
KGD may not be fully initialised in the probe phase, so it's not
safe to pass it in if kfd code tries to refer to KGD here.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 6 +++---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 8
drivers/gpu/d
From: pding
Add amdgpu_device_alloc(), which was previously part of
amdgpu_device_init(). This makes the init sequence more flexible
to handle, since kfd depends on the amdgpu_device base fields.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 4 +--
drivers/gpu/drm/amd/amdgpu/amdg
Hi Oded,
There are 3 patches for releasing exclusive mode after hw_init if
KIQ is not enabled.
[PATCH 1/3] drm/amdgpu: wrap allocation for amdgpu_device
Allocation of amdgpu_device and its base fields is wrapped and moved ahead.
[PATCH 2/3] drm/amdgpu: release exclusive mode after hw_init if no
[PATCH
From: pding
kgd field is dependent on kgd device_init. Move the assignment
to kfd device_init.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 6 +++---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 8
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 6
From: pding
Move kfd probe prior to device init. Release exclusive mode
after hw_init if kfd is not enabled.
v2:
- pass pdev param
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/amd
Hi Oded,
Please review.
[PATCH 1/2] drm/amdkfd: initialise kgd field inside kfd device_init
As you suggested, move kgd assignment to device_init
[PATCH 2/2] drm/amdgpu: release exclusive mode after hw_init if no
We still need this change because pdev is passed in.
From: pding
v2:
- readable
Reported-by: Sun Gary
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
index 818ec0f..2b435c0 1006
From: pding
This is caused by the hypervisor failing to handle the request; one
known issue is an MMIO unblocking timeout. In theory we can retry init here.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/g
From: pding
Reported-by: Sun Gary
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
index 818ec0f..f291fb2 100644
--- a/drivers/gpu/dr
From: pding
Move kfd probe prior to device init. Release exclusive mode
after hw_init if kfd is not enabled.
v2:
- pass pdev param
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 3 ++-
drivers/gpu/drm/amd/amdgpu/amd
From: pding
Move kfd probe prior to device init. Release exclusive mode
after hw_init if kfd is not enabled.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 +++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff
From: pding
Exclusive mode has a real-time limit in practice, such as having to
finish within 300ms. The limit is easily hit when running many VFs/VMs
on a single host with a heavy CPU workload.
If init fails due to an exclusive mode timeout, try it again.
v2:
- rewrite the condition for readable v
From: pding
Hi Alex,
Split the wait_reset patch into 2. Part 2.
Please review.
---
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 1 +
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 6 ++
2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
b/dr
From: pding
Exclusive mode has a real-time limit in practice, such as having to
finish within 300ms. The limit is easily hit when running many VFs/VMs
on a single host with a heavy CPU workload.
If init fails due to an exclusive mode timeout, try it again.
v2:
- rewrite the condition for readable v
From: pding
Hi Alex,
Split the wait_reset patch into 2. Part 1.
Please review.
---
The driver can use this interface to check whether a function level
reset has been done in the hypervisor. It's helpful when the IRQ
handler for reset is not ready, or when special handling is required.
Signed-off-by: pding
---
drive
From: pding
Normally, if one wait times out, all of them do.
Release the lock and return immediately when a timeout happens.
v2:
- set the se_sh to broadcast before return
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 8
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 8 +++
From: pding
This is v2 of the init log change. The init_log param and the
SRIOV-specific macro are removed, so I renamed the patch. Exclusive mode
takes 230ms with this patch and log redirection, which is acceptable.
Please review.
---
When this VF stays in exclusive mode for long, other VFs will be
impa
From: pding
Exclusive mode has a real-time limit in practice, such as having to
finish within 300ms. The limit is easily hit when running many VFs/VMs
on a single host with a heavy CPU workload.
If init fails due to an exclusive mode timeout, try it again.
Signed-off-by: pding
---
drivers/gpu/drm/
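The retry-on-timeout idea described in this message can be illustrated with a minimal user-space sketch. It is not the patch itself: the names `init_with_retry` and `fake_init`, the use of `-ETIMEDOUT` as the timeout indication, and the attempt limit are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical init callback type: returns 0 on success or a negative errno. */
typedef int (*init_fn)(void *ctx);

/*
 * Call init() up to max_attempts times, retrying only when it reports a
 * timeout. Any other result (success or a different error) ends the loop.
 * Using -ETIMEDOUT as the timeout indication is an assumption of this sketch.
 */
static int init_with_retry(init_fn init, void *ctx, int max_attempts)
{
    int ret = -EINVAL;

    assert(init);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        ret = init(ctx);
        if (ret != -ETIMEDOUT)
            break;
    }
    return ret;
}

/*
 * Stand-in for a device init (hypothetical): ctx points to an int holding the
 * number of timeouts to simulate before the init finally succeeds.
 */
static int fake_init(void *ctx)
{
    int *timeouts_left = ctx;

    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return -ETIMEDOUT;
    }
    return 0;
}
```

Bounding the number of attempts keeps a persistently failing host from stalling the guest forever, and retrying only on the timeout result avoids masking unrelated init errors.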
From: pding
Normally, if one wait times out, all of them do.
Release the lock and return immediately when a timeout happens.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 6 ++
2 files changed, 12 insertions(+)
diff --git
From: pding
The driver can use this interface to check whether a function level
reset has been done in the hypervisor.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 16
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 ++
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c| 1 +
d
From: pding
After calling pci_disable_msi() and pci_enable_msi(), the VF can't
receive interrupts anymore. This may cause problems when reloading
the module or retrying init.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
d
From: pding
MMIO space can be blocked on a virtualised device. Add this function
to check whether MMIO is blocked.
Todo: need a reliable method, such as communication with the hypervisor.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu
The subsequent operations don't need exclusive access to the hardware.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 3 ---
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/a
From: pding
When this VF stays in exclusive mode for long, other VFs will be
impacted.
The redundant messages cause an exclusive mode timeout when they're
redirected. Redirecting the guest log to a virtual serial port is a
normal use case for cloud services.
Introduce init_log param to control logs
This is the second patch series merged or reimplemented from the SRIOV
branch. It changes how much time init consumes.
Exclusive mode means that a VF occupies hardware and other VFs need
to wait until this VF releases exclusive mode. The timing of exclusive
mode is limited to avoid starvation causing unav
From: pding
v2:
- only change in IGP reading bios.
v3:
- merge functions and apply on all bios checking.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 41 +-
1 file changed, 18 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/am
From: pding
The post checking on scratch registers isn't reliable for virtual function.
v2: only change in IGP reading bios.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
From: pding
Register access is performed when IRQs are disabled. Never sleep in
this function.
Known issue: dead sleep in many use cases of index/data registers.
v2: wrap polling fence functions. don't trigger IRQ for polling, in
case the fence is wrongly signaled.
v3: handle wraparound gracefully.
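The non-sleeping polling with wraparound handling mentioned in the v2 and v3 notes can be sketched in plain C as follows. This is an illustration under stated assumptions, not the driver code: `seq_reached` and `poll_fence` are hypothetical names, the fence is modelled as a 32-bit sequence number in memory, and a real driver would typically insert a short busy-wait delay between reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the signaled sequence number has reached seq. The subtraction is
 * done in 32-bit modular arithmetic and interpreted as signed, so the result
 * stays correct when the counter wraps from 0xFFFFFFFF back to 0.
 */
static bool seq_reached(uint32_t signaled, uint32_t seq)
{
    return (int32_t)(signaled - seq) >= 0;
}

/*
 * Busy-poll a fence location for at most max_polls reads without sleeping.
 * Returns the remaining poll budget (> 0) on success, or 0 on timeout.
 */
static unsigned int poll_fence(const volatile uint32_t *fence_mem,
                               uint32_t wait_seq, unsigned int max_polls)
{
    assert(fence_mem);
    while (max_polls > 0) {
        if (seq_reached(*fence_mem, wait_seq))
            return max_polls;
        max_polls--;
    }
    return 0;
}
```

The signed-difference comparison assumes the two sequence numbers are never more than 2^31 apart, which holds as long as far fewer than 2^31 fences are outstanding at once.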
From: pding
Only for GFX ring. This can help with checking the MCBP feature.
v2: report more fence offs.
The fence at the end of the frame will indicate the completion status.
If the frame completed normally, the fence is written to the address
given in the EVENT_WRITE_EOP packet. If preemption occurred
From: pding
Register access is performed when IRQs are disabled. Never sleep in
this function.
Known issue: dead sleep in many use cases of index/data registers.
v2: wrap polling fence functions. don't trigger IRQ for polling, in
case the fence is wrongly signaled.
Signed-off-by: pding
---
drivers
From: pding
The post checking on scratch registers isn't reliable for virtual
function.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/
This is the first patch series to make the latest staging driver
stable for SRIOV VF on both Tonga and Vega. Patches are merged
from SRIOV branches or reimplemented, including bug fixes and
small features requested by SRIOV users.
v2: "drm/amdgpu: workaround for VM fault caused by SDMA" is dropped.
From: pding
Register access is performed when IRQs are disabled. Never sleep in
this function.
Known issue: dead sleep in many use cases of index/data registers.
v2: wrap polling fence functions. don't trigger IRQ for polling, in
case the fence is wrongly signaled.
Signed-off-by: pding
---
drivers
From: pding
The polling memory was standalone in VRAM before, so the HDP flush
introduced latency that hid a VM fault issue. Now that polling memory
uses the WB in system memory and no HDP flush is required, the
VM fault at the same page happens.
Add delay back to workaround until the root cause
This is the first patch series to make the latest staging driver
stable for SRIOV VF on both Tonga and Vega. Patches are merged
from SRIOV branches or reimplemented, including bug fixes and
small features requested by SRIOV users.
v2: "drm/amdgpu: workaround for VM fault caused by SDMA" is dropped.
From: pding
Only for GFX ring. This can help with checking the MCBP feature.
v2: report more fence offs.
The fence at the end of the frame will indicate the completion status.
If the frame completed normally, the fence is written to the address
given in the EVENT_WRITE_EOP packet. If preemption occurred
From: pding
The polling memory was standalone in VRAM before, so the HDP flush
introduced latency that hid a VM fault issue. Now that polling memory
uses the WB in system memory and no HDP flush is required, the
VM fault at the same page happens.
Add delay back to workaround until the root cause
From: pding
The post checking on scratch registers isn't reliable for virtual
function.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/
From: pding
Register access is performed when IRQs are disabled. Never sleep in
this function.
Known issue: dead sleep in many use cases of index/data registers.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++---
This is the first patch series to make the latest staging driver
stable for SRIOV VF on both Tonga and Vega. Patches are merged
from SRIOV branches or reimplemented, including bug fixes and
small features requested by SRIOV users.
Please help review. Thanks.
[PATCH 1/4] drm/amdgpu: always consid
From: pding
Only report fence for GFX ring. This can help with checking the MCBP feature.
Signed-off-by: pding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 +++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
Both the Tonga and Vega register specs indicate that this register only
uses bits 31:2 of the DW. The SRIOV test case fails immediately without
this shift.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
2 files changed, 2
Both the Tonga and Vega register specs indicate that this register only
uses bits 31:2 of the DW. The SRIOV test case fails immediately without
this shift.
v2: write to ADDR field
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 9 +
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
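As an illustration of a register whose address field occupies only bits 31:2, the following sketch packs a value into such a field. The diff itself is not visible here, so the macro names, the helper names, and the direction of the shift are assumptions for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout: the ADDR field covers bits 31:2 of the dword. */
#define POLL_ADDR_SHIFT 2u
#define POLL_ADDR_MASK  0xFFFFFFFCu

/*
 * Write a field value into bits 31:2 of reg, leaving bits 1:0 untouched.
 * This mirrors the usual "set field" pattern: shift into place, then mask.
 */
static uint32_t set_addr_field(uint32_t reg, uint32_t field_value)
{
    return (reg & ~POLL_ADDR_MASK) |
           ((field_value << POLL_ADDR_SHIFT) & POLL_ADDR_MASK);
}

/*
 * Convert a 4-byte-aligned byte address into the dword index that the
 * field stores. The alignment requirement follows from bits 1:0 not being
 * part of the field.
 */
static uint32_t byte_addr_to_dword_index(uint32_t byte_addr)
{
    assert((byte_addr & 0x3u) == 0);
    return byte_addr >> POLL_ADDR_SHIFT;
}
```

Writing through a field helper instead of storing the raw address keeps the low two bits of the register under explicit control, which is the kind of mismatch the shift is meant to avoid.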
hout VALID bit for FLR completion,
driver should handle it without checking.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_
hout VALID bit for FLR completion,
driver should handle it without checking.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
ind
The CPU is not efficient at this job. One failure it causes is a
handshake timeout on the SRIOV virtual function.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/dr
VF uses KIQ to access registers. When VM fault occurs, the driver
can't get back the fence of KIQ submission and runs into CPU soft
lockup.
v2: print IV entry info
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 7 +++
1 file changed, 7 insertions(+)
diff --
VF uses KIQ to access registers. When VM fault occurs, the driver
can't get back the fence of KIQ submission and runs into CPU soft
lockup.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/a
VF uses KIQ to access registers, which invokes fence_wait to wait for
the access to complete. When a VM fault occurs, the driver can't sleep
in interrupt context.
For some test cases, a VM fault is 'legal' and shouldn't cause a driver
soft lockup.
Signed-off-by: Pixel Ding
---
driver
s sure the host driver has already received the ACK message and
handle it like:
A: send MSG-> clear VALID->
B: send ACK-> check VALID
Signed-off-by: Ken Xue
Acked-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 26 +-
1 fil
s sure the host driver has already received the ACK message and
handle it like:
A: send MSG-> clear VALID->
B: send ACK-> check VALID
Signed-off-by: Ken Xue
Acked-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 26 +-
1 fil
When multiple VFs try to enter exclusive mode at the same time, the
looping mechanism doesn't ensure that each can get it, because it
only loops over active VFs, so the last one has to wait for a long
interval.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h | 2 +-
1
The SRIOV host driver clears the framebuffer for each VF, so the guest
driver doesn't need this action, which costs much time on some
virtualization platforms and might otherwise cause init to time out.
Signed-off-by: Pixel Ding
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 4 +++-
1 file changed, 3 inser
From: Ding Pixel
Return success when the ring is properly initialized, otherwise return
failure.
Tonga SRIOV VF doesn't have UVD and VCE engines, so the initialization
of these IPs is bypassed. The system crashes if an application submits
an IB to their rings, which are not ready to use. It could be a comm
From: Ding Pixel
Return success when the ring is properly initialized, otherwise return
failure.
Tonga SRIOV VF doesn't have UVD and VCE engines, so the initialization
of these IPs is bypassed. The system crashes if an application submits
an IB to their rings, which are not ready to use. It could be a comm