RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test

2021-07-29 Thread Chen, Guchun
[Public]

Hi Christian,

Thanks for your feedback.

Originally, drm_sched_fini was part of amdgpu_fence_driver_hw_fini; I did not
move it.
The former patch cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
dropped amdgpu_fence_driver_suspend and called amdgpu_fence_driver_hw_fini
instead in amdgpu_device_suspend. I checked the code difference between
amdgpu_fence_driver_hw_fini and the dropped amdgpu_fence_driver_suspend; they
are almost the same except for the drm_sched_fini part, so I think we can leave
it as it is, while skipping the execution of drm_sched_fini in the
suspend/resume case.

Regards,
Guchun

-Original Message-
From: Koenig, Christian  
Sent: Thursday, July 29, 2021 7:11 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Gao, 
Likun ; Zhang, Hawking ; Deucher, 
Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test

Am 29.07.21 um 12:49 schrieb Guchun Chen:
> In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop 
> scheduler in s3 test, otherwise, fence errors will occur after resume.
> So introduce a new parameter to distinguish the case in this API.

NAK, the problem is rather that drm_sched_fini() is part of the sw shutdown and 
should never be called during hw_fini.

Christian.

>
> Fixes: cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
> Signed-off-by: Guchun Chen 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 8 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   | 2 +-
>   3 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b1d2dc39e8be..aaff8ebbb7dc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3844,7 +3844,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>  		else
>  			drm_atomic_helper_shutdown(adev_to_drm(adev));
>  	}
> -	amdgpu_fence_driver_hw_fini(adev);
> +	amdgpu_fence_driver_hw_fini(adev, false);
>  
>  	if (adev->pm_sysfs_en)
>  		amdgpu_pm_sysfs_fini(adev);
> @@ -3941,7 +3941,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>  	/* evict vram memory */
>  	amdgpu_bo_evict_vram(adev);
>  
> -	amdgpu_fence_driver_hw_fini(adev);
> +	amdgpu_fence_driver_hw_fini(adev, adev->in_suspend);
>  
>  	amdgpu_device_ip_suspend_phase2(adev);
>  	/* evict remaining vram memory
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 49c5c7331c53..7e6a73c2919d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -515,14 +515,15 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>  }
>  
>  /**
> - * amdgpu_fence_driver_fini - tear down the fence driver
> + * amdgpu_fence_driver_hw_fini - tear down the fence driver
>   * for all possible rings.
>   *
>   * @adev: amdgpu device pointer
> + * @in_reset: indicator to distinguish the device removal case from the s3 case
>   *
>   * Tear down the fence driver for all possible rings (all asics).
>   */
> -void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
> +void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev, bool in_reset)
>  {
>  	int i, r;
>  
> @@ -531,8 +532,9 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
>  
>  		if (!ring || !ring->fence_drv.initialized)
>  			continue;
> -		if (!ring->no_scheduler)
> +		if (!ring->no_scheduler && !in_reset)
>  			drm_sched_fini(&ring->sched);
> +
>  		/* You can't wait for HW to signal if it's gone */
>  		if (!drm_dev_is_unplugged(&adev->ddev))
>  			r = amdgpu_fence_wait_empty(ring);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 27adffa7658d..42cbecfc26a3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -115,7 +115,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>  int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
>  				   struct amdgpu_irq_src *irq_src,
>  				   unsigned irq_type);
> -void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev);
> +void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev, bool in_reset);
>  void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev);

RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test

2021-07-29 Thread Chen, Guchun
[Public]

Got it, so the goal is to take this chance to make the code logic clearer,
instead of just making it workable on top of code with mixed functionality.

I will post a more reasonable patch later on to address this. Thank you.

Regards,
Guchun

-Original Message-
From: Christian König  
Sent: Thursday, July 29, 2021 8:50 PM
To: Chen, Guchun ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org; Gao, Likun 
; Zhang, Hawking ; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test

Hi Guchun,

see below.

Am 29.07.21 um 14:39 schrieb Chen, Guchun:
> [Public]
>
> Hi Christian,
>
> Thanks for your feedback.
>
> Originally, drm_sched_fini was part of amdgpu_fence_driver_hw_fini; I did not
> move it.

Yeah, not saying that this is your fault, you should just clean that up more 
thoughtfully.

> The former patch cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
> dropped amdgpu_fence_driver_suspend and called amdgpu_fence_driver_hw_fini
> instead in amdgpu_device_suspend. I checked the code difference between
> amdgpu_fence_driver_hw_fini and the dropped amdgpu_fence_driver_suspend; they
> are almost the same except for the drm_sched_fini part, so I think we can
> leave it as it is, while skipping the execution of drm_sched_fini in the
> suspend/resume case.

And exactly that is a bad idea, and the reason why I already said during the
review of patch cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
that the callers of those functions need to be adjusted instead.

1. If not already done, please rename the functions as Hawking suggested as
well; they should be amdgpu_fence_driver_hw_(init|fini) and
amdgpu_fence_driver_sw_(init|fini).

2. Remove drm_sched_fini() from amdgpu_fence_driver_hw_fini() and add that to 
sw_fini instead.

3. Adjust the callers so that we have the same functionality as before. 
E.g. by calling both hw_fini and sw_fini at the same time.
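
A rough sketch of what I mean, just to illustrate the split (untested; the loop
bounds and per-ring checks follow the existing code in amdgpu_fence.c):

void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
{
	int i;

	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->fence_drv.initialized)
			continue;
		/* hw quiesce only: wait for fences and disable interrupts,
		 * no drm_sched_fini() here */
	}
}

void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
{
	int i;

	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->fence_drv.initialized)
			continue;
		/* scheduler teardown belongs to sw shutdown */
		if (!ring->no_scheduler)
			drm_sched_fini(&ring->sched);
	}
}

The callers (amdgpu_device_fini_hw, its sw counterpart, and
amdgpu_device_suspend) then call hw_fini and sw_fini at the points that
preserve today's overall behavior.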

Regards,
Christian.

>
> Regards,
> Guchun
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Thursday, July 29, 2021 7:11 PM
> To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; 
> Gao, Likun ; Zhang, Hawking 
> ; Deucher, Alexander 
> 
> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver 
> fini in s3 test
>
> Am 29.07.21 um 12:49 schrieb Guchun Chen:
>> In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to 
>> stop scheduler in s3 test, otherwise, fence errors will occur after resume.
>> So introduce a new parameter to distinguish the case in this API.
> NAK, the problem is rather that drm_sched_fini() is part of the sw shutdown 
> and should never be called during hw_fini.
>
> Christian.
>
>> Fixes: cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
>> Signed-off-by: Guchun Chen 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>>drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 8 +---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   | 2 +-
>>3 files changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index b1d2dc39e8be..aaff8ebbb7dc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3844,7 +3844,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>>  		else
>>  			drm_atomic_helper_shutdown(adev_to_drm(adev));
>>  	}
>> -	amdgpu_fence_driver_hw_fini(adev);
>> +	amdgpu_fence_driver_hw_fini(adev, false);
>>
>>  	if (adev->pm_sysfs_en)
>>  		amdgpu_pm_sysfs_fini(adev);
>> @@ -3941,7 +3941,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>>  	/* evict vram memory */
>>  	amdgpu_bo_evict_vram(adev);
>>
>> -	amdgpu_fence_driver_hw_fini(adev);
>> +	amdgpu_fence_driver_hw_fini(adev, adev->in_suspend);
>>
>>  	amdgpu_device_ip_suspend_phase2(adev);
>>  	/* evict remaining vram memory
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 49c5c7331c53..7e6a73c2919d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -515,14 +515,15 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>>  }
>>
>>  /**
>> - * amdgpu_fence_driver_fini - tear down the fence driver
>> + * amdgpu_fence_driver_hw_fini - tear down the fence driver
>>   * for all possible rings.
>

RE: [PATCH] drm/amdgpu: Fix channel_index table layout for Aldebaran

2021-08-01 Thread Chen, Guchun
[Public]

/* number of umc channel instance with memory map register access */
-#define UMC_V6_7_CHANNEL_INSTANCE_NUM  4
+#define UMC_V6_7_UMC_INSTANCE_NUM  4
 /* number of umc instance with memory map register access */
-#define UMC_V6_7_UMC_INSTANCE_NUM  8
+#define UMC_V6_7_CHANNEL_INSTANCE_NUM  8

Please update the comments accordingly as well.
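
I.e., after the swap the block should read something like this (sketch; the
numbers now match the macro names):

/* number of umc instance with memory map register access */
#define UMC_V6_7_UMC_INSTANCE_NUM  4
/* number of umc channel instance with memory map register access */
#define UMC_V6_7_CHANNEL_INSTANCE_NUM  8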

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Mukul Joshi
Sent: Thursday, July 29, 2021 11:38 PM
To: amd-gfx@lists.freedesktop.org
Cc: Joshi, Mukul ; Clements, John ; 
Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: Fix channel_index table layout for Aldebaran

Fix the channel_index table layout to fetch the correct channel_index when 
calculating physical address from normalized address during page retirement.
Also, fix the number of UMC instances and number of channels within each UMC 
instance for Aldebaran.

Signed-off-by: Mukul Joshi 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 16 ++++++++--------
 drivers/gpu/drm/amd/amdgpu/umc_v6_7.h |  4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7cf653f9e9a7..097230b5e946 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1171,8 +1171,8 @@ static void gmc_v9_0_set_umc_funcs(struct amdgpu_device *adev)
break;
case CHIP_ALDEBARAN:
adev->umc.max_ras_err_cnt_per_query = 
UMC_V6_7_TOTAL_CHANNEL_NUM;
-   adev->umc.channel_inst_num = UMC_V6_7_UMC_INSTANCE_NUM;
-   adev->umc.umc_inst_num = UMC_V6_7_CHANNEL_INSTANCE_NUM;
+   adev->umc.channel_inst_num = UMC_V6_7_CHANNEL_INSTANCE_NUM;
+   adev->umc.umc_inst_num = UMC_V6_7_UMC_INSTANCE_NUM;
adev->umc.channel_offs = UMC_V6_7_PER_CHANNEL_OFFSET;
if (!adev->gmc.xgmi.connected_to_cpu)
			adev->umc.ras_funcs = &umc_v6_7_ras_funcs;
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
index 7da12110425c..bb30336b1e8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.c
@@ -30,17 +30,17 @@
 
 const uint32_t
 umc_v6_7_channel_idx_tbl_second[UMC_V6_7_UMC_INSTANCE_NUM][UMC_V6_7_CHANNEL_INSTANCE_NUM] = {
-   {28, 12, 6, 22},{19, 3, 9, 25},
-   {20, 4, 30, 14},{11, 27, 1, 17},
-   {24, 8, 2, 18}, {15, 31, 5, 21},
-   {16, 0, 26, 10},{7, 23, 29, 13}
+   {28, 20, 24, 16, 12, 4, 8, 0},
+   {6, 30, 2, 26, 22, 14, 18, 10},
+   {19, 11, 15, 7, 3, 27, 31, 23},
+   {9, 1, 5, 29, 25, 17, 21, 13}
 };
 const uint32_t
 umc_v6_7_channel_idx_tbl_first[UMC_V6_7_UMC_INSTANCE_NUM][UMC_V6_7_CHANNEL_INSTANCE_NUM] = {
-   {19, 3, 9, 25}, {28, 12, 6, 22},
-   {11, 27, 1, 17},{20, 4, 30, 14},
-   {15, 31, 5, 21},{24, 8, 2, 18},
-   {7, 23, 29, 13},{16, 0, 26, 10}
+   {19, 11, 15, 7, 3, 27, 31, 23},
+   {9, 1, 5, 29, 25, 17, 21, 13},
+   {28, 20, 24, 16, 12, 4, 8, 0},
+   {6, 30, 2, 26, 22, 14, 18, 10},
 };
 
 static inline uint32_t get_umc_v6_7_reg_offset(struct amdgpu_device *adev, 
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.h b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.h
index 81b8f1844091..57f2557e7aca 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_7.h
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_7.h
@@ -36,9 +36,9 @@
 #define UMC_V6_7_INST_DIST 0x4
 
 /* number of umc channel instance with memory map register access */
-#define UMC_V6_7_CHANNEL_INSTANCE_NUM  4
+#define UMC_V6_7_UMC_INSTANCE_NUM  4
 /* number of umc instance with memory map register access */
-#define UMC_V6_7_UMC_INSTANCE_NUM  8
+#define UMC_V6_7_CHANNEL_INSTANCE_NUM  8
 /* total channel instances in one umc block */
#define UMC_V6_7_TOTAL_CHANNEL_NUM (UMC_V6_7_CHANNEL_INSTANCE_NUM * UMC_V6_7_UMC_INSTANCE_NUM)
 /* UMC regiser per channel offset */
--
2.17.1



RE: [PATCH] drm/amdgpu: adjust fence driver enable sequence

2021-08-01 Thread Chen, Guchun
[Public]

Hi Lothian,

Thanks for your report. I have a following fix for this problem, will send it 
out soon for review.

Regards,
Guchun

From: amd-gfx  On Behalf Of Mike Lothian
Sent: Sunday, August 1, 2021 7:57 PM
To: Gao, Likun 
Cc: amd-gfx list ; Zhang, Hawking 

Subject: Re: [PATCH] drm/amdgpu: adjust fence driver enable sequence

Hi

This patch is causing me issues on my Skylake/Tonga PRIME laptop; reverting it
sorts things out.

More details here: 
https://gitlab.freedesktop.org/drm/amd/-/issues/1668

Cheers

Mike

On Wed, 28 Jul 2021 at 05:07, Likun Gao <likun@amd.com> wrote:
From: Likun Gao <likun@amd.com>

Fence driver was enabled per ring when sw init on per IP block before.
Change to enable all the fence driver at the same time after
amdgpu_device_ip_init finished.
Rename some function related to fence to make it reasonable for read.

Signed-off-by: Likun Gao <likun@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  6 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d3a4299b1f30..77195a4e5c59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3675,6 +3675,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
goto release_ras_con;
}

+   amdgpu_fence_driver_hw_init(adev);
+
dev_info(adev->dev,
"SE %d, SH per SE %d, CU per SH %d, active_cu_number %d\n",
adev->gfx.config.max_shader_engines,
@@ -3939,7 +3941,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
/* evict vram memory */
amdgpu_bo_evict_vram(adev);

-   amdgpu_fence_driver_suspend(adev);
+   amdgpu_fence_driver_hw_fini(adev);

amdgpu_device_ip_suspend_phase2(adev);
/* evict remaining vram memory
@@ -3984,7 +3986,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
dev_err(adev->dev, "amdgpu_device_ip_resume failed (%d).\n", r);
return r;
}
-   amdgpu_fence_driver_resume(adev);
+   amdgpu_fence_driver_hw_init(adev);


r = amdgpu_device_ip_late_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 72d9b92b1754..e2f606bca779 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -417,9 +417,6 @@ int amdgpu_fence_driver_start_ring(struct amdgpu_ring *ring,
}
amdgpu_fence_write(ring, atomic_read(&ring->fence_drv.last_seq));

-   if (irq_src)
-   amdgpu_irq_get(adev, irq_src, irq_type);
-
ring->fence_drv.irq_src = irq_src;
ring->fence_drv.irq_type = irq_type;
ring->fence_drv.initialized = true;
@@ -572,14 +569,14 @@ void amdgpu_fence_driver_fini_sw(struct amdgpu_device *adev)
 }

 /**
- * amdgpu_fence_driver_suspend - suspend the fence driver
+ * amdgpu_fence_driver_hw_fini - disable the fence driver
  * for all possible rings.
  *
  * @adev: amdgpu device pointer
  *
- * Suspend the fence driver for all possible rings (all asics).
+ * Disable the fence driver for all possible rings (all asics).
  */
-void amdgpu_fence_driver_suspend(struct amdgpu_device *adev)
+void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
 {
int i, r;

@@ -603,18 +600,18 @@ void amdgpu_fence_driver_suspend(struct amdgpu_device *adev)
 }

 /**
- * amdgpu_fence_driver_resume - resume the fence driver
+ * amdgpu_fence_driver_hw_init - enable the fence driver
  * for all possible rings.
  *
  * @adev: amdgpu device pointer
  *
- * Resume the fence driver for all possible rings (all asics).
+ * Enable the fence driver for all possible rings (all asics).
  * Not all asics have all rings, so each asic will only
  * start the fence driver on the rings it has using
  * amdgpu_fence_driver_start_ring().
  * Returns 0 for success.
  */
-void amdgpu_fence_driver_resume(struct amdgpu_device *adev)
+void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
 {
int i;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index e7d3d0dbdd96..64471018f5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -117,8 +117,8 @@ int amdgpu_fence_driver_

RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-02 Thread Chen, Guchun
[Public]

Thank you, Christian.

Regarding fence_drv.initialized, it looks a bit redundant; anyway, let me look
into this more.

Regards,
Guchun

-Original Message-
From: Christian König  
Sent: Monday, August 2, 2021 2:56 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Gao, 
Likun ; Koenig, Christian ; Zhang, 
Hawking ; Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)

Am 02.08.21 um 07:16 schrieb Guchun Chen:
> In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to stop 
> scheduler in s3 test, otherwise, fence related failure will arrive 
> after resume. To fix this and for a better clean up, move 
> drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of 
> driver shutdown, and should never be called in hw_fini.
>
> v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, to 
> keep sw_init and sw_fini paired.
>
> Fixes: cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
> Suggested-by: Christian König 
> Signed-off-by: Guchun Chen 

It's a bit ambiguous now what fence_drv.initialized means, but I think we can 
live with that for now.

Patch is Reviewed-by: Christian König .

Regards,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
>   3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b1d2dc39e8be..9e53ff851496 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   
>   fence_driver_init:
>   /* Fence driver */
> - r = amdgpu_fence_driver_init(adev);
> + r = amdgpu_fence_driver_sw_init(adev);
>   if (r) {
> - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> + dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
>  		amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 0);
>   goto failed;
>   }
> @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
>   }
>   amdgpu_fence_driver_hw_init(adev);
>   
> -
>   r = amdgpu_device_ip_late_init(adev);
>   if (r)
>   return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 49c5c7331c53..7495911516c2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>   }
>   
>   /**
> - * amdgpu_fence_driver_init - init the fence driver
> + * amdgpu_fence_driver_sw_init - init the fence driver
>* for all possible rings.
>*
>* @adev: amdgpu device pointer
> @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>* amdgpu_fence_driver_start_ring().
>* Returns 0 for success.
>*/
> -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
>   {
>   return 0;
>   }
>   
>   /**
> - * amdgpu_fence_driver_fini - tear down the fence driver
> + * amdgpu_fence_driver_hw_fini - tear down the fence driver
>* for all possible rings.
>*
>* @adev: amdgpu device pointer
> @@ -531,8 +531,7 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
>   
>   if (!ring || !ring->fence_drv.initialized)
>   continue;
> - if (!ring->no_scheduler)
> - drm_sched_fini(&ring->sched);
> +
>   /* You can't wait for HW to signal if it's gone */
>   if (!drm_dev_is_unplugged(&adev->ddev))
>  			r = amdgpu_fence_wait_empty(ring);
> @@ -560,6 +559,9 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
>   if (!ring || !ring->fence_drv.initialized)
>   continue;
>   
> + if (!ring->no_scheduler)
> + drm_sched_fini(&ring->sched);
> +
>   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>   dma_fence_put(ring->fence_drv.fences[j]);
>   kfree(ring->fence_drv.fences);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-02 Thread Chen, Guchun
[Public]

Hi Alex,

I submitted the patch before your message, I will take care of this next time.

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Monday, August 2, 2021 9:35 PM
To: Chen, Guchun 
Cc: Christian König ; 
amd-gfx@lists.freedesktop.org; Gao, Likun ; Koenig, 
Christian ; Zhang, Hawking ; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)

On Mon, Aug 2, 2021 at 4:23 AM Chen, Guchun  wrote:
>
> [Public]
>
> Thank you, Christian.
>
> Regarding fence_drv.initialized, it looks a bit redundant; anyway, let me
> look into this more.

Does this patch fix this bug?
https://gitlab.freedesktop.org/drm/amd/-/issues/1668

If so, please add:
Bug: 
https://gitlab.freedesktop.org/drm/amd/-/issues/1668
to the commit message.

Alex

>
> Regards,
> Guchun
>
> -Original Message-
> From: Christian König 
> Sent: Monday, August 2, 2021 2:56 PM
> To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; 
> Gao, Likun ; Koenig, Christian 
> ; Zhang, Hawking ; 
> Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver 
> fini in s3 test (v2)
>
> Am 02.08.21 um 07:16 schrieb Guchun Chen:
> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to 
> > stop scheduler in s3 test, otherwise, fence related failure will 
> > arrive after resume. To fix this and for a better clean up, move 
> > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of 
> > driver shutdown, and should never be called in hw_fini.
> >
> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, 
> > to keep sw_init and sw_fini paired.
> >
> > Fixes: cd87a6dcf6af ("drm/amdgpu: adjust fence driver enable sequence")
> > Suggested-by: Christian König 
> > Signed-off-by: Guchun Chen 
>
> It's a bit ambiguous now what fence_drv.initialized means, but I think we can 
> live with that for now.
>
> Patch is Reviewed-by: Christian König .
>
> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
> >   3 files changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index b1d2dc39e8be..9e53ff851496 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >
> >   fence_driver_init:
> >   /* Fence driver */
> > - r = amdgpu_fence_driver_init(adev);
> > + r = amdgpu_fence_driver_sw_init(adev);
> >   if (r) {
> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
> > +		dev_err(adev->dev, "amdgpu_fence_driver_sw_init failed\n");
> >   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 0, 
> > 0);
> >   goto failed;
> >   }
> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> >   }
> >   amdgpu_fence_driver_hw_init(adev);
> >
> > -
> >   r = amdgpu_device_ip_late_init(adev);
> >   if (r)
> >   return r;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 49c5c7331c53..7495911516c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> >   }
> >
> >   /**
> > - * amdgpu_fence_driv

RE: [PATCH] drm/amd/amdgpu: remove redundant host to psp cmd buf

2021-08-02 Thread Chen, Guchun
[Public]

+	if (psp->asd_fw) {
+		release_firmware(psp->asd_fw);
 		adev->psp.asd_fw = NULL;
 	}

Using "psp->asd_fw = NULL" here would be simpler and more consistent?

Regards,
Guchun

From: amd-gfx  On Behalf Of Li, Candice
Sent: Tuesday, August 3, 2021 11:06 AM
To: amd-gfx@lists.freedesktop.org
Cc: Clements, John ; Li, Candice 
Subject: [PATCH] drm/amd/amdgpu: remove redundant host to psp cmd buf


[Public]

Signed-off-by: Candice Li <candice...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 253 
 1 file changed, 78 insertions(+), 175 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ed731144ca7f..9c18558e3bc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -253,6 +253,12 @@ static int psp_sw_init(void *handle)
 	struct psp_runtime_boot_cfg_entry boot_cfg_entry;
 	struct psp_memory_training_context *mem_training_ctx = &psp->mem_train_ctx;
 
+	psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
+	if (!psp->cmd) {
+		DRM_ERROR("Failed to allocate memory to command buffer!\n");
+		ret = -ENOMEM;
+	}
+
 	if (!amdgpu_sriov_vf(adev)) {
 		ret = psp_init_microcode(psp);
 		if (ret) {
@@ -315,25 +321,30 @@ static int psp_sw_init(void *handle)
 static int psp_sw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	struct psp_context *psp = &adev->psp;
+	struct psp_gfx_cmd_resp *cmd = psp->cmd;
 
-	psp_memory_training_fini(&adev->psp);
-	if (adev->psp.sos_fw) {
-		release_firmware(adev->psp.sos_fw);
-		adev->psp.sos_fw = NULL;
+	psp_memory_training_fini(psp);
+	if (psp->sos_fw) {
+		release_firmware(psp->sos_fw);
+		psp->sos_fw = NULL;
 	}
-	if (adev->psp.asd_fw) {
-		release_firmware(adev->psp.asd_fw);
+	if (psp->asd_fw) {
+		release_firmware(psp->asd_fw);
 		adev->psp.asd_fw = NULL;
 	}
-	if (adev->psp.ta_fw) {
-		release_firmware(adev->psp.ta_fw);
-		adev->psp.ta_fw = NULL;
+	if (psp->ta_fw) {
+		release_firmware(psp->ta_fw);
+		psp->ta_fw = NULL;
 	}
 
 	if (adev->asic_type == CHIP_NAVI10 ||
 	    adev->asic_type == CHIP_SIENNA_CICHLID)
 		psp_sysfs_fini(adev);
 
+	kfree(cmd);
+	cmd = NULL;
+
 	return 0;
 }
 
@@ -491,6 +502,8 @@ static void psp_prep_tmr_cmd_buf(struct psp_context *psp,
 	uint32_t size = amdgpu_bo_size(tmr_bo);
 	uint64_t tmr_pa = amdgpu_gmc_vram_pa(adev, tmr_bo);
 
+	memset(cmd, 0, sizeof(struct psp_gfx_cmd_resp));
+
 	if (amdgpu_sriov_vf(psp->adev))
 		cmd->cmd_id = GFX_CMD_ID_SETUP_VMR;
 	else
@@ -506,6 +519,8 @@ static void psp_prep_tmr_cmd_buf(struct psp_context *psp,
 static void psp_prep_load_toc_cmd_buf(struct psp_gfx_cmd_resp *cmd,
 				      uint64_t pri_buf_mc, uint32_t size)
 {
+	memset(cmd, 0, sizeof(struct psp_gfx_cmd_resp));
+
 	cmd->cmd_id = GFX_CMD_ID_LOAD_TOC;
 	cmd->cmd.cmd_load_toc.toc_phy_addr_lo = lower_32_bits(pri_buf_mc);
 	cmd->cmd.cmd_load_toc.toc_phy_addr_hi = upper_32_bits(pri_buf_mc);
@@ -517,11 +532,8 @@ static int psp_load_toc(struct psp_context *psp,
 			uint32_t *tmr_size)
 {
 	int ret;
-	struct psp_gfx_cmd_resp *cmd;
+	struct psp_gfx_cmd_resp *cmd = psp->cmd;
 
-	cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
-	if (!cmd)
-		return -ENOMEM;
 	/* Copy toc to psp firmware private buffer */
 	psp_copy_fw(psp, psp->toc.start_addr, psp->toc.size_bytes);
 
@@ -531,7 +543,7 @@ static int psp_load_toc(struct psp_context *psp,
 				 psp->fence_buf_mc_addr);
 	if (!ret)
 		*tmr_size = psp->cmd_buf_mem->resp.tmr_size;
-	kfree(cmd);
+
 	return ret;
 }
 
@@ -588,7 +600,7 @@ static bool psp_skip_tmr(struct psp_context *psp)
 static int psp_tmr_load(struct psp_context *psp)
 {
 	int ret;
-	struct psp_gfx_cmd_resp *cmd;
+	struct psp_gfx_cmd_resp *cmd = psp->cmd;
 
 	/* For Navi

RE: [PATCH] drm/amd/amdgpu: remove redundant host to psp cmd buf

2021-08-03 Thread Chen, Guchun
[Public]

In psp_cmd_submit_buf, it has psp->mutex to guard this, so it should be fine.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Lazar, Lijo
Sent: Tuesday, August 3, 2021 2:30 PM
To: Li, Candice ; amd-gfx@lists.freedesktop.org
Cc: Clements, John 
Subject: Re: [PATCH] drm/amd/amdgpu: remove redundant host to psp cmd buf

Keeping the command buffer in the psp context means different command buffers
cannot be prepared in parallel. Is there any case of submitting commands for
different TAs in parallel - e.g., for RAS and some other TA?
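
For illustration, the interleaving I am worried about is something like the
trace below (function names are examples only; any two users of the shared
psp->cmd would do):

	/*
	 *   thread A: psp_prep_ta_load_cmd_buf(psp->cmd, ...)  - fills psp->cmd
	 *   thread B: memset(psp->cmd, 0, ...)                 - clobbers A's command
	 *   thread A: psp_cmd_submit_buf(psp, ..., psp->cmd)   - submits a zeroed command
	 */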

Thanks,
Lijo

On 8/3/2021 8:35 AM, Li, Candice wrote:
> [Public]
> 
> Signed-off-by: Candice Li <candice...@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 253 
> 1 file changed, 78 insertions(+), 175 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index ed731144ca7f..9c18558e3bc0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -253,6 +253,12 @@ static int psp_sw_init(void *handle)
>  	struct psp_runtime_boot_cfg_entry boot_cfg_entry;
>  	struct psp_memory_training_context *mem_training_ctx = &psp->mem_train_ctx;
>  
> +	psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
> +	if (!psp->cmd) {
> +		DRM_ERROR("Failed to allocate memory to command buffer!\n");
> +		ret = -ENOMEM;
> +	}
> +
>  	if (!amdgpu_sriov_vf(adev)) {
>  		ret = psp_init_microcode(psp);
>  		if (ret) {
> @@ -315,25 +321,30 @@ static int psp_sw_init(void *handle)
>  static int psp_sw_fini(void *handle)
>  {
>  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	struct psp_context *psp = &adev->psp;
> +	struct psp_gfx_cmd_resp *cmd = psp->cmd;
>  
> -	psp_memory_training_fini(&adev->psp);
> -	if (adev->psp.sos_fw) {
> -		release_firmware(adev->psp.sos_fw);
> -		adev->psp.sos_fw = NULL;
> +	psp_memory_training_fini(psp);
> +	if (psp->sos_fw) {
> +		release_firmware(psp->sos_fw);
> +		psp->sos_fw = NULL;
>  	}
> -	if (adev->psp.asd_fw) {
> -		release_firmware(adev->psp.asd_fw);
> +	if (psp->asd_fw) {
> +		release_firmware(psp->asd_fw);
>  		adev->psp.asd_fw = NULL;
>  	}
> -	if (adev->psp.ta_fw) {
> -		release_firmware(adev->psp.ta_fw);
> -		adev->psp.ta_fw = NULL;
> +	if (psp->ta_fw) {
> +		release_firmware(psp->ta_fw);
> +		psp->ta_fw = NULL;
>  	}
>  
>  	if (adev->asic_type == CHIP_NAVI10 ||
>  	    adev->asic_type == CHIP_SIENNA_CICHLID)
>  		psp_sysfs_fini(adev);
>  
> +	kfree(cmd);
> +	cmd = NULL;
> +
>  	return 0;
>  }
>  
> @@ -491,6 +502,8 @@ static void psp_prep_tmr_cmd_buf(struct psp_context *psp,
>  	uint32_t size = amdgpu_bo_size(tmr_bo);
>  	uint64_t tmr_pa = amdgpu_gmc_vram_pa(adev, tmr_bo);
>  
> +	memset(cmd, 0, sizeof(struct psp_gfx_cmd_resp));
> +
>  	if (amdgpu_sriov_vf(psp->adev))
>  		cmd->cmd_id = GFX_CMD_ID_SETUP_VMR;
>  	else
> @@ -506,6 +519,8 @@ static void psp_prep_tmr_cmd_buf(struct psp_context *psp,
>  static void psp_prep_load_toc_cmd_buf(struct psp_gfx_cmd_resp *cmd,
>  				      uint64_t pri_buf_mc, uint32_t size)
>  {
> +	memset(cmd, 0, sizeof(struct psp_gfx_cmd_resp));
> +
>  	cmd->cmd_id = GFX_CMD_ID_LOAD_TOC;
>  	cmd->cmd.cmd_load_toc.toc_phy_addr_lo = lower_32_bits(pri_buf_mc);
>  	cmd->cmd.cmd_load_toc.toc_phy_addr_hi = upper_32_bits(pri_buf_mc);
> @@ -517,11 +532,8 @@ static int psp_load_toc(struct psp_context *psp,
>  			uint32_t *tmr_size)
>  {
>  	int ret;
> -	struct psp_gfx_cmd_resp *cmd;
> +	struct psp_gfx_cmd_resp *cmd = psp->cmd;
>  
> -	cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
> -	if (!cmd)
> -		return -ENOMEM;
>  	/* Copy toc to psp firmware private buffer */
>  	psp_copy_fw(psp

RE: [PATCH] drm/amd/amdgpu: remove redundant host to psp cmd buf

2021-08-03 Thread Chen, Guchun
[Public]

Yeah, you are right, Lijo. @Li, Candice @Clements, John please address this 
before submitting this patch.

Regards,
Guchun

-Original Message-
From: Lazar, Lijo  
Sent: Tuesday, August 3, 2021 4:16 PM
To: Chen, Guchun ; Li, Candice ; 
amd-gfx@lists.freedesktop.org
Cc: Clements, John 
Subject: Re: [PATCH] drm/amd/amdgpu: remove redundant host to psp cmd buf

Not really. The clearing and filling of the buffer below happen outside of the
mutex, so it is not safe until the memcpy happens in psp_cmd_submit_buf.

memset(cmd, 0, sizeof(struct psp_gfx_cmd_resp));
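
For illustration only (the actual fix is John's follow-up patch in this
thread): one way to close that window is to hand out the shared buffer and
take psp->mutex in one step, releasing it only after submission. A sketch with
hypothetical helper names:

static struct psp_gfx_cmd_resp *acquire_psp_cmd_buf(struct psp_context *psp)
{
	struct psp_gfx_cmd_resp *cmd = psp->cmd;

	/* held until release_psp_cmd_buf(), so prepare and submit are atomic */
	mutex_lock(&psp->mutex);
	memset(cmd, 0, sizeof(struct psp_gfx_cmd_resp));
	return cmd;
}

static void release_psp_cmd_buf(struct psp_context *psp)
{
	mutex_unlock(&psp->mutex);
}

psp_cmd_submit_buf() would then run with the mutex already held instead of
taking it itself.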


Thanks,
Lijo


On 8/3/2021 1:37 PM, Chen, Guchun wrote:
> [Public]
> 
> In psp_cmd_submit_buf, it has psp->mutex to guard this, so it should be fine.
> 
> Regards,
> Guchun
> 
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Lazar, Lijo
> Sent: Tuesday, August 3, 2021 2:30 PM
> To: Li, Candice ; amd-gfx@lists.freedesktop.org
> Cc: Clements, John 
> Subject: Re: [PATCH] drm/amd/amdgpu: remove redundant host to psp cmd 
> buf
> 
> Keeping the command buffer in the psp context means different command buffers
> cannot be prepared in parallel. Is there any case of submitting commands for
> different TAs in parallel - e.g., for RAS and some other TA?
> 
> Thanks,
> Lijo
> 
> On 8/3/2021 8:35 AM, Li, Candice wrote:
>> [Public]
>>
>> Signed-off-by: Candice Li <candice...@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 253 
>> 1 file changed, 78 insertions(+), 175 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> index ed731144ca7f..9c18558e3bc0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>> @@ -253,6 +253,12 @@ static int psp_sw_init(void *handle)
>>  	struct psp_runtime_boot_cfg_entry boot_cfg_entry;
>>  	struct psp_memory_training_context *mem_training_ctx = &psp->mem_train_ctx;
>>  
>> +	psp->cmd = kzalloc(sizeof(struct psp_gfx_cmd_resp), GFP_KERNEL);
>> +	if (!psp->cmd) {
>> +		DRM_ERROR("Failed to allocate memory to command buffer!\n");
>> +		ret = -ENOMEM;
>> +	}
>> +
>>  	if (!amdgpu_sriov_vf(adev)) {
>>  		ret = psp_init_microcode(psp);
>>  		if (ret) {
>> @@ -315,25 +321,30 @@ static int psp_sw_init(void *handle)
>>  static int psp_sw_fini(void *handle)
>>  {
>>  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> +	struct psp_context *psp = &adev->psp;
>> +	struct psp_gfx_cmd_resp *cmd = psp->cmd;
>>  
>> -	psp_memory_training_fini(&adev->psp);
>> -	if (adev->psp.sos_fw) {
>> -		release_firmware(adev->psp.sos_fw);
>> -		adev->psp.sos_fw = NULL;
>> +	psp_memory_training_fini(psp);
>> +	if (psp->sos_fw) {
>> +		release_firmware(psp->sos_fw);
>> +		psp->sos_fw = NULL;
>>  	}
>> -	if (adev->psp.asd_fw) {
>> -		release_firmware(adev->psp.asd_fw);
>> +	if (psp->asd_fw) {
>> +		release_firmware(psp->asd_fw);
>>  		adev->psp.asd_fw = NULL;
>>  	}
>> -	if (adev->psp.ta_fw) {
>> -		release_firmware(adev->psp.ta_fw);
>> -		adev->psp.ta_fw = NULL;
>> +	if (psp->ta_fw) {
>> +		release_firmware(psp->ta_fw);
>> +		psp->ta_fw = NULL;
>>  	}
>>  
>>  	if (adev->asic_type == CHIP_NAVI10 ||
>>  	    adev->asic_type == CHIP_SIENNA_CICHLID)
>>  		psp_sysfs_fini(adev);
>>  
>> +	kfree(cmd);

RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access

2021-08-03 Thread Chen, Guchun
[Public]

Before calling into psp_cmd_submit_buf, a mutex, psp->cmd_buf_mutex, is
already taken, and after entering psp_cmd_submit_buf there is another mutex,
psp->mutex. Isn't that a bit redundant?

Regards,
Guchun

From: Clements, John 
Sent: Tuesday, August 3, 2021 5:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Li, Candice ; 
Lazar, Lijo ; Chen, Guchun 
Subject: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access


[AMD Official Use Only]

Submitting patch to synchronize access to psp cmd submission memory to resolve 
potential race conditions.


RE: [PATCH] drm/amdgpu: add DID for beige goby

2021-08-03 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Wednesday, August 4, 2021 3:39 AM
To: amd-gfx@lists.freedesktop.org
Cc: Gui, Jack ; Deucher, Alexander 
Subject: [PATCH] drm/amdgpu: add DID for beige goby

From: Chengming Gui 

Add device ids.

Signed-off-by: Chengming Gui 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 91a5ed96bfbe..b02c87ae4245 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1215,6 +1215,13 @@ static const struct pci_device_id pciidlist[] = {
/* CYAN_SKILLFISH */
	{0x1002, 0x13FE, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_CYAN_SKILLFISH|AMD_IS_APU},
 
+   /* BEIGE_GOBY */
+   {0x1002, 0x7420, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BEIGE_GOBY},
+   {0x1002, 0x7421, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BEIGE_GOBY},
+   {0x1002, 0x7422, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BEIGE_GOBY},
+   {0x1002, 0x7423, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BEIGE_GOBY},
+   {0x1002, 0x743F, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_BEIGE_GOBY},
+
{0, 0, 0}
 };
 
-- 
2.31.1


RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access

2021-08-03 Thread Chen, Guchun
[Public]

Thanks, John. As they are in the same context, it is meaningless for two
mutexes to guard almost the same thing.

Regards,
Guchun

From: Clements, John 
Sent: Wednesday, August 4, 2021 11:34 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Li, Candice ; 
Lazar, Lijo 
Subject: RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access


[Public]

@Chen, Guchun <guchun.c...@amd.com>,
Based off your feedback I double checked the code, and I changed my opinion 
about it, I think it's better just to reuse the original mutex for now. I've 
submitted an updated patch for review

From: Clements, John <john.cleme...@amd.com>
Sent: Tuesday, August 3, 2021 10:07 PM
To: Chen, Guchun <guchun.c...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Li, Candice <candice...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access

Hello Guchun,

In most of those cases you are right, it is redundant. The reason I kept them
separate for now is to resolve this bug while also keeping those interfaces
modular and not affecting the psp submit sequence yet. We are planning a
bigger change to that source to remove a lot of the duplicate code regarding
the cmd buffer prepare/submit flow, and we will probably go back down to one
mutex there.

Thank you,
John Clements


From: Chen, Guchun <guchun.c...@amd.com>
Sent: Tuesday, August 3, 2021 9:58 PM
To: Clements, John <john.cleme...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Li, Candice <candice...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access


[Public]



Before calling into psp_cmd_submit_buf, a mutex, psp->cmd_buf_mutex, is
already taken, and after entering psp_cmd_submit_buf there is another mutex,
psp->mutex. Isn't that a bit redundant?



Regards,

Guchun



From: Clements, John <john.cleme...@amd.com>
Sent: Tuesday, August 3, 2021 5:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Li, Candice <candice...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Chen, Guchun <guchun.c...@amd.com>
Subject: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access



[AMD Official Use Only]



Submitting patch to synchronize access to psp cmd submission memory to resolve 
potential race conditions.


RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access

2021-08-03 Thread Chen, Guchun
[Public]

Sorry for missing RB.

This patch is:
Reviewed-by: Guchun Chen 

Regards,
Guchun

From: amd-gfx  On Behalf Of Chen, Guchun
Sent: Wednesday, August 4, 2021 11:40 AM
To: Clements, John ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Li, Candice ; 
Lazar, Lijo 
Subject: RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access


[Public]

Thanks, John. As they are in the same context, it is meaningless for two
mutexes to guard almost the same thing.

Regards,
Guchun

From: Clements, John <john.cleme...@amd.com>
Sent: Wednesday, August 4, 2021 11:34 AM
To: Chen, Guchun <guchun.c...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Li, Candice <candice...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access


[Public]

@Chen, Guchun <guchun.c...@amd.com>,
Based off your feedback I double checked the code, and I changed my opinion 
about it, I think it's better just to reuse the original mutex for now. I've 
submitted an updated patch for review

From: Clements, John <john.cleme...@amd.com>
Sent: Tuesday, August 3, 2021 10:07 PM
To: Chen, Guchun <guchun.c...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Li, Candice <candice...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access

Hello Guchun,

In most of those cases you are right, it is redundant. The reason I kept them
separate for now is to resolve this bug while also keeping those interfaces
modular and not affecting the psp submit sequence yet. We are planning a
bigger change to that source to remove a lot of the duplicate code regarding
the cmd buffer prepare/submit flow, and we will probably go back down to one
mutex there.

Thank you,
John Clements


From: Chen, Guchun <guchun.c...@amd.com>
Sent: Tuesday, August 3, 2021 9:58 PM
To: Clements, John <john.cleme...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Li, Candice <candice...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>
Subject: RE: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access


[Public]



Before calling into psp_cmd_submit_buf, a mutex, psp->cmd_buf_mutex, is
already taken, and after entering psp_cmd_submit_buf there is another mutex,
psp->mutex. Isn't that a bit redundant?



Regards,

Guchun



From: Clements, John <john.cleme...@amd.com>
Sent: Tuesday, August 3, 2021 5:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Li, Candice <candice...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>; Chen, Guchun <guchun.c...@amd.com>
Subject: [PATCH] drm/amdgpu: added synchronization for psp cmd buf access



[AMD Official Use Only]



Submitting patch to synchronize access to psp cmd submission memory to resolve 
potential race conditions.


RE: [PATCH] drm/amdgpu: set RAS EEPROM address from VBIOS

2021-08-04 Thread Chen, Guchun
[Public]

+/*
+ * Helper function to query RAS EEPROM address
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Return true if vbios supports ras rom address reporting

As you have documented the first argument of amdgpu_atomfirmware_ras_rom_addr,
the other one, "uint8_t *i2c_address", should be documented as well.
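
E.g., something along these lines (illustrative wording only):

/*
 * Helper function to query RAS EEPROM address
 *
 * @adev: amdgpu_device pointer
 * @i2c_address: pointer to u8; the RAS EEPROM I2C address reported by the
 *               vbios is returned here when supported
 *
 * Return true if vbios supports ras rom address reporting
 */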

BTW, does this patch fix https://gitlab.freedesktop.org/drm/amd/-/issues/1670?

Regards,
Guchun

From: amd-gfx  On Behalf Of Clements, John
Sent: Wednesday, August 4, 2021 5:14 PM
To: Clements, John ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking 
Subject: RE: [PATCH] drm/amdgpu: set RAS EEPROM address from VBIOS


[AMD Official Use Only]

Updated patch with reviewed changes

From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Clements, John
Sent: Wednesday, August 4, 2021 4:48 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>
Subject: [PATCH] drm/amdgpu: set RAS EEPROM address from VBIOS


[AMD Official Use Only]

Submitting patch to get RAS EEPROM I2C address from VBIOS FW info table.

Thank you,
John Clements

RE: [PATCH] drm/amdgpu: handle VCN instances when harvesting

2021-08-09 Thread Chen, Guchun
[Public]

A spelling typo in commit body.

There may be multiple instances an only one is harvested.

s/an/and

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Tuesday, August 10, 2021 10:05 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhu, James 

Subject: [PATCH] drm/amdgpu: handle VCN instances when harvesting

There may be multiple instances an only one is harvested.

Fixes: 83a0b8639185 ("drm/amdgpu: add judgement when add ip blocks (v2)")
Bug: 
https://gitlab.freedesktop.org/drm/amd/-/issues/1673
Reviewed-by: James Zhu 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 43e7b61d1c5c..ada7bc19118a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -299,6 +299,9 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
  ip->major, ip->minor,
  ip->revision);
 
+   if (le16_to_cpu(ip->hw_id) == VCN_HWID)
+   adev->vcn.num_vcn_inst++;
+
for (k = 0; k < num_base_address; k++) {
/*
			 * convert the endianness of base addresses in place,
@@ -385,7 +388,7 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
 {
struct binary_header *bhdr;
struct harvest_table *harvest_info;
-   int i;
+   int i, vcn_harvest_count = 0;
 
bhdr = (struct binary_header *)adev->mman.discovery_bin;
	harvest_info = (struct harvest_table *)(adev->mman.discovery_bin +
@@ -397,8 +400,7 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
 
switch (le32_to_cpu(harvest_info->list[i].hw_id)) {
case VCN_HWID:
-   adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
-   adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
+   vcn_harvest_count++;
break;
case DMU_HWID:
			adev->harvest_ip_mask |= AMD_HARVEST_IP_DMU_MASK;
@@ -407,6 +409,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
break;
}
}
+   if (vcn_harvest_count == adev->vcn.num_vcn_inst) {
+   adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
+   adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
+   }
 }
 
 int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev)
--
2.31.1


RE: [PATCH] drm/amdgpu: handle VCN instances when harvesting (v2)

2021-08-09 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Tuesday, August 10, 2021 11:03 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhu, James 

Subject: [PATCH] drm/amdgpu: handle VCN instances when harvesting (v2)

There may be multiple instances and only one is harvested.

v2: fix typo in commit message

Fixes: 83a0b8639185 ("drm/amdgpu: add judgement when add ip blocks (v2)")
Bug: 
https://gitlab.freedesktop.org/drm/amd/-/issues/1673
Reviewed-by: James Zhu 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 43e7b61d1c5c..ada7bc19118a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -299,6 +299,9 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
  ip->major, ip->minor,
  ip->revision);
 
+   if (le16_to_cpu(ip->hw_id) == VCN_HWID)
+   adev->vcn.num_vcn_inst++;
+
for (k = 0; k < num_base_address; k++) {
/*
			 * convert the endianness of base addresses in place,
@@ -385,7 +388,7 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
 {
struct binary_header *bhdr;
struct harvest_table *harvest_info;
-   int i;
+   int i, vcn_harvest_count = 0;
 
bhdr = (struct binary_header *)adev->mman.discovery_bin;
	harvest_info = (struct harvest_table *)(adev->mman.discovery_bin +
@@ -397,8 +400,7 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
 
switch (le32_to_cpu(harvest_info->list[i].hw_id)) {
case VCN_HWID:
-   adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
-   adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
+   vcn_harvest_count++;
break;
case DMU_HWID:
			adev->harvest_ip_mask |= AMD_HARVEST_IP_DMU_MASK;
@@ -407,6 +409,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
break;
}
}
+   if (vcn_harvest_count == adev->vcn.num_vcn_inst) {
+   adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
+   adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
+   }
 }
 
 int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev)
--
2.31.1


RE: [PATCH] drm/display: fix possible null-pointer dereference in dcn10_set_clock()

2021-08-10 Thread Chen, Guchun
[Public]

Thanks for your patch.

I suggest moving the check of the function pointer
dc->clk_mgr->funcs->get_clock earlier and returning early if it is NULL; when
it is NULL, it is meaningless to continue with the clock setting.


	if (!dc->clk_mgr || !dc->clk_mgr->funcs->get_clock)
		return DC_FAIL_UNSUPPORTED_1;

	dc->clk_mgr->funcs->get_clock(dc->clk_mgr,
			context, clock_type, &clock_cfg);


Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Tuo Li
Sent: Tuesday, August 10, 2021 5:20 PM
To: Wentland, Harry ; Li, Sun peng (Leo) 
; Deucher, Alexander ; Koenig, 
Christian ; Pan, Xinhui ; 
airl...@linux.ie; dan...@ffwll.ch; Cyr, Aric ; Lei, Jun 
; Zhuo, Qingqing ; Siqueira, Rodrigo 
; Lee, Alvin ; Stempen, Vladimir 
; isabel.zh...@amd.com; Lee, Sung ; 
Po-Yu Hsieh Paul ; Wood, Wyatt 
Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
linux-ker...@vger.kernel.org; baijiaju1...@gmail.com; Tuo Li 
; TOTE Robot 
Subject: [PATCH] drm/display: fix possible null-pointer dereference in 
dcn10_set_clock()

The variable dc->clk_mgr is checked in:
  if (dc->clk_mgr && dc->clk_mgr->funcs->get_clock)

This indicates dc->clk_mgr can be NULL.
However, it is dereferenced in:
  if (!dc->clk_mgr->funcs->get_clock)

To fix this possible null-pointer dereference, check dc->clk_mgr before 
dereferencing it.

Reported-by: TOTE Robot 
Signed-off-by: Tuo Li 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index c545eddabdcc..3a7c7c7efa68 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -3635,7 +3635,7 @@ enum dc_status dcn10_set_clock(struct dc *dc,
dc->clk_mgr->funcs->get_clock(dc->clk_mgr,
context, clock_type, 
&clock_cfg);
 
-   if (!dc->clk_mgr->funcs->get_clock)
+   if (dc->clk_mgr && !dc->clk_mgr->funcs->get_clock)
return DC_FAIL_UNSUPPORTED_1;
 
if (clk_khz > clock_cfg.max_clock_khz)
--
2.25.1


RE: [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-11 Thread Chen, Guchun
[Public]

Hi Jingwen,

Your patch has caused amdgpu driver load failure on all ASICs. Please revert it 
first and come up with a proper fix.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Andrey 
Grodzovsky
Sent: Wednesday, August 11, 2021 12:41 AM
To: Chen, JingWen ; amd-gfx@lists.freedesktop.org
Cc: Liu, Monk ; Koenig, Christian ; 
Jack Zhang ; Jack Zhang 
Subject: Re: [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job

Reviewed-by: Andrey Grodzovsky 

Andrey

On 2021-08-09 11:22 p.m., Jingwen Chen wrote:
> From: Jack Zhang 
>
> Why: Previously the hw fence was allocated separately from the job.
> This caused historical lifetime issues and corner cases.
> The ideal situation is to let the fence manage both the job's and the
> fence's lifetime, and to simplify the design of the gpu scheduler.
>
> How:
> We propose to embed hw_fence into amdgpu_job.
> 1. We cover the normal job submission by this method.
> 2. For ib_test, and submit without a parent job keep the legacy way to 
> create a hw fence separately.
> v2:
> use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is 
> embeded in a job.
> v3:
> remove redundant variable ring in amdgpu_job
> v4:
> add tdr sequence support for this feature. Add a job_run_counter to 
> indicate whether this job is a resubmit job.
>
> Signed-off-by: Jingwen Chen 
> Signed-off-by: Jack Zhang 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  |  1 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 12 +++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c   | 73 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 39 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  6 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h|  5 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  |  2 +-
>   9 files changed, 108 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index 7b46ba551cb2..3003ee1c9487 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -714,7 +714,6 @@ int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum 
> kgd_engine_type engine,
>   ret = dma_fence_wait(f, false);
>   
>   err_ib_sched:
> - dma_fence_put(f);
>   amdgpu_job_free(job);
>   err:
>   return ret;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index 536005bff24a..277128846dd1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1414,7 +1414,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
> amdgpu_ring *ring)
>   continue;
>   }
>   job = to_amdgpu_job(s_job);
> - if (preempted && job->fence == fence)
> + if (preempted && (&job->hw_fence) == fence)
>   /* mark the job as preempted */
>   job->preemption_status |= AMDGPU_IB_PREEMPTED;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 9e53ff851496..ade2fa07a50a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4447,7 +4447,7 @@ int amdgpu_device_mode1_reset(struct amdgpu_device 
> *adev)
>   int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>struct amdgpu_reset_context *reset_context)
>   {
> - int i, r = 0;
> + int i, j, r = 0;
>   struct amdgpu_job *job = NULL;
>   bool need_full_reset =
>   test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags); @@ 
> -4471,6 +4471,16 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
> *adev,
>   if (!ring || !ring->sched.thread)
>   continue;
>   
> + /*clear job fence from fence drv to avoid force_completion
> +  *leave NULL and vm flush fence in fence drv */
> + for (j = 0; j <= ring->fence_drv.num_fences_mask; j ++) {
> + struct dma_fence *old,**ptr;
> + ptr = &ring->fence_drv.fences[j];
> + old = rcu_dereference_protected(*ptr, 1);
> + if (old && test_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, 
> &old->flags)) {
> + RCU_INIT_POINTER(*ptr, NULL);
> + }
> + }
>   /* after all hw jobs are reset, hw fence is meaningless, so 
> force_completion */
>   amdgpu_fence_driver_force_completion(ring);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 7495911516c2..a8302e324110 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/
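
For reference, the core idea of the patch can be sketched as follows (a minimal sketch with simplified field and helper names, not taken verbatim from the patch; amdgpu_job has more members than shown):

/* The hw fence is embedded in the job instead of being allocated
 * separately, so the job can be recovered from its fence with
 * container_of() and both share one lifetime.
 */
struct amdgpu_job {
	struct drm_sched_job base;
	struct dma_fence hw_fence;	/* embedded, not a separate alloc */
	/* ... */
};

static inline struct amdgpu_job *job_from_hw_fence(struct dma_fence *f)
{
	return container_of(f, struct amdgpu_job, hw_fence);
}

This is also why the debugfs hunk above compares &job->hw_fence against the fence pointer directly.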

RE: [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job

2021-08-11 Thread Chen, Guchun
[Public]

Attach the error log.

[   99.534964] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[   99.535531] amdgpu: SRAT table not found
[   99.535532] amdgpu: Virtual CRAT table created for GPU
[   99.536695] amdgpu: Topology: Add dGPU node [0x73a3:0x1002]
[   99.536697] kfd kfd: amdgpu: added device 1002:73a3
[   99.536717] amdgpu :03:00.0: amdgpu: SE 4, SH per SE 2, CU per SH 10, 
active_cu_number 60
[   99.536904] BUG: kernel NULL pointer dereference, address: 0000000000000048
[   99.536906] #PF: supervisor read access in kernel mode
[   99.536907] #PF: error_code(0x0000) - not-present page
[   99.536908] PGD 0 P4D 0
[   99.536910] Oops: 0000 [#1] SMP PTI
[   99.536911] CPU: 8 PID: 2282 Comm: sdma0 Not tainted 5.13.0-guchchen #1
[   99.536913] Hardware name: System manufacturer System Product Name/TUF 
Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[   99.536914] RIP: 0010:amdgpu_fence_enable_signaling+0x15/0x40 [amdgpu]
[   99.537023] [drm] Unknown EDID CEA parser results
[   99.537044] Code: 00 e9 4f 55 ab ed 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 
00 00 0f 1f 44 00 00 48 81 7f 08 20 c7 b1 c0 74 02 31 ff 48 8b 7f 40 <48> 8b 47 
48 48 85 c0 74 06 b8 01 00 00 00 c3 48 8b 35 95 9c e5 ee
[   99.537046] RSP: 0018:b50b01dcfe58 EFLAGS: 00010046
[   99.537047] RAX: c07adcc0 RBX: 9bd53c3f4d90 RCX: 0017
[   99.537048] RDX: 0001 RSI: 9bd53c3f4c58 RDI: 
[   99.537049] RBP: 9bd53c3f4c00 R08:  R09: b918
[   99.537050] R10: 0001 R11: 0074 R12: c06e4d10
[   99.537050] R13: 0246 R14: 9bd53b60b9a0 R15: 9bd53c3f4d90
[   99.537051] FS:  () GS:9bd826c0() 
knlGS:
[   99.537052] CS:  0010 DS:  ES:  CR0: 80050033
[   99.537053] CR2: 0048 CR3: 00021360a005 CR4: 003706e0
[   99.537054] DR0:  DR1:  DR2: 
[   99.537055] DR3:  DR6: fffe0ff0 DR7: 0400
[   99.537056] Call Trace:
[   99.537057]  __dma_fence_enable_signaling+0x3c/0xa0
[   99.537060]  dma_fence_add_callback+0x39/0xa0
[   99.537062]  drm_sched_main+0x1aa/0x390 [gpu_sched]
[   99.537065]  ? wait_woken+0x80/0x80
[   99.537068]  ? drm_sched_get_cleanup_job+0x120/0x120 [gpu_sched]
[   99.537070]  kthread+0x117/0x130
[   99.537071]  ? kthread_park+0x90/0x9

Regards,
Guchun


RE: [PATCH] drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily

2021-08-15 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Quan, Evan  
Sent: Friday, August 13, 2021 4:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun 
; Quan, Evan 
Subject: [PATCH] drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU 
temporarily

We have an S3 issue on that SKU with BACO enabled. We will bring this back
once the issue is root-caused.

Change-Id: I56d4830e6275e20a415808896eecbadfe944070b
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/vi.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c 
index fe9a7cc8d9eb..7210f80815b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -904,7 +904,12 @@ static bool vi_asic_supports_baco(struct amdgpu_device 
*adev)
case CHIP_POLARIS11:
case CHIP_POLARIS12:
case CHIP_TOPAZ:
-   return amdgpu_dpm_is_baco_supported(adev);
+   /* Disable BACO support for the specific polaris12 SKU 
temporarily */
+   if ((adev->pdev->device == 0x699F) &&
+   (adev->pdev->revision == 0xC7))
+   return false;
+   else
+   return amdgpu_dpm_is_baco_supported(adev);
default:
return false;
}
--
2.29.0


RE: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend

2021-08-18 Thread Chen, Guchun
[Public]

+Leo and James to review as well.

This patch is:
Acked-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Quan, Evan  
Sent: Thursday, August 19, 2021 11:09 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun 
; Lazar, Lijo ; Quan, Evan 
; Pan, Xinhui 
Subject: [PATCH] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on 
suspend

Perform proper cleanups on UVD/VCE suspend: powergate enablement, clockgating
enablement and dpm disablement. This can fix some hangs observed on suspend
when UVD/VCE is still in use (e.g. issuing "pm-suspend" while a video is still
playing).

Change-Id: I36f39d9731e0a9638b52d5d92558b0ee9c23a9ed
Signed-off-by: Evan Quan 
Signed-off-by: xinhui pan 
---
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 23 +++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 4eebf973a065..d0fc6ec18c29 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -554,6 +554,30 @@ static int uvd_v6_0_suspend(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /*
+* Proper cleanups before halting the HW engine:
+*   - cancel the delayed idle work
+*   - enable powergating
+*   - enable clockgating
+*   - disable dpm
+*
+* TODO: to align with the VCN implementation, move the
+* jobs for clockgating/powergating/dpm setting to
+* ->set_powergating_state().
+*/
+   cancel_delayed_work_sync(&adev->uvd.idle_work);
+
+   if (adev->pm.dpm_enabled) {
+   amdgpu_dpm_enable_uvd(adev, false);
+   } else {
+   amdgpu_asic_set_uvd_clocks(adev, 0, 0);
+   /* shutdown the UVD block */
+   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_UVD,
+  AMD_PG_STATE_GATE);
+   amdgpu_device_ip_set_clockgating_state(adev, 
AMD_IP_BLOCK_TYPE_UVD,
+  AMD_CG_STATE_GATE);
+   }
+
r = uvd_v6_0_hw_fini(adev);
if (r)
return r;
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 6d9108fa22e0..a594ade5d30a 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -503,6 +503,29 @@ static int vce_v3_0_suspend(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /*
+* Proper cleanups before halting the HW engine:
+*   - cancel the delayed idle work
+*   - enable powergating
+*   - enable clockgating
+*   - disable dpm
+*
+* TODO: to align with the VCN implementation, move the
+* jobs for clockgating/powergating/dpm setting to
+* ->set_powergating_state().
+*/
+   cancel_delayed_work_sync(&adev->vce.idle_work);
+
+   if (adev->pm.dpm_enabled) {
+   amdgpu_dpm_enable_vce(adev, false);
+   } else {
+   amdgpu_asic_set_vce_clocks(adev, 0, 0);
+   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCE,
+  AMD_PG_STATE_GATE);
+   amdgpu_device_ip_set_clockgating_state(adev, 
AMD_IP_BLOCK_TYPE_VCE,
+  AMD_CG_STATE_GATE);
+   }
+
r = vce_v3_0_hw_fini(adev);
if (r)
return r;
--
2.29.0


RE: [PATCH] drm/amd/pm: a quick fix for "divided by zero" error

2021-08-20 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Quan, Evan  
Sent: Friday, August 20, 2021 5:01 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun 
; Teng, Rui ; Quan, Evan 

Subject: [PATCH] drm/amd/pm: a quick fix for "divided by zero" error

Considering Arcturus is a dedicated compute ASIC, it would be more proper to
drop support for fan speed reading and setting. That's on the TODO list.

Change-Id: Id83a7a88f26644ba66c4fd15034b4fc861cc6901
Signed-off-by: Evan Quan 
Reported-by: Rui Teng 
---
 .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 20 +++++++++++++-------
 .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |  9 +++++++--
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index fbf71fc92b16..273df66cac14 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -1227,8 +1227,12 @@ static int arcturus_get_fan_speed_rpm(struct smu_context 
*smu,
 
tmp64 = (uint64_t)crystal_clock_freq * 60 * 1;
tach_status = RREG32_SOC15(THM, 0, mmCG_TACH_STATUS_ARCT);
-   do_div(tmp64, tach_status);
-   *speed = (uint32_t)tmp64;
+   if (tach_status) {
+   do_div(tmp64, tach_status);
+   *speed = (uint32_t)tmp64;
+   } else {
+   *speed = 0;
+   }
 
break;
}
@@ -1303,12 +1307,14 @@ static int arcturus_get_fan_speed_pwm(struct 
smu_context *smu,
CG_FDO_CTRL1, FMAX_DUTY100);
duty = REG_GET_FIELD(RREG32_SOC15(THM, 0, mmCG_THERMAL_STATUS_ARCT),
CG_THERMAL_STATUS, FDO_PWM_DUTY);
-   if (!duty100)
-   return -EINVAL;
 
-   tmp64 = (uint64_t)duty * 255;
-   do_div(tmp64, duty100);
-   *speed = MIN((uint32_t)tmp64, 255);
+   if (duty100) {
+   tmp64 = (uint64_t)duty * 255;
+   do_div(tmp64, duty100);
+   *speed = MIN((uint32_t)tmp64, 255);
+   } else {
+   *speed = 0;
+   }
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
index 01b9653c39c7..87b055466a33 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
@@ -1306,8 +1306,13 @@ int smu_v11_0_get_fan_speed_rpm(struct smu_context *smu,
tmp64 = (uint64_t)crystal_clock_freq * 60 * 1;
 
tach_status = RREG32_SOC15(THM, 0, mmCG_TACH_STATUS);
-   do_div(tmp64, tach_status);
-   *speed = (uint32_t)tmp64;
+   if (tach_status) {
+   do_div(tmp64, tach_status);
+   *speed = (uint32_t)tmp64;
+   } else {
+   dev_warn_once(adev->dev, "Got zero output on CG_TACH_STATUS 
reading!\n");
+   *speed = 0;
+   }
 
return 0;
 }
--
2.29.0


RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-22 Thread Chen, Guchun
[Public]

Hi Andrey,

Thanks for your notice. The reason for moving drm_sched_fini to sw_fini is
that it is SW behavior and part of SW shutdown, so hw_fini should not touch
it. But if the race is really there (the scheduler on the ring may keep
submitting jobs, leaving the ring non-empty), we possibly still need to call
drm_sched_fini in hw_fini to stop job submission first.
@Koenig, Christian what's your opinion?

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Friday, August 20, 2021 2:13 AM
To: Mike Lothian 
Cc: Grodzovsky, Andrey ; Chen, Guchun 
; amd-gfx list ; Gao, Likun 
; Koenig, Christian ; Zhang, 
Hawking ; Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)

Please go ahead.  Thanks!

Alex

On Thu, Aug 19, 2021 at 8:05 AM Mike Lothian  wrote:
>
> Hi
>
> Do I need to open a new bug report for this?
>
> Cheers
>
> Mike
>
> On Wed, 18 Aug 2021 at 06:26, Andrey Grodzovsky  
> wrote:
>>
>>
>> On 2021-08-02 1:16 a.m., Guchun Chen wrote:
>> > In amdgpu_fence_driver_hw_fini, no need to call drm_sched_fini to 
>> > stop scheduler in s3 test, otherwise, fence related failure will 
>> > arrive after resume. To fix this and for a better clean up, move 
>> > drm_sched_fini from fence_hw_fini to fence_sw_fini, as it's part of 
>> > driver shutdown, and should never be called in hw_fini.
>> >
>> > v2: rename amdgpu_fence_driver_init to amdgpu_fence_driver_sw_init, 
>> > to keep sw_init and sw_fini paired.
>> >
>> > Fixes: cd87a6dcf6af drm/amdgpu: adjust fence driver enable sequence
>> > Suggested-by: Christian König 
>> > Signed-off-by: Guchun Chen 
>> > ---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  5 ++---
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +++++++-----
>> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  4 ++--
>> >   3 files changed, 11 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > index b1d2dc39e8be..9e53ff851496 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> > @@ -3646,9 +3646,9 @@ int amdgpu_device_init(struct amdgpu_device 
>> > *adev,
>> >
>> >   fence_driver_init:
>> >   /* Fence driver */
>> > - r = amdgpu_fence_driver_init(adev);
>> > + r = amdgpu_fence_driver_sw_init(adev);
>> >   if (r) {
>> > - dev_err(adev->dev, "amdgpu_fence_driver_init failed\n");
>> > + dev_err(adev->dev, "amdgpu_fence_driver_sw_init 
>> > + failed\n");
>> >   amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_FENCE_INIT_FAIL, 
>> > 0, 0);
>> >   goto failed;
>> >   }
>> > @@ -3988,7 +3988,6 @@ int amdgpu_device_resume(struct drm_device *dev, 
>> > bool fbcon)
>> >   }
>> >   amdgpu_fence_driver_hw_init(adev);
>> >
>> > -
>> >   r = amdgpu_device_ip_late_init(adev);
>> >   if (r)
>> >   return r;
>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > index 49c5c7331c53..7495911516c2 100644
>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> > @@ -498,7 +498,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
>> > *ring,
>> >   }
>> >
>> >   /**
>> > - * amdgpu_fence_driver_init - init the fence driver
>> > + * amdgpu_fence_driver_sw_init - init the fence driver
>> >* for all possible rings.
>> >*
>> >* @adev: amdgpu device pointer
>> > @@ -509,13 +509,13 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring 
>> > *ring,
>> >* amdgpu_fence_driver_start_ring().
>> >* Returns 0 for success.
>> >*/
>> > -int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>> > +int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
>> >   {
>> >   return 0;
>> >   }
>> >
>> >   /**
>> > - * amdgpu_fence_driver_fini - tear down the fence driver
>> > + * amdgpu_fence_driver_hw_fini - tear down the fence driver
>> >* for all possible rings.
>> >*
>> >* @adev: amdgpu device pointer
>> > @@ -531,8 +531,7 @@ 

RE: [PATCH V2 1/3] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend

2021-08-23 Thread Chen, Guchun
[Public]

Series is:
Reviewed-by: Guchun Chen 

As we have root-caused this issue, shall we revert the former patch "drm/amdgpu:
disable BACO support for 699F:C7 polaris12 SKU temporarily"?

Regards,
Guchun

-Original Message-
From: Quan, Evan  
Sent: Monday, August 23, 2021 4:35 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun 
; Lazar, Lijo ; Zhu, James 
; Liu, Leo ; Quan, Evan 
; Pan, Xinhui 
Subject: [PATCH V2 1/3] drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE 
on suspend

Perform proper cleanups on UVD/VCE suspend: powergate enablement, clockgating
enablement and dpm disablement. This can fix some hangs observed on suspend
when UVD/VCE is still in use (e.g. issuing "pm-suspend" while a video is still
playing).

Change-Id: I36f39d9731e0a9638b52d5d92558b0ee9c23a9ed
Signed-off-by: Evan Quan 
Signed-off-by: xinhui pan 
--
v1->v2:
  - move the changes to ->hw_fini() (James Zhu)
---
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 23 +++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 4eebf973a065..c238aa2014fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -543,6 +543,30 @@ static int uvd_v6_0_hw_fini(void *handle)  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /*
+* Proper cleanups before halting the HW engine:
+*   - cancel the delayed idle work
+*   - enable powergating
+*   - enable clockgating
+*   - disable dpm
+*
+* TODO: to align with the VCN implementation, move the
+* jobs for clockgating/powergating/dpm setting to
+* ->set_powergating_state().
+*/
+   cancel_delayed_work_sync(&adev->uvd.idle_work);
+
+   if (adev->pm.dpm_enabled) {
+   amdgpu_dpm_enable_uvd(adev, false);
+   } else {
+   amdgpu_asic_set_uvd_clocks(adev, 0, 0);
+   /* shutdown the UVD block */
+   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_UVD,
+  AMD_PG_STATE_GATE);
+   amdgpu_device_ip_set_clockgating_state(adev, 
AMD_IP_BLOCK_TYPE_UVD,
+  AMD_CG_STATE_GATE);
+   }
+
if (RREG32(mmUVD_STATUS) != 0)
uvd_v6_0_stop(adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
index 6d9108fa22e0..e99877c13d5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v3_0.c
@@ -490,6 +490,29 @@ static int vce_v3_0_hw_fini(void *handle)
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+   /*
+* Proper cleanups before halting the HW engine:
+*   - cancel the delayed idle work
+*   - enable powergating
+*   - enable clockgating
+*   - disable dpm
+*
+* TODO: to align with the VCN implementation, move the
+* jobs for clockgating/powergating/dpm setting to
+* ->set_powergating_state().
+*/
+   cancel_delayed_work_sync(&adev->vce.idle_work);
+
+   if (adev->pm.dpm_enabled) {
+   amdgpu_dpm_enable_vce(adev, false);
+   } else {
+   amdgpu_asic_set_vce_clocks(adev, 0, 0);
+   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCE,
+  AMD_PG_STATE_GATE);
+   amdgpu_device_ip_set_clockgating_state(adev, 
AMD_IP_BLOCK_TYPE_VCE,
+  AMD_CG_STATE_GATE);
+   }
+
r = vce_v3_0_wait_for_idle(handle);
if (r)
return r;
--
2.29.0


RE: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)

2021-08-27 Thread Chen, Guchun
[Public]

Hi Andrey and Christian,

I just sent out a new patch to address this; I am not sure whether I
understood your point correctly. Please review.

The patch stops the scheduler in fence_hw_fini and starts it in
fence_hw_init.

Regards,
Guchun

-Original Message-
From: Grodzovsky, Andrey  
Sent: Monday, August 23, 2021 10:42 PM
To: Christian König ; Chen, Guchun 
; Alex Deucher ; Mike Lothian 
; Koenig, Christian 
Cc: amd-gfx list ; Gao, Likun 
; Zhang, Hawking ; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)


On 2021-08-23 2:50 a.m., Christian König wrote:
> Good mornings guys,
>
> Andrey has a rather valid concern here, but I think we need to 
> approach this from a more high level view.
>
> When hw_fini is called we should make sure that the scheduler can't
> submit any more work to the hardware, because the hw is finalized and
> not expected to respond any more.
>
> As far as I can see the cleanest approach would be to stop the 
> scheduler in hw_fini and fully clean it up in sw_fini. That would also 
> fit quite nicely with how GPU reset is supposed to work I think.
>
> Problem is that this is currently done outside of the fence code for
> at least the reset case, so before we restructure that we need to
> stick with what we have.
>
> Andrey do you think it would be any problem if we stop the scheduler 
> manually in the hot plug case as well?


As long as it's 'parked' inside HW fini - meaning the thread submitting to HW 
is done I think it should cover hot unplug as well.

Andrey


>
> Thanks,
> Christian.

RE: [PATCH] drm/amdgpu: stop scheduler when calling hw_fini

2021-08-29 Thread Chen, Guchun
[Public]

Hi Andrey and Christian,

Thanks for your comments. I will send out a new patch set later on, after I
verify it.

Regards,
Guchun

-Original Message-
From: Grodzovsky, Andrey  
Sent: Saturday, August 28, 2021 2:28 AM
To: Koenig, Christian ; Chen, Guchun 
; amd-gfx@lists.freedesktop.org; Zhang, Hawking 
; m...@fireburn.co.uk; Deucher, Alexander 

Subject: Re: [PATCH] drm/amdgpu: stop scheduler when calling hw_fini

I don't think it will start/stop twice, because
amdgpu_fence_driver_hw_fini/init is not called during reset.

I am worried about calling drm_sched_start without calling
drm_sched_resubmit_jobs first, since that is the place where the jobs are
actually restarted. Also, calling drm_sched_start with the false flag is
wrong here, since it skips all the pending list handling.
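
In code, the expected restart sequence after a stop is roughly (a sketch of the ordering being referred to, not a complete handler):

/* Resubmit the pending jobs first, then unpark the scheduler with
 * full_recovery = true so the pending list is handled.
 */
drm_sched_resubmit_jobs(&ring->sched);
drm_sched_start(&ring->sched, true);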

Andrey

On 2021-08-27 7:34 a.m., Christian König wrote:
> In general that looks good to me, but what could be is that we now try 
> to stop/start the scheduler during reset twice.
>
> Andrey what do you think?
>
> Christian.
>
> On 27.08.21 at 12:40, Guchun Chen wrote:
>> This guarantees no more work on the ring can be submitted to hardware
>> in the suspend/resume case; otherwise the ring will not be empty before
>> suspend.
>>
>> Suggested-by: Christian König 
>> Signed-off-by: Guchun Chen 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index b439eb7d4177..d6e429e63604 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -552,6 +552,9 @@ void amdgpu_fence_driver_hw_fini(struct
>> amdgpu_device *adev)
>>   if (!ring || !ring->fence_drv.initialized)
>>   continue;
>>   +    if (!ring->no_scheduler)
>> +    drm_sched_stop(&ring->sched, NULL);
>> +
>>   /* You can't wait for HW to signal if it's gone */
>>   if (!drm_dev_is_unplugged(&adev->ddev))
>>   r = amdgpu_fence_wait_empty(ring); @@ -611,6 +614,9 @@ 
>> void amdgpu_fence_driver_hw_init(struct
>> amdgpu_device *adev)
>>   if (!ring || !ring->fence_drv.initialized)
>>   continue;
>>   +    if (!ring->no_scheduler)
>> +    drm_sched_start(&ring->sched, false);
>> +
>>   /* enable the interrupt */
>>   if (ring->fence_drv.irq_src)
>>   amdgpu_irq_get(adev, ring->fence_drv.irq_src,
>


RE: [PATCH v3 1/1] drm/amdkfd: make needs_pcie_atomics FW-version dependent

2021-09-09 Thread Chen, Guchun
[Public]

Move PCIe atomic detection from kgf2kfd_probe into kgf2kfd_device_init because 
the MEC firmware is not loaded yet at the probe stage

A spelling typo: s/kgf2kfd_device_init/kgd2kfd_device_init/

With the above fixed, the patch is: Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Felix 
Kuehling
Sent: Friday, September 10, 2021 1:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Shaoyun 
Subject: Re: [PATCH v3 1/1] drm/amdkfd: make needs_pcie_atomics FW-version 
dependent

Am 2021-09-08 um 6:48 p.m. schrieb Felix Kuehling:
> On some GPUs the PCIe atomic requirement for KFD depends on the MEC 
> firmware version. Add a firmware version check for this. The minimum 
> firmware version that works without atomics can be updated in the 
> device_info structure for each GPU type.
>
> Move PCIe atomic detection from kgf2kfd_probe into kgf2kfd_device_init 
> because the MEC firmware is not loaded yet at the probe stage.
>
> Signed-off-by: Felix Kuehling 
I tested this change on a Sienna Cichlid on a system without PCIe atomics, both 
with the old and the new firmware. This version of the change should be good to 
go if I can get an R-b.

Thanks,
  Felix


> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 44 ++++++++++++++++++++++++++++----------------
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  1 +
>  2 files changed, 29 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 16a57b70cc1a..30fde852af19 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -468,6 +468,7 @@ static const struct kfd_device_info navi10_device_info = {
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
>   .needs_pci_atomics = true,
> + .no_atomic_fw_version = 145,
>   .num_sdma_engines = 2,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 8,
> @@ -487,6 +488,7 @@ static const struct kfd_device_info navi12_device_info = {
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
>   .needs_pci_atomics = true,
> + .no_atomic_fw_version = 145,
>   .num_sdma_engines = 2,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 8,
> @@ -506,6 +508,7 @@ static const struct kfd_device_info navi14_device_info = {
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
>   .needs_pci_atomics = true,
> + .no_atomic_fw_version = 145,
>   .num_sdma_engines = 2,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 8,
> @@ -525,6 +528,7 @@ static const struct kfd_device_info 
> sienna_cichlid_device_info = {
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
>   .needs_pci_atomics = true,
> + .no_atomic_fw_version = 92,
>   .num_sdma_engines = 4,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 8,
> @@ -544,6 +548,7 @@ static const struct kfd_device_info 
> navy_flounder_device_info = {
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
>   .needs_pci_atomics = true,
> + .no_atomic_fw_version = 92,
>   .num_sdma_engines = 2,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 8,
> @@ -562,7 +567,8 @@ static const struct kfd_device_info vangogh_device_info = 
> {
>   .mqd_size_aligned = MQD_SIZE_ALIGNED,
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
> - .needs_pci_atomics = false,
> + .needs_pci_atomics = true,
> + .no_atomic_fw_version = 92,
>   .num_sdma_engines = 1,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 2,
> @@ -582,6 +588,7 @@ static const struct kfd_device_info 
> dimgrey_cavefish_device_info = {
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
>   .needs_pci_atomics = true,
> + .no_atomic_fw_version = 92,
>   .num_sdma_engines = 2,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 8,
> @@ -601,6 +608,7 @@ static const struct kfd_device_info 
> beige_goby_device_info = {
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
>   .needs_pci_atomics = true,
> + .no_atomic_fw_version = 92,
>   .num_sdma_engines = 1,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 8,
> @@ -619,7 +627,8 @@ static const struct kfd_device_info 
> yellow_carp_device_info = {
>   .mqd_size_aligned = MQD_SIZE_ALIGNED,
>   .needs_iommu_device = false,
>   .supports_cwsr = true,
> - .needs_pci_atomics = false,
> + .needs_pci_atomics = true,
> + .no_atomic_fw_version = 92,
>   .num_sdma_engines = 1,
>   .num_xgmi_sdma_engines = 0,
>   .num_sdma_queues_per_engine = 2,
> @@ -708,20 +717,6 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>   if (!kfd)
>   return NULL;
>  
> - /* Allow BIF to recode atomics to PCIe 3.0 AtomicO
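
The gist of the new check can be sketched as follows (the condition is reconstructed from the device_info fields added above; the exact code in kgd2kfd_device_init differs):

/* PCIe atomics are only mandatory when the loaded MEC firmware
 * predates the first version known to work without them.
 */
if (kfd->device_info->needs_pci_atomics &&
    (!kfd->device_info->no_atomic_fw_version ||
     kfd->mec_fw_version < kfd->device_info->no_atomic_fw_version)) {
	if (pci_enable_atomic_ops_to_root(kfd->pdev,
					  PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
					  PCI_EXP_DEVCAP2_ATOMIC_COMP64))
		return false;	/* device init fails without atomics */
}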

RE: [PATCH] drm/amd/pm: fix runpm hang when amdgpu loaded prior to sound driver

2021-09-09 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Friday, September 10, 2021 11:18 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Lazar, Lijo 
; Quan, Evan ; Pelloux-prayer, 
Pierre-eric 
Subject: [PATCH] drm/amd/pm: fix runpm hang when amdgpu loaded prior to sound 
driver

The current RUNPM mechanism relies on the PMFW to master the timing for BACO
in/exit. That needs cooperation from the sound driver in the form of d-state
change notifications for function 1 (audio). Otherwise (when the sound driver
is missing), BACO cannot kick in correctly and a hang will be observed on
RUNPM exit.

By switching back to the legacy message way when the sound driver is missing,
we are able to fix the runpm hang observed in the scenario below:
amdgpu driver loaded -> runpm suspend kicked -> sound driver loaded

Change-Id: I0e44fef11349b5e45e6102913eb46c8c7d279c65
Signed-off-by: Evan Quan 
Reported-by: Pierre-Eric Pelloux-Prayer 
---
 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 24 ++++++++++++++++++++++++--
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   |  4 ++--
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c        | 21 +++++++++++++++++++++
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h        |  2 ++
 4 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 7bc90f841a11..bcafccf7f07a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -2272,7 +2272,27 @@ static int navi10_baco_enter(struct smu_context *smu)  {
struct amdgpu_device *adev = smu->adev;
 
-   if (adev->in_runpm)
+   /*
+* This aims the case below:
+*   amdgpu driver loaded -> runpm suspend kicked -> sound driver loaded
+*
+* For NAVI10 and later ASICs, we rely on PMFW to handle the runpm. To
+* make that possible, PMFW needs to acknowledge the dstate transition
+* process for both gfx(function 0) and audio(function 1) function of
+* the ASIC.
+*
+* The PCI device's initial runpm status is RUNPM_SUSPENDED. So as the
+* device representing the audio function of the ASIC. And that means
+* even if the sound driver(snd_hda_intel) was not loaded yet, it's 
still
+* possible runpm suspend kicked on the ASIC. However without the dstate
+* transition notification from audio function, pmfw cannot handle the
+* BACO in/exit correctly. And that will cause driver hang on runpm
+* resuming.
+*
+* To address this, we revert to legacy message way(driver masters the
+* timing for BACO in/exit) on sound driver missing.
+*/
+   if (adev->in_runpm && smu_cmn_is_audio_func_enabled(adev))
return smu_v11_0_baco_set_armd3_sequence(smu, BACO_SEQ_BACO);
else
return smu_v11_0_baco_enter(smu);
@@ -2282,7 +2302,7 @@ static int navi10_baco_exit(struct smu_context *smu)  {
struct amdgpu_device *adev = smu->adev;
 
-   if (adev->in_runpm) {
+   if (adev->in_runpm && smu_cmn_is_audio_func_enabled(adev)) {
/* Wait for PMFW handling for the Dstate change */
msleep(10);
return smu_v11_0_baco_set_armd3_sequence(smu, BACO_SEQ_ULPS); 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 43c7580a4ea6..f9b730c5ba9e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -2361,7 +2361,7 @@ static int sienna_cichlid_baco_enter(struct smu_context 
*smu)  {
struct amdgpu_device *adev = smu->adev;
 
-   if (adev->in_runpm)
+   if (adev->in_runpm && smu_cmn_is_audio_func_enabled(adev))
return smu_v11_0_baco_set_armd3_sequence(smu, BACO_SEQ_BACO);
else
return smu_v11_0_baco_enter(smu);
@@ -2371,7 +2371,7 @@ static int sienna_cichlid_baco_exit(struct smu_context 
*smu)  {
struct amdgpu_device *adev = smu->adev;
 
-   if (adev->in_runpm) {
+   if (adev->in_runpm && smu_cmn_is_audio_func_enabled(adev)) {
/* Wait for PMFW handling for the Dstate change */
msleep(10);
return smu_v11_0_baco_set_armd3_sequence(smu, BACO_SEQ_ULPS); 
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 69da9a7b665f..d61403e917df 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -1055,3 +1055,24 @@ int smu_cmn_set_mp1_state(struct smu_context *smu,
 
return ret;
 }
+
+bool smu_cmn_is_audio_func_enabled(struct amdgpu_device *adev) {
+   struct pci_dev *p = NULL;
+   bool snd_driver_loaded;
+
+   /*
+* If the ASIC comes with no audio function, we always assume
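
The intent of the new helper (treat the audio function as enabled only when PCI function 1 of the GPU exists and is enabled) can be sketched as follows; this is a plausible reconstruction, not the verbatim body:

bool smu_cmn_is_audio_func_enabled(struct amdgpu_device *adev)
{
	struct pci_dev *p;
	bool snd_driver_loaded;

	/* Audio is function 1 of the same PCI device. If it does not
	 * exist at all, behave as if it were enabled.
	 */
	p = pci_get_domain_bus_and_slot(pci_domain_nr(adev->pdev->bus),
					adev->pdev->bus->number, 1);
	if (!p)
		return true;

	snd_driver_loaded = pci_is_enabled(p);
	pci_dev_put(p);

	return snd_driver_loaded;
}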

RE: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array bounds

2021-09-10 Thread Chen, Guchun
[Public]

Hi Christian and Xinhui,

Thanks for your suggestion. The cause is that I saw data corruption in several
proprietary use cases. Would BUILD_BUG_ON behave differently depending on the
gcc version used for the build?

Anyway, WARN_ON is fine with me, and I will send a new patch set soon to
address this.

Regards,
Guchun

From: Koenig, Christian 
Sent: Friday, September 10, 2021 2:37 PM
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; Deucher, Alexander 
; Chen, Guchun 
Cc: Shi, Leslie 
Subject: Re: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array 
bounds

Yeah, that's a good point.

If build_bug_on() doesn't work for some reason, then we at least need to lower
this to a WARN_ON.

A BUG_ON() is only justified if we prevent severe data corruption with it,
catch a NULL pointer earlier, or similar.

Regards,
Christian.
On 10.09.21 at 06:36, Pan, Xinhui wrote:

[AMD Official Use Only]

Looks good to me.
But maybe build_bug_on works too, and it would be more reasonable for
detecting such wrong usage.
____
From: Chen, Guchun
Sent: Friday, September 10, 2021 12:30:14 PM
To: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
Koenig, Christian; Pan, Xinhui; Deucher, Alexander
Cc: Chen, Guchun; Shi, Leslie
Subject: [PATCH] drm/ttm: add a BUG_ON in ttm_set_driver_manager when array 
bounds

Vendors will define their own memory types on top of TTM_PL_PRIV, but
call ttm_set_driver_manager directly without checking the mem_type value
when setting up a memory manager. So add such a check to catch the case
where the index is out of array bounds.

Signed-off-by: Leslie Shi <mailto:yuliang@amd.com>
Signed-off-by: Guchun Chen <mailto:guchun.c...@amd.com>
---
 include/drm/ttm/ttm_device.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 7a0f561c57ee..24ad76ca8022 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -308,6 +308,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
 static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
   struct ttm_resource_manager *manager)
 {
+   BUG_ON(type >= TTM_NUM_MEM_TYPES);
 bdev->man_drv[type] = manager;
 }

--
2.17.1



RE: [PATCH] drm/amdgpu: Get atomicOps info from Host for sriov setup

2021-09-10 Thread Chen, Guchun
[Public]

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
old mode 100644
new mode 100755

Please don't modify the file mode.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of shaoyunl
Sent: Friday, September 10, 2021 10:26 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Shaoyun 
Subject: [PATCH] drm/amdgpu: Get atomicOps info from Host for sriov setup

The AtomicOp Requester Enable bit is reserved in VFs, and the PF value applies
to all associated VFs, so the guest driver cannot directly enable atomicOps
for a VF; it depends on the PF to enable them. In the current design, the
amdgpu driver will get the enabled atomicOps bits through private pf2vf data.

Signed-off-by: shaoyunl 
Change-Id: Ifdbcb4396d64e3f3cbf6bcbf7ab9c7b2cb061052
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 24 +++--  
drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h |  4 +++-
 2 files changed, 16 insertions(+), 12 deletions(-)  mode change 100644 => 
100755 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
 mode change 100644 => 100755 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
old mode 100644
new mode 100755
index 653bd8fdaa33..3ae1721ca859
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3529,17 +3529,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
DRM_INFO("register mmio base: 0x%08X\n", (uint32_t)adev->rmmio_base);
DRM_INFO("register mmio size: %u\n", (unsigned)adev->rmmio_size);
 
-   /* enable PCIE atomic ops */
-   r = pci_enable_atomic_ops_to_root(adev->pdev,
- PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
- PCI_EXP_DEVCAP2_ATOMIC_COMP64);
-   if (r) {
-   adev->have_atomics_support = false;
-   DRM_INFO("PCIE atomic ops is not supported\n");
-   } else {
-   adev->have_atomics_support = true;
-   }
-
amdgpu_device_get_pcie_info(adev);
 
if (amdgpu_mcbp)
@@ -3562,6 +3551,19 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (r)
return r;
 
+   /* enable PCIE atomic ops */
+   if (amdgpu_sriov_vf(adev))
+   adev->have_atomics_support = ((struct amd_sriov_msg_pf2vf_info 
*)
+   
adev->virt.fw_reserve.p_pf2vf)->pcie_atomic_ops_enabled_flags ==
+   (PCI_EXP_DEVCAP2_ATOMIC_COMP32 | 
PCI_EXP_DEVCAP2_ATOMIC_COMP64);
+   else
+   adev->have_atomics_support =
+   !pci_enable_atomic_ops_to_root(adev->pdev,
+ PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+ PCI_EXP_DEVCAP2_ATOMIC_COMP64);
+   if (!adev->have_atomics_support)
+   dev_info(adev->dev, "PCIE atomic ops is not supported\n");
+
/* doorbell bar mapping and doorbell index init*/
amdgpu_device_doorbell_init(adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h 
b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
old mode 100644
new mode 100755
index a434c71fde8e..995899191288
--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
@@ -204,8 +204,10 @@ struct amd_sriov_msg_pf2vf_info {
} mm_bw_management[AMD_SRIOV_MSG_RESERVE_VCN_INST];
/* UUID info */
struct amd_sriov_msg_uuid_info uuid_info;
+   /* pcie atomic Ops info */
+   uint32_t pcie_atomic_ops_enabled_flags;
/* reserved */
-   uint32_t reserved[256 - 47];
+   uint32_t reserved[256 - 48];
 };
 
 struct amd_sriov_msg_vf2pf_info_header {
--
2.17.1


RE: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when array bounds (v2)

2021-09-12 Thread Chen, Guchun
[Public]

Thanks for your suggestion, Robin. Do you agree with this as well, Christian 
and Xinhui?

Regards,
Guchun

-Original Message-
From: Robin Murphy  
Sent: Saturday, September 11, 2021 2:25 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; Koenig, Christian ; 
Pan, Xinhui ; Deucher, Alexander 
Cc: Shi, Leslie 
Subject: Re: [PATCH] drm/ttm: add a WARN_ON in ttm_set_driver_manager when 
array bounds (v2)

On 2021-09-10 11:09, Guchun Chen wrote:
> Vendors will define their own memory types on top of TTM_PL_PRIV, but
> call ttm_set_driver_manager directly without checking the mem_type value
> when setting up a memory manager. So add such a check to catch the case
> where the index is out of array bounds.
> 
> v2: lower check level to WARN_ON
> 
> Signed-off-by: Leslie Shi 
> Signed-off-by: Guchun Chen 
> ---
>   include/drm/ttm/ttm_device.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/include/drm/ttm/ttm_device.h 
> b/include/drm/ttm/ttm_device.h index 07d722950d5b..aa79953c807c 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -291,6 +291,7 @@ ttm_manager_type(struct ttm_device *bdev, int mem_type)
>   static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
> struct ttm_resource_manager *manager)
>   {
> + WARN_ON(type >= TTM_NUM_MEM_TYPES);

Nit: I know nothing about this code, but from the context alone it would seem 
sensible to do

if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
return;

to avoid making the subsequent assignment when we *know* it's invalid and 
likely to corrupt memory.

Robin.

>   bdev->man_drv[type] = manager;
>   }
>   
> 
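
Putting Robin's suggestion together, the helper would then read (a sketch of the combined form):

static inline void ttm_set_driver_manager(struct ttm_device *bdev, int type,
					  struct ttm_resource_manager *manager)
{
	if (WARN_ON(type >= TTM_NUM_MEM_TYPES))
		return;	/* refuse the out-of-bounds store entirely */

	bdev->man_drv[type] = manager;
}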


RE: [PATCH] drm/amdgpu: move amdgpu_virt_release_full_gpu to fini_early stage

2021-09-21 Thread Chen, Guchun
[Public]

Ping...

Regards,
Guchun

-Original Message-
From: Chen, Guchun  
Sent: Saturday, September 18, 2021 2:09 PM
To: amd-gfx@lists.freedesktop.org; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; Grodzovsky, Andrey 
; Liu, Monk 
Cc: Chen, Guchun ; Shi, Leslie 
Subject: [PATCH] drm/amdgpu: move amdgpu_virt_release_full_gpu to fini_early 
stage

adev->rmmio is set to NULL in amdgpu_device_unmap_mmio to prevent
access after pci_remove. However, in the SRIOV case,
amdgpu_virt_release_full_gpu will still access adev->rmmio after
amdgpu_device_unmap_mmio. This patch moves that SRIOV call earlier, to the
fini_early stage.

Fixes: 07775fc13878("drm/amdgpu: Unmap all MMIO mappings")
Cc: Andrey Grodzovsky 
Signed-off-by: Leslie Shi 
Signed-off-by: Guchun Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f3da97086f7d..2a75c09c4884 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2810,6 +2810,11 @@ static int amdgpu_device_ip_fini_early(struct 
amdgpu_device *adev)
adev->ip_blocks[i].status.hw = false;
}
 
+   if (amdgpu_sriov_vf(adev)) {
+   if (amdgpu_virt_release_full_gpu(adev, false))
+   DRM_ERROR("failed to release exclusive mode on fini\n");
+   }
+
return 0;
 }
 
@@ -2870,10 +2875,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device 
*adev)
 
amdgpu_ras_fini(adev);
 
-   if (amdgpu_sriov_vf(adev))
-   if (amdgpu_virt_release_full_gpu(adev, false))
-   DRM_ERROR("failed to release exclusive mode on fini\n");
-
return 0;
 }
 
--
2.17.1


RE: [PATCH 09/66] drm/amdgpu/sdma5.2: convert to IP version checking

2021-09-21 Thread Chen, Guchun
[Public]

> + switch (adev->ip_versions[SDMA0_HWIP]) {
> + case IP_VERSION(5, 2, 0):
>   adev->sdma.num_instances = 4;
>   break;
Isn't the instance count also expected from discovery table?

This will be addressed in patch 54 of the series.
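
For reference, IP_VERSION packs major/minor/revision into a single comparable integer, so direct equality and range checks work (a sketch of the macro as defined in the amdgpu headers):

/* IP_VERSION(5, 2, 0) == 0x00050200 */
#define IP_VERSION(mj, mn, rv)	(((mj) << 16) | ((mn) << 8) | (rv))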

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Lazar, Lijo
Sent: Wednesday, September 22, 2021 1:56 PM
To: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 09/66] drm/amdgpu/sdma5.2: convert to IP version checking



On 9/21/2021 11:36 PM, Alex Deucher wrote:
> Use IP versions rather than asic_type to differentiate IP version 
> specific features.
> 
> Signed-off-by: Alex Deucher 
> ---
>   drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 48 +-
>   1 file changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> index e4a96e7e386d..c5252f12eee9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
> @@ -136,23 +136,23 @@ static int sdma_v5_2_init_microcode(struct 
> amdgpu_device *adev)
>   
>   DRM_DEBUG("\n");
>   
> - switch (adev->asic_type) {
> - case CHIP_SIENNA_CICHLID:
> + switch (adev->ip_versions[SDMA0_HWIP]) {
> + case IP_VERSION(5, 2, 0):
>   chip_name = "sienna_cichlid";
>   break;
> - case CHIP_NAVY_FLOUNDER:
> + case IP_VERSION(5, 2, 2):
>   chip_name = "navy_flounder";
>   break;
> - case CHIP_VANGOGH:
> + case IP_VERSION(5, 2, 1):
>   chip_name = "vangogh";
>   break;
> - case CHIP_DIMGREY_CAVEFISH:
> + case IP_VERSION(5, 2, 4):
>   chip_name = "dimgrey_cavefish";
>   break;
> - case CHIP_BEIGE_GOBY:
> + case IP_VERSION(5, 2, 5):
>   chip_name = "beige_goby";
>   break;
> - case CHIP_YELLOW_CARP:
> + case IP_VERSION(5, 2, 3):
>   chip_name = "yellow_carp";
>   break;
>   default:
> @@ -174,7 +174,7 @@ static int sdma_v5_2_init_microcode(struct amdgpu_device 
> *adev)
>  (void *)&adev->sdma.instance[0],
>  sizeof(struct amdgpu_sdma_instance));
>   
> - if (amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_SIENNA_CICHLID))
> + if (amdgpu_sriov_vf(adev) && (adev->ip_versions[SDMA0_HWIP] == 
> +IP_VERSION(5, 2, 0)))
>   return 0;
>   
>   DRM_DEBUG("psp_load == '%s'\n",
> @@ -1209,17 +1209,17 @@ static int sdma_v5_2_early_init(void *handle)
>   {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   
> - switch (adev->asic_type) {
> - case CHIP_SIENNA_CICHLID:
> + switch (adev->ip_versions[SDMA0_HWIP]) {
> + case IP_VERSION(5, 2, 0):
>   adev->sdma.num_instances = 4;
>   break;
Isn't the instance count also expected from discovery table?

Thanks,
Lijo

> - case CHIP_NAVY_FLOUNDER:
> - case CHIP_DIMGREY_CAVEFISH:
> + case IP_VERSION(5, 2, 2):
> + case IP_VERSION(5, 2, 4):
>   adev->sdma.num_instances = 2;
>   break;
> - case CHIP_VANGOGH:
> - case CHIP_BEIGE_GOBY:
> - case CHIP_YELLOW_CARP:
> + case IP_VERSION(5, 2, 1):
> + case IP_VERSION(5, 2, 5):
> + case IP_VERSION(5, 2, 3):
>   adev->sdma.num_instances = 1;
>   break;
>   default:
> @@ -1547,7 +1547,7 @@ static void 
> sdma_v5_2_update_medium_grain_clock_gating(struct amdgpu_device *ade
>   
>   for (i = 0; i < adev->sdma.num_instances; i++) {
>   
> - if (adev->sdma.instance[i].fw_version < 70 && adev->asic_type 
> == CHIP_VANGOGH)
> + if (adev->sdma.instance[i].fw_version < 70 && 
> +adev->ip_versions[SDMA0_HWIP] == IP_VERSION(5, 2, 1))
>   adev->cg_flags &= ~AMD_CG_SUPPORT_SDMA_MGCG;
>   
>   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_SDMA_MGCG)) { @@ 
> -1584,7 +1584,7 @@ static void 
> sdma_v5_2_update_medium_grain_light_sleep(struct amdgpu_device *adev
>   
>   for (i = 0; i < adev->sdma.num_instances; i++) {
>   
> - if (adev->sdma.instance[i].fw_version < 70 && adev->asic_type 
> == CHIP_VANGOGH)
> + if (adev->sdma.instance[i].fw_version < 70 && 
> +adev->ip_versions[SDMA0_HWIP] == IP_VERSION(5, 2, 1))
>   adev->cg_flags &= ~AMD_CG_SUPPORT_SDMA_LS;
>   
>   if (enable && (adev->cg_flags & AMD_CG_SUPPORT_SDMA_LS)) { @@ 
> -1613,13 +1613,13 @@ static int sdma_v5_2_set_clockgating_state(void *handle,
>   if (amdgpu_sriov_vf(adev))
>   return 0;
>   
> - switch (adev->asic_type) {
> - case CHIP_SIENNA_CICHLID:
> - case CHIP_NAVY_FLOUNDER:
> - case CHIP_VANGOGH:
> - case CHIP_DIMGREY_CAVEFISH:
> - case CHIP_BEIGE_GOBY:
> - case CHIP_YELLOW_CARP:
> + switch (adev->ip_versions[SDMA0_HWIP]) 

RE: [PATCH] drm/amdgpu: add missed write lock for pci detected state pci_channel_io_normal

2021-10-01 Thread Chen, Guchun
[Public]

Hi Andrey,

Do you mean to move the code of drm_sched_resubmit_jobs and drm_sched_start in
amdgpu_pci_resume to amdgpu_pci_error_detected, under the
pci_channel_io_frozen case?
Then leave amdgpu_pci_resume as a null function, so that we can also drop
acquiring/releasing the write lock for the pci_channel_io_normal case?

Regards,
Guchun

-Original Message-
From: Grodzovsky, Andrey  
Sent: Friday, October 1, 2021 10:22 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; Pan, Xinhui ; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: add missed write lock for pci detected state 
pci_channel_io_normal

On 2021-09-30 10:00 p.m., Guchun Chen wrote:

> When the PCI error state pci_channel_io_normal is detected, the
> PCI_ERS_RESULT_CAN_RECOVER status is reported to the PCI driver, which
> continues with the PCI resume callback report_resume via
> pci_walk_bridge; that callback finally reaches amdgpu_pci_resume, where
> the write lock is released unconditionally without ever having been
> acquired.


Good catch, but the issue is even wider in scope: what about
drm_sched_resubmit_jobs and drm_sched_start being called without the
schedulers having been stopped before? Better to put the entire scope of code
in this function under a flag that is set only in pci_channel_io_frozen. As
far as I remember, we don't need to do anything in the case of
pci_channel_io_normal.

Andrey


>
> Fixes: c9a6b82f45e2("drm/amdgpu: Implement DPC recovery")
> Signed-off-by: Guchun Chen 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index bb5ad2b6ca13..12f822d51de2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5370,6 +5370,7 @@ pci_ers_result_t 
> amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
>   
>   switch (state) {
>   case pci_channel_io_normal:
> + amdgpu_device_lock_adev(adev, NULL);
>   return PCI_ERS_RESULT_CAN_RECOVER;
>   /* Fatal error, prepare for slot reset */
>   case pci_channel_io_frozen:


RE: [PATCH] drm/amdgpu: add missed write lock for pci detected state pci_channel_io_normal

2021-10-01 Thread Chen, Guchun
[Public]

Got your point. Will send a new patch to address this.

Regards,
Guchun

-Original Message-
From: Grodzovsky, Andrey  
Sent: Friday, October 1, 2021 10:29 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; Pan, Xinhui ; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: add missed write lock for pci detected state 
pci_channel_io_normal

No, scheduler restart and device unlock must take place in amdgpu_pci_resume
(see struct pci_error_handlers for the various states of PCI recovery). So
just add a flag (probably in amdgpu_device) so we can remember what
pci_channel_state_t we came from (unfortunately it's not passed to us in
amdgpu_pci_resume), and unless it's set, don't do anything in
amdgpu_pci_resume.
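
In code, the idea is roughly this (a sketch; the flag's field name is hypothetical):

/* Record the channel state in error_detected, then make the resume
 * callback a no-op unless we actually came from pci_channel_io_frozen.
 */
void amdgpu_pci_resume(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);
	struct amdgpu_device *adev = drm_to_adev(dev);

	if (adev->pci_channel_state != pci_channel_io_frozen)
		return;	/* nothing was stopped, nothing to restart */

	/* ... drm_sched_resubmit_jobs(), drm_sched_start(),
	 * unlock adev ...
	 */
}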

Andrey


RE: [PATCH] drm/amdgpu: add missed write lock for pci detected state pci_channel_io_normal

2021-10-02 Thread Chen, Guchun
[Public]

Hi Andrey,

A new patch with subject "drm/amdgpu: handle the case of pci_channel_io_frozen 
only in amdgpu_pci_resume" has been sent, please review it. Thanks.

Regards,
Guchun

-Original Message-----
From: Chen, Guchun 
Sent: Friday, October 1, 2021 11:21 PM
To: Grodzovsky, Andrey ; 
amd-gfx@lists.freedesktop.org; Koenig, Christian ; 
Pan, Xinhui ; Deucher, Alexander 
Subject: RE: [PATCH] drm/amdgpu: add missed write lock for pci detected state 
pci_channel_io_normal

[Public]

Got your point. Will send a new patch to address this.

Regards,
Guchun

-Original Message-
From: Grodzovsky, Andrey 
Sent: Friday, October 1, 2021 10:29 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; Pan, Xinhui ; 
Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: add missed write lock for pci detected state 
pci_channel_io_normal

No, scheduler restart and device unlock must take place in amdgpu_pci_resume 
(see struct pci_error_handlers for the various states of PCI recovery). So just 
add a flag (probably in amdgpu_device) so we can remember what 
pci_channel_state_t we came from (unfortunately it's not passed to us in 
amdgpu_pci_resume), and unless it's set, don't do anything in amdgpu_pci_resume.

Andrey

On 2021-10-01 4:21 a.m., Chen, Guchun wrote:
> [Public]
>
> Hi Andrey,
>
> Do you mean to move the code of drm_sched_resubmit_jobs and drm_sched_start 
> in amdgpu_pci_resume to amdgpu_pci_error_detected, under the case 
> pci_channel_io_frozen?
> Then leave amdgpu_pci_resume as a null function, and in this way, we can drop 
> the acquire/lock write lock for case of pci_channel_io_normal as well?
>
> Regards,
> Guchun
>
> -Original Message-
> From: Grodzovsky, Andrey 
> Sent: Friday, October 1, 2021 10:22 AM
> To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; 
> Koenig, Christian ; Pan, Xinhui 
> ; Deucher, Alexander 
> Subject: Re: [PATCH] drm/amdgpu: add missed write lock for pci 
> detected state pci_channel_io_normal
>
> On 2021-09-30 10:00 p.m., Guchun Chen wrote:
>
>> When a PCI error state pci_channel_io_normal is detected, it will 
>> report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI 
>> driver will continue the execution of PCI resume callback 
>> report_resume by pci_walk_bridge, and the callback will go into 
>> amdgpu_pci_resume finally, where write lock is released 
>> unconditionally without acquiring such lock.
>
> Good catch, but the issue is even wider in scope: what about 
> drm_sched_resubmit_jobs and drm_sched_start being called without the scheduler 
> having been stopped first? Better to put the entire scope of code in this 
> function under a flag that is set only in pci_channel_io_frozen. As far as I 
> remember, we don't need to do anything in the case of pci_channel_io_normal.
>
> Andrey
>
>
>> Fixes: c9a6b82f45e2("drm/amdgpu: Implement DPC recovery")
>> Signed-off-by: Guchun Chen 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
>>1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index bb5ad2b6ca13..12f822d51de2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -5370,6 +5370,7 @@ pci_ers_result_t 
>> amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
>>
>>  switch (state) {
>>  case pci_channel_io_normal:
>> +amdgpu_device_lock_adev(adev, NULL);
>>  return PCI_ERS_RESULT_CAN_RECOVER;
>>  /* Fatal error, prepare for slot reset */
>>  case pci_channel_io_frozen:


RE: [PATCH] drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume

2021-10-05 Thread Chen, Guchun
[Public]

Thanks Andrey, I will update the name to be "pci_channel_state" when 
submitting.

Regards,
Guchun

-Original Message-
From: Grodzovsky, Andrey  
Sent: Tuesday, October 5, 2021 12:04 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; Pan, Xinhui ; 
Deucher, Alexander ; Liu, Monk 
Subject: Re: [PATCH] drm/amdgpu: handle the case of pci_channel_io_frozen only 
in amdgpu_pci_resume


On 2021-10-02 11:18 a.m., Guchun Chen wrote:
> In current code, when a PCI error state pci_channel_io_normal is 
> detected, it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI 
> driver, and PCI driver will continue the execution of PCI resume 
> callback report_resume by pci_walk_bridge, and the callback will go 
> into amdgpu_pci_resume finally, where write lock is released 
> unconditionally without acquiring such lock first. In this case, a 
> deadlock will happen when other threads start to acquire the read lock.
>
> To fix this, add a member in amdgpu_device structure to cache 
> pci_channel_state, and only continue the execution in 
> amdgpu_pci_resume when it's pci_channel_io_frozen.
>
> Fixes: c9a6b82f45e2("drm/amdgpu: Implement DPC recovery")
> Suggested-by: Andrey Grodzovsky 
> Signed-off-by: Guchun Chen 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++
>   2 files changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index f4bceb2624fb..720d0ccecfe0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1094,6 +1094,7 @@ struct amdgpu_device {
>   
>   boolno_hw_access;
>   struct pci_saved_state  *pci_state;
> + pci_channel_state_t cached_state;


I would give a more descriptive name to this (e.g. pci_channel_state) Other 
then that Reviewed-by: Andrey Grodzovsky 

Andrey


>   
>   struct amdgpu_reset_control *reset_cntl;
>   uint32_t
> ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index bb5ad2b6ca13..1aaeb4b30edc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5368,6 +5368,8 @@ pci_ers_result_t amdgpu_pci_error_detected(struct 
> pci_dev *pdev, pci_channel_sta
>   return PCI_ERS_RESULT_DISCONNECT;
>   }
>   
> + adev->cached_state = state;
> +
>   switch (state) {
>   case pci_channel_io_normal:
>   return PCI_ERS_RESULT_CAN_RECOVER;
> @@ -5510,6 +5512,10 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
>   
>   DRM_INFO("PCI error: resume callback!!\n");
>   
> + /* Only continue execution for the case of pci_channel_io_frozen */
> + if (adev->cached_state != pci_channel_io_frozen)
> + return;
> +
>   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>   struct amdgpu_ring *ring = adev->rings[i];
>   


RE: [PATCH] drm/amdgpu/discovery: add missing case for SMU 11.0.5

2021-10-07 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, October 7, 2021 10:06 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu/discovery: add missing case for SMU 11.0.5

Was missed when converting the driver over to IP based initialization.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index daa798c5b882..90d7de17d81c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -700,6 +700,7 @@ static int amdgpu_discovery_set_smu_ip_blocks(struct 
amdgpu_device *adev)
amdgpu_device_ip_block_add(adev, &pp_smu_ip_block);
break;
case IP_VERSION(11, 0, 0):
+   case IP_VERSION(11, 0, 5):
case IP_VERSION(11, 0, 9):
case IP_VERSION(11, 0, 7):
case IP_VERSION(11, 0, 8):
--
2.31.1


RE: [PATCH] drm/amdgpu: query default sclk from smu for cyan_skillfish

2021-10-11 Thread Chen, Guchun
[Public]

A global variable to carry the sclk value looks a bit like overkill. Is it 
possible to move it all into cyan_skillfish_od_edit_dpm_table, i.e. query the 
sclk first and then set it to cyan_skillfish_user_settings.sclk?

Regards,
Guchun
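
For illustration, the suggestion above amounts to roughly this sketch inside 
cyan_skillfish_od_edit_dpm_table (the surrounding case label is elided, and 
whether the metrics query is valid at this point is an assumption to verify):

	int ret;

	/* query the default sclk on demand instead of caching it in a
	 * global variable when the dpm-running check happens to run
	 */
	ret = cyan_skillfish_get_smu_metrics_data(smu, METRICS_CURR_GFXCLK,
			&cyan_skillfish_user_settings.sclk);
	if (ret)
		return ret;

	cyan_skillfish_user_settings.vddc = CYAN_SKILLFISH_VDDC_MAGIC;
	break;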

-Original Message-
From: amd-gfx  On Behalf Of Lazar, Lijo
Sent: Monday, October 11, 2021 4:54 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray 

Subject: Re: [PATCH] drm/amdgpu: query default sclk from smu for cyan_skillfish



On 10/11/2021 2:01 PM, Lang Yu wrote:
> Query the default sclk instead of hard coding it.
> 
> Signed-off-by: Lang Yu 
> ---
>   .../gpu/drm/amd/pm/swsmu/smu11/cyan_skillfish_ppt.c  | 12 +---
>   1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/cyan_skillfish_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/cyan_skillfish_ppt.c
> index 3d4c65bc29dc..d98fd06a2574 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/cyan_skillfish_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/cyan_skillfish_ppt.c
> @@ -47,7 +47,6 @@
>   /* unit: MHz */
>   #define CYAN_SKILLFISH_SCLK_MIN 1000
>   #define CYAN_SKILLFISH_SCLK_MAX 2000
> -#define CYAN_SKILLFISH_SCLK_DEFAULT  1800
>   
>   /* unit: mV */
>   #define CYAN_SKILLFISH_VDDC_MIN 700
> @@ -59,6 +58,8 @@ static struct gfx_user_settings {
>   uint32_t vddc;
>   } cyan_skillfish_user_settings;
>   
> +static uint32_t cyan_skillfish_sclk_default;
> +
>   #define FEATURE_MASK(feature) (1ULL << feature)
>   #define SMC_DPM_FEATURE ( \
>   FEATURE_MASK(FEATURE_FCLK_DPM_BIT)  |   \
> @@ -365,13 +366,18 @@ static bool cyan_skillfish_is_dpm_running(struct 
> smu_context *smu)
>   return false;
>   
>   ret = smu_cmn_get_enabled_32_bits_mask(smu, feature_mask, 2);
> -
>   if (ret)
>   return false;
>   
>   feature_enabled = (uint64_t)feature_mask[0] |
>   ((uint64_t)feature_mask[1] << 32);
>   
> + /*
> +  * cyan_skillfish specific, query default sclk instead of hard coding it.
> +  */
> + cyan_skillfish_get_smu_metrics_data(smu, METRICS_CURR_GFXCLK,
> + &cyan_skillfish_sclk_default);
> +

Maybe add if (!cyan_skillfish_sclk_default) so that it's read only once during 
driver load and not on every suspend/resume.
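That is, guard the query so the cached default is only written once, roughly:

	/*
	 * cyan_skillfish specific: query the default sclk instead of hard
	 * coding it, but only once, so a later suspend/resume does not
	 * overwrite the default with the then-current clock.
	 */
	if (!cyan_skillfish_sclk_default)
		cyan_skillfish_get_smu_metrics_data(smu, METRICS_CURR_GFXCLK,
				&cyan_skillfish_sclk_default);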

Reviewed-by: Lijo Lazar 

Thanks,
Lijo

>   return !!(feature_enabled & SMC_DPM_FEATURE);
>   }
>   
> @@ -468,7 +474,7 @@ static int cyan_skillfish_od_edit_dpm_table(struct 
> smu_context *smu,
>   return -EINVAL;
>   }
>   
> - cyan_skillfish_user_settings.sclk = CYAN_SKILLFISH_SCLK_DEFAULT;
> + cyan_skillfish_user_settings.sclk = cyan_skillfish_sclk_default;
>   cyan_skillfish_user_settings.vddc = CYAN_SKILLFISH_VDDC_MAGIC;
>   
>   break;
> 


RE: [PATCH] drm/amdgpu: limit VCN instance number to 1 for NAVY_FLOUNDER

2021-10-21 Thread Chen, Guchun
Hi Lijo,

Alex has a following fix, "85db7fcb2e53 drm/amdgpu: get VCN harvest information 
from IP discovery table", to fix that logic.

For other ASICs like DIMGREY_CAVEFISH and BEIGE_GOBY, the instance number is 1, 
matching the VBIOS discovery table. So there is no need to handle them.

Regards,
Guchun

-Original Message-
From: Lazar, Lijo  
Sent: Thursday, October 21, 2021 5:45 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; Pan, Xinhui ; 
Deucher, Alexander ; Liu, Leo 
Subject: Re: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
NAVY_FLOUNDER



On 10/21/2021 12:45 PM, Guchun Chen wrote:
> VCN instance 1 is power gated permanently by SMU.
> 
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
> 
> Fixes: f6b6d7d6bc2d("drm/amdgpu/vcn: remove manual instance setting")

Nice find. Looking at the fix, the logic is already broken by
5e26e52adb46("drm/amdgpu/vcn3.0: convert to IP version checking")

Any ASIC other than Sienna that has the same VCN IP version (3.0.0) may be broken. 
Any more extra checks?

Thanks,
Lijo

> Signed-off-by: Guchun Chen 
> ---
>   drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 9 +
>   1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> index dbfd92984655..4848922667f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> @@ -103,6 +103,15 @@ static int vcn_v3_0_early_init(void *handle)
>   adev->vcn.num_enc_rings = 0;
>   else
>   adev->vcn.num_enc_rings = 2;
> +
> + /*
> +  * Fix ME.
> +  * VCN instance number is limited to 1 for the ASIC below because
> +  * VCN instance 1 is permanently power gated.
> +  */
> + if ((adev->ip_versions[UVD_HWIP][0] == IP_VERSION(3, 0, 0)) &&
> + (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2)))
> + adev->vcn.num_vcn_inst = 1;
>   }
>   
>   vcn_v3_0_set_dec_ring_funcs(adev);
> 


RE: [PATCH] drm/amdgpu: limit VCN instance number to 1 for NAVY_FLOUNDER

2021-10-21 Thread Chen, Guchun
Re: But the logic applied in this fix tells us that nothing in IP discovery 
(version table or harvest table) solves the problem. This is equivalent to 
ASIC-specific logic similar to the old ASIC enum checks.

Exactly, this is the challenge.

Regards,
Guchun

-Original Message-
From: Lazar, Lijo  
Sent: Thursday, October 21, 2021 8:56 PM
To: Chen, Guchun ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; Liu, Leo ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
NAVY_FLOUNDER



On 10/21/2021 6:10 PM, Chen, Guchun wrote:
> Hi Lijo,
> 
> Alex has a following fix "85db7fcb2e53 drm/amdgpu: get VCN harvest 
> information from IP discovery table" to fix that logic.

But the logic applied in this fix tells us that nothing in IP discovery (version 
table or harvest table) solves the problem. This is equivalent to 
ASIC-specific logic similar to the old ASIC enum checks.

> 
> For other ASCIs like DIMGREY_CAVEFISH and BEIGE_GOBY, its instance num is 1, 
> match with VBIOS discovery table. So there is no need to handle it.
> 

Thanks for the clarification! It looks good to me, will leave it to 
Alex/Leo/James.

Thanks,
Lijo

> Regards,
> Guchun
> 
> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, October 21, 2021 5:45 PM
> To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; 
> Koenig, Christian ; Pan, Xinhui 
> ; Deucher, Alexander ; 
> Liu, Leo 
> Subject: Re: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
> NAVY_FLOUNDER
> 
> 
> 
> On 10/21/2021 12:45 PM, Guchun Chen wrote:
>> VCN instance 1 is power gated permanently by SMU.
>>
>> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
>>
>> Fixes: f6b6d7d6bc2d("drm/amdgpu/vcn: remove manual instance setting")
> 
> Nice find. Looking at the fix, the logic is already broken by
> 5e26e52adb46("drm/amdgpu/vcn3.0: convert to IP version checking")
> 
> Any ASIC other than Sienna that has the same VCN IP version (3.0.0) may be 
> broken. Any more extra checks?
> 
> Thanks,
> Lijo
> 
>> Signed-off-by: Guchun Chen 
>> ---
>>drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 9 +
>>1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>> b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>> index dbfd92984655..4848922667f2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
>> @@ -103,6 +103,15 @@ static int vcn_v3_0_early_init(void *handle)
>>  adev->vcn.num_enc_rings = 0;
>>  else
>>  adev->vcn.num_enc_rings = 2;
>> +
>> +/*
>> + * Fix ME.
>> + * VCN instance number is limited to 1 for the ASIC below because
>> + * VCN instance 1 is permanently power gated.
>> + */
>> +if ((adev->ip_versions[UVD_HWIP][0] == IP_VERSION(3, 0, 0)) &&
>> +(adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2)))
>> +adev->vcn.num_vcn_inst = 1;
>>  }
>>
>>  vcn_v3_0_set_dec_ring_funcs(adev);
>>


RE: [PATCH] drm/amdgpu: limit VCN instance number to 1 for NAVY_FLOUNDER

2021-10-21 Thread Chen, Guchun
Hi Alex,

No, it does not help.

adev->vcn.harvest_config is 0 after retrieving harvest info from VBIOS. It looks 
like the harvest info in VBIOS does not reflect the case where VCN1 is power gated.

I checked several Navy Flounder SKUs and the observation is the same, so this is 
likely a common case. Perhaps we need to check with the VBIOS/SMU guys.

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Thursday, October 21, 2021 9:06 PM
To: Chen, Guchun 
Cc: amd-gfx list ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; Liu, Leo 
Subject: Re: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
NAVY_FLOUNDER

On Thu, Oct 21, 2021 at 3:15 AM Guchun Chen  wrote:
>
> VCN instance 1 is power gated permanently by SMU.
>
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
>
> Fixes: f6b6d7d6bc2d("drm/amdgpu/vcn: remove manual instance setting")
> Signed-off-by: Guchun Chen 

Doesn't this patch effectively do the same thing?
https://patchwork.freedesktop.org/patch/460329/
Where else is num_vcn_inst used that it causes a problem?  Or is the VCN 
harvesting not set correctly on some navy flounders?

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> index dbfd92984655..4848922667f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> @@ -103,6 +103,15 @@ static int vcn_v3_0_early_init(void *handle)
> adev->vcn.num_enc_rings = 0;
> else
> adev->vcn.num_enc_rings = 2;
> +
> +   /*
> +* Fix ME.
> +* VCN instance number is limited to 1 for the ASIC below because
> +* VCN instance 1 is permanently power gated.
> +*/
> +   if ((adev->ip_versions[UVD_HWIP][0] == IP_VERSION(3, 0, 0)) &&
> +   (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 
> 2)))
> +   adev->vcn.num_vcn_inst = 1;
> }
>
> vcn_v3_0_set_dec_ring_funcs(adev);
> --
> 2.17.1
>


RE: [PATCH] drm/amdgpu: limit VCN instance number to 1 for NAVY_FLOUNDER

2021-10-21 Thread Chen, Guchun
Additionally, in sienna_cichlid_dpm_set_vcn_enable, we also use num_vcn_inst to 
set dpm for VCN1 if it's > 1.
The main problem here is VCN harvest info is not set correctly, so 
vcn.harvest_config is not reliable in this case.

if (smu_cmn_feature_is_enabled(smu, SMU_FEATURE_MM_DPM_PG_BIT)) {
ret = smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_PowerUpVcn, 0, NULL);
if (ret)
return ret;
if (adev->vcn.num_vcn_inst > 1) {
ret = smu_cmn_send_smc_msg_with_param(smu, 
SMU_MSG_PowerUpVcn,
  0x1, 
NULL);
if (ret)
return ret;
}
}

Regards,
Guchun

-Original Message-----
From: Chen, Guchun 
Sent: Thursday, October 21, 2021 9:14 PM
To: Alex Deucher 
Cc: amd-gfx list ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; Liu, Leo 
Subject: RE: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
NAVY_FLOUNDER

Hi Alex,

No, it does not help.

adev->vcn.harvest_config is 0 after retrieving harvest info from VBIOS. It looks 
like the harvest info in VBIOS does not reflect the case where VCN1 is power gated.

I checked several Navy Flounder SKUs and the observation is the same, so this is 
likely a common case. Perhaps we need to check with the VBIOS/SMU guys.

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Thursday, October 21, 2021 9:06 PM
To: Chen, Guchun 
Cc: amd-gfx list ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; Liu, Leo 
Subject: Re: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
NAVY_FLOUNDER

On Thu, Oct 21, 2021 at 3:15 AM Guchun Chen  wrote:
>
> VCN instance 1 is power gated permanently by SMU.
>
> Bug: 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl
> ab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1743&data=04%7C01%7C
> guchun.chen%40amd.com%7Cda80a308a28049d543ad08d99493847d%7C3dd8961fe48
> 84e608e11a82d994e183d%7C0%7C0%7C637704183581593964%7CUnknown%7CTWFpbGZ
> sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
> D%7C1000&sdata=2vNLj9bXE2oV97rxBiUOiaFNpKopVSJefL%2BMcQE%2BSfo%3D&
> amp;reserved=0
>
> Fixes: f6b6d7d6bc2d("drm/amdgpu/vcn: remove manual instance setting")
> Signed-off-by: Guchun Chen 

Doesn't this patch effectively do the same thing?
https://patchwork.freedesktop.org/patch/460329/
Where else is num_vcn_inst used that it causes a problem?  Or is the VCN 
harvesting not set correctly on some navy flounders?

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
> b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> index dbfd92984655..4848922667f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
> @@ -103,6 +103,15 @@ static int vcn_v3_0_early_init(void *handle)
> adev->vcn.num_enc_rings = 0;
> else
> adev->vcn.num_enc_rings = 2;
> +
> +   /*
> +* Fix ME.
> +* VCN instance number is limited to 1 for the ASIC below because
> +* VCN instance 1 is permanently power gated.
> +*/
> +   if ((adev->ip_versions[UVD_HWIP][0] == IP_VERSION(3, 0, 0)) &&
> +   (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 
> 2)))
> +   adev->vcn.num_vcn_inst = 1;
> }
>
> vcn_v3_0_set_dec_ring_funcs(adev);
> --
> 2.17.1
>


RE: [PATCH] drm/amdgpu: limit VCN instance number to 1 for NAVY_FLOUNDER

2021-10-21 Thread Chen, Guchun
Why use asic_type to check this? The issue is caused by the IP discovery 
series, and I thought that series' goal was to remove DID/asic_type checks as 
much as possible from the kernel driver.

+   /* some IP discovery tables on NF don't have this set correctly */
+   if (adev->asic_type == CHIP_NAVY_FLOUNDER)
+   adev->vcn.harvest_config |= AMDGPU_VCN_HARVEST_VCN1;

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Thursday, October 21, 2021 10:02 PM
To: Chen, Guchun 
Cc: amd-gfx list ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; Liu, Leo 
Subject: Re: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
NAVY_FLOUNDER

Thanks.  I think this patch set fixes it in a bit more future-proof way:
https://patchwork.freedesktop.org/series/96132/

Alex

On Thu, Oct 21, 2021 at 9:34 AM Chen, Guchun  wrote:
>
> Additionally, in sienna_cichlid_dpm_set_vcn_enable, we also use num_vcn_inst 
> to set dpm for VCN1 if it's > 1.
> The main problem here is VCN harvest info is not set correctly, so 
> vcn.harvest_config is not reliable in this case.
>
> if (smu_cmn_feature_is_enabled(smu, SMU_FEATURE_MM_DPM_PG_BIT)) {
> ret = smu_cmn_send_smc_msg_with_param(smu, 
> SMU_MSG_PowerUpVcn, 0, NULL);
> if (ret)
> return ret;
> if (adev->vcn.num_vcn_inst > 1) {
> ret = smu_cmn_send_smc_msg_with_param(smu, 
> SMU_MSG_PowerUpVcn,
>   0x1, 
> NULL);
> if (ret)
> return ret;
>     }
> }
>
> Regards,
> Guchun
>
> -Original Message-
> From: Chen, Guchun
> Sent: Thursday, October 21, 2021 9:14 PM
> To: Alex Deucher 
> Cc: amd-gfx list ; Koenig, Christian 
> ; Pan, Xinhui ; Deucher, 
> Alexander ; Liu, Leo 
> Subject: RE: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
> NAVY_FLOUNDER
>
> Hi Alex,
>
> No, it does not help.
>
> adev->vcn.harvest_config is 0 after retrieving harvest info from VBIOS. It looks 
> like the harvest info in VBIOS does not reflect the case where VCN1 is power gated.
>
> I checked several Navy Flounder SKUs and the observation is the same, so this 
> is likely a common case. Perhaps we need to check with the VBIOS/SMU guys.
>
> Regards,
> Guchun
>
> -Original Message-
> From: Alex Deucher 
> Sent: Thursday, October 21, 2021 9:06 PM
> To: Chen, Guchun 
> Cc: amd-gfx list ; Koenig, Christian 
> ; Pan, Xinhui ; Deucher, 
> Alexander ; Liu, Leo 
> Subject: Re: [PATCH] drm/amdgpu: limit VCN instance number to 1 for 
> NAVY_FLOUNDER
>
> On Thu, Oct 21, 2021 at 3:15 AM Guchun Chen  wrote:
> >
> > VCN instance 1 is power gated permanently by SMU.
> >
> > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
> >
> > Fixes: f6b6d7d6bc2d("drm/amdgpu/vcn: remove manual instance 
> > setting")
> > Signed-off-by: Guchun Chen 
>
> Doesn't this patch effectively do the same thing?
> https://patchwork.freedesktop.org/patch/460329/
> Where else is num_vcn_inst used that it causes a problem?  Or is
> the VCN harvesting not set correctly on some navy flounders?
>
> Alex
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 9 +
> >  1 file changed, 9 

RE: [PATCH 1/2] drm/amdgpu: Workaround harvesting info for some navy flounder boards

2021-10-21 Thread Chen, Guchun
[Public]

I will try it.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, October 22, 2021 5:52 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 1/2] drm/amdgpu: Workaround harvesting info for some navy 
flounder boards

Some navy flounder boards do not properly mark harvested VCN instances.  Fix 
that here.

v2: use IP versions

Fixes: 1b592d00b4ac83 ("drm/amdgpu/vcn: remove manual instance setting")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index dfb92f229748..814e9620fac5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -507,6 +507,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
break;
}
}
+   /* some IP discovery tables on Navy Flounder don't have this set 
correctly */
+   if ((adev->ip_versions[UVD_HWIP][1] == IP_VERSION(3, 0, 1)) &&
+   (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2)))
+   adev->vcn.harvest_config |= AMDGPU_VCN_HARVEST_VCN1;
if (vcn_harvest_count == adev->vcn.num_vcn_inst) {
adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
--
2.31.1


RE: [PATCH 1/2] drm/amdgpu: Workaround harvesting info for some navy flounder boards

2021-10-21 Thread Chen, Guchun
[Public]

This series is: Reviewed-and-tested-by: Guchun Chen , on 
top of "drm/amdgpu/vcn3.0: handle harvesting in firmware setup".

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Chen, Guchun
Sent: Friday, October 22, 2021 8:21 AM
To: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: RE: [PATCH 1/2] drm/amdgpu: Workaround harvesting info for some navy 
flounder boards

[Public]

I will try it.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, October 22, 2021 5:52 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 1/2] drm/amdgpu: Workaround harvesting info for some navy 
flounder boards

Some navy flounder boards do not properly mark harvested VCN instances.  Fix 
that here.

v2: use IP versions

Fixes: 1b592d00b4ac83 ("drm/amdgpu/vcn: remove manual instance setting")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index dfb92f229748..814e9620fac5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -507,6 +507,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
break;
}
}
+   /* some IP discovery tables on Navy Flounder don't have this set 
correctly */
+   if ((adev->ip_versions[UVD_HWIP][1] == IP_VERSION(3, 0, 1)) &&
+   (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 2)))
+   adev->vcn.harvest_config |= AMDGPU_VCN_HARVEST_VCN1;
if (vcn_harvest_count == adev->vcn.num_vcn_inst) {
adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
--
2.31.1


RE: [PATCH] drm/amdgpu/smu11.0: add missing IP version check

2021-10-21 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, October 22, 2021 11:19 AM
To: Deucher, Alexander 
Cc: amd-gfx list 
Subject: Re: [PATCH] drm/amdgpu/smu11.0: add missing IP version check

Ping?

On Tue, Oct 19, 2021 at 11:31 AM Alex Deucher  wrote:
>
> Add missing check in smu_v11_0_init_display_count(),
>
> Fixes: af3b89d3a639d5 ("drm/amdgpu/smu11.0: convert to IP version checking")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> index 5c1703cc25fd..28b7c0562b99 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
> @@ -755,6 +755,7 @@ int smu_v11_0_init_display_count(struct smu_context *smu, 
> uint32_t count)
>  */
> if (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(11, 0, 11) ||
> adev->ip_versions[MP1_HWIP][0] == IP_VERSION(11, 5, 0) ||
> +   adev->ip_versions[MP1_HWIP][0] == IP_VERSION(11, 0, 12) ||
> adev->ip_versions[MP1_HWIP][0] == IP_VERSION(11, 0, 13))
> return 0;
>
> --
> 2.31.1
>


RE: [PATCH 1/2] drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12

2021-10-21 Thread Chen, Guchun
[Public]

This patch caused an SDMA ring test failure on Vega20.

Oct 12 00:18:24 vega20-ebd-11 kernel: [   11.900968] IPv6: 
ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.007480] AMD-Vi: AMD IOMMUv2 driver 
by Joerg Roedel 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.007482] AMD-Vi: AMD IOMMUv2 
functionality not available on this system
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069082] [drm] amdgpu kernel 
modesetting enabled.
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069226] amdgpu: CRAT table not 
found
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069229] amdgpu: Virtual CRAT table 
created for CPU
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069288] amdgpu: Topology: Add CPU 
node
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069415] checking generic (9000 
30) vs hw (9000 1000)
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069416] fb0: switching to 
amdgpudrmfb from EFI VGA
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069700] Console: switching to 
colour dummy device 80x25
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.069755] amdgpu :03:00.0: 
vgaarb: deactivate vga console
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070047] amdgpu :03:00.0: 
enabling device (0006 -> 0007)
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070241] [drm] initializing kernel 
modesetting (VEGA20 0x1002:0x66A1 0x1002:0x081E 0x06).
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070244] amdgpu :03:00.0: 
amdgpu: Trusted Memory Zone (TMZ) feature not supported
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070257] [drm] register mmio base: 
0xA030
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070258] [drm] register mmio size: 
524288
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070263] [drm] add ip block number 
0 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070264] [drm] add ip block number 
1 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070265] [drm] add ip block number 
2 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070266] [drm] add ip block number 
3 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070266] [drm] add ip block number 
4 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070267] [drm] add ip block number 
5 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070267] [drm] add ip block number 
6 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070268] [drm] add ip block number 
7 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070269] [drm] add ip block number 
8 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070269] [drm] add ip block number 
9 
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070286] amdgpu :03:00.0: 
amdgpu: Fetched VBIOS from VFCT
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.070293] amdgpu: ATOM BIOS: 
113-D1640600-103
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072517] [drm] UVD(0) is enabled in 
VM mode
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072519] [drm] UVD(1) is enabled in 
VM mode
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072520] [drm] UVD(0) ENC is 
enabled in VM mode
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072520] [drm] UVD(1) ENC is 
enabled in VM mode
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072521] [drm] VCE enabled in VM 
mode
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072632] amdgpu :03:00.0: 
amdgpu: MEM ECC is active.
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072633] amdgpu :03:00.0: 
amdgpu: SRAM ECC is not presented.
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072651] amdgpu :03:00.0: 
amdgpu: RAS INFO: ras initialized successfully, hardware ability[105] 
ras_mask[105]
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072657] [drm] vm size is 262144 
GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072668] amdgpu :03:00.0: 
amdgpu: VRAM: 16368M 0x0080 - 0x0083FEFF (16368M used)
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072669] amdgpu :03:00.0: 
amdgpu: GART: 512M 0x - 0x1FFF
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072676] amdgpu :03:00.0: 
amdgpu: AGP: 267894784M 0x0084 - 0x
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072683] [drm] Detected VRAM 
RAM=16368M, BAR=256M
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072684] [drm] RAM width 4096bits 
HBM
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072736] [drm] amdgpu: 16368M of 
VRAM memory ready
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072738] [drm] amdgpu: 16368M of 
GTT memory ready.
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072745] [drm] GART: num cpu pages 
131072, num gpu pages 131072
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072819] [drm] PCIE GART of 512M 
enabled.
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.072820] [drm] PTB located at 
0x00800030
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.075598] amdgpu :03:00.0: 
amdgpu: PSP runtime database doesn't exist
Oct 12 00:18:39 vega20-ebd-11 kernel: [   27.075605] amdgpu: hwmgr_sw_init smu 
backed is vega2

RE: [PATCH] drm/amdgpu/nbio7.4: use original HDP_FLUSH bits for navi1x

2021-10-21 Thread Chen, Guchun
[Public]

Reviewed-and-tested-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, October 22, 2021 12:30 PM
To: Deucher, Alexander 
Cc: amd-gfx list 
Subject: Re: [PATCH] drm/amdgpu/nbio7.4: use original HDP_FLUSH bits for navi1x

On Fri, Oct 22, 2021 at 12:21 AM Alex Deucher  wrote:
>

Copy paste typo in the patch title fixed locally.

> The extended bits were not available for use on vega20 and presumably 
> arcturus as well.
>
> Fixes: a0f9f854666834 ("drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH 
> bit 12")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |  5 -
>  drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c| 15 +++
>  drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h|  1 +
>  3 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 814e9620fac5..208a784475bd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -1125,10 +1125,13 @@ int amdgpu_discovery_set_ip_blocks(struct 
> amdgpu_device *adev)
> break;
> case IP_VERSION(7, 4, 0):
> case IP_VERSION(7, 4, 1):
> -   case IP_VERSION(7, 4, 4):
> adev->nbio.funcs = &nbio_v7_4_funcs;
> adev->nbio.hdp_flush_reg = &nbio_v7_4_hdp_flush_reg;
> break;
> +   case IP_VERSION(7, 4, 4):
> +   adev->nbio.funcs = &nbio_v7_4_funcs;
> +   adev->nbio.hdp_flush_reg = &nbio_v7_4_hdp_flush_reg_ald;
> +   break;
> case IP_VERSION(7, 2, 0):
> case IP_VERSION(7, 2, 1):
> case IP_VERSION(7, 5, 0):
> diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c 
> b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
> index 3b7775d74bb2..b8bd03d16dba 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
> @@ -325,6 +325,21 @@ static u32 nbio_v7_4_get_pcie_data_offset(struct 
> amdgpu_device *adev)  }
>
>  const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg = {
> +   .ref_and_mask_cp0 = GPU_HDP_FLUSH_DONE__CP0_MASK,
> +   .ref_and_mask_cp1 = GPU_HDP_FLUSH_DONE__CP1_MASK,
> +   .ref_and_mask_cp2 = GPU_HDP_FLUSH_DONE__CP2_MASK,
> +   .ref_and_mask_cp3 = GPU_HDP_FLUSH_DONE__CP3_MASK,
> +   .ref_and_mask_cp4 = GPU_HDP_FLUSH_DONE__CP4_MASK,
> +   .ref_and_mask_cp5 = GPU_HDP_FLUSH_DONE__CP5_MASK,
> +   .ref_and_mask_cp6 = GPU_HDP_FLUSH_DONE__CP6_MASK,
> +   .ref_and_mask_cp7 = GPU_HDP_FLUSH_DONE__CP7_MASK,
> +   .ref_and_mask_cp8 = GPU_HDP_FLUSH_DONE__CP8_MASK,
> +   .ref_and_mask_cp9 = GPU_HDP_FLUSH_DONE__CP9_MASK,
> +   .ref_and_mask_sdma0 = GPU_HDP_FLUSH_DONE__SDMA0_MASK,
> +   .ref_and_mask_sdma1 = GPU_HDP_FLUSH_DONE__SDMA1_MASK, };
> +
> +const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg_ald = {
> .ref_and_mask_cp0 = GPU_HDP_FLUSH_DONE__CP0_MASK,
> .ref_and_mask_cp1 = GPU_HDP_FLUSH_DONE__CP1_MASK,
> .ref_and_mask_cp2 = GPU_HDP_FLUSH_DONE__CP2_MASK,
> diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h 
> b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h
> index b8216581ec8d..cc5692db6f98 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h
> +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.h
> @@ -27,6 +27,7 @@
>  #include "soc15_common.h"
>
>  extern const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg;
> +extern const struct nbio_hdp_flush_reg nbio_v7_4_hdp_flush_reg_ald;
>  extern const struct amdgpu_nbio_funcs nbio_v7_4_funcs;
>  extern const struct amdgpu_nbio_ras_funcs nbio_v7_4_ras_funcs;
>
> --
> 2.31.1
>


RE: [PATCH] drm/amdgpu: correctly toggle gfx on/off around RLC_SPM_* register access

2021-11-03 Thread Chen, Guchun
[Public]

Acked-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Thursday, November 4, 2021 2:20 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Quan, Evan 

Subject: [PATCH] drm/amdgpu: correctly toggle gfx on/off around RLC_SPM_* 
register access

As part of the IB padding process, accessing the RLC_SPM_* registers may trigger 
a gfx hang, since gfxoff may already be kicked in during that period.
To address that, we manually toggle gfx on/off around the RLC_SPM_* register 
accesses.

This can resolve the gfx hang issue observed when running Talos with RDP 
launched in parallel.

Signed-off-by: Evan Quan 
Change-Id: Ifae152e8151fecd25a238ebe87dffb3b17cdb540
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 5 +++++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 4 ++++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 4 ++++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 4 ++++
 4 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index fa03db34aec4..10fc9197602e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -8388,6 +8388,9 @@ static int gfx_v10_0_update_gfx_clock_gating(struct 
amdgpu_device *adev,  static void gfx_v10_0_update_spm_vmid(struct 
amdgpu_device *adev, unsigned vmid)  {
u32 reg, data;
+
+   amdgpu_gfx_off_ctrl(adev, false);
+
/* not for *_SOC15 */
reg = SOC15_REG_OFFSET(GC, 0, mmRLC_SPM_MC_CNTL);
if (amdgpu_sriov_is_pp_one_vf(adev))
@@ -8402,6 +8405,8 @@ static void gfx_v10_0_update_spm_vmid(struct 
amdgpu_device *adev, unsigned vmid)
WREG32_SOC15_NO_KIQ(GC, 0, mmRLC_SPM_MC_CNTL, data);
else
WREG32_SOC15(GC, 0, mmRLC_SPM_MC_CNTL, data);
+
+   amdgpu_gfx_off_ctrl(adev, true);
 }
 
 static bool gfx_v10_0_check_rlcg_range(struct amdgpu_device *adev, diff --git 
a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 37b4a3db6360..d17a6f399347 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -3575,12 +3575,16 @@ static void gfx_v7_0_update_spm_vmid(struct 
amdgpu_device *adev, unsigned vmid)  {
u32 data;
 
+   amdgpu_gfx_off_ctrl(adev, false);
+
data = RREG32(mmRLC_SPM_VMID);
 
data &= ~RLC_SPM_VMID__RLC_SPM_VMID_MASK;
data |= (vmid & RLC_SPM_VMID__RLC_SPM_VMID_MASK) << 
RLC_SPM_VMID__RLC_SPM_VMID__SHIFT;
 
WREG32(mmRLC_SPM_VMID, data);
+
+   amdgpu_gfx_off_ctrl(adev, true);
 }
 
 static void gfx_v7_0_enable_cgcg(struct amdgpu_device *adev, bool enable) diff 
--git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index aefae5b1ff7b..1a476de20d08 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -5727,6 +5727,8 @@ static void gfx_v8_0_update_spm_vmid(struct amdgpu_device 
*adev, unsigned vmid)  {
u32 data;
 
+   amdgpu_gfx_off_ctrl(adev, false);
+
if (amdgpu_sriov_is_pp_one_vf(adev))
data = RREG32_NO_KIQ(mmRLC_SPM_VMID);
else
@@ -5739,6 +5741,8 @@ static void gfx_v8_0_update_spm_vmid(struct amdgpu_device 
*adev, unsigned vmid)
WREG32_NO_KIQ(mmRLC_SPM_VMID, data);
else
WREG32(mmRLC_SPM_VMID, data);
+
+   amdgpu_gfx_off_ctrl(adev, true);
 }
 
 static const struct amdgpu_rlc_funcs iceland_rlc_funcs = { diff --git 
a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 08e91e7245df..d9367747fed3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -5218,6 +5218,8 @@ static void gfx_v9_0_update_spm_vmid(struct amdgpu_device 
*adev, unsigned vmid)  {
u32 reg, data;
 
+   amdgpu_gfx_off_ctrl(adev, false);
+
reg = SOC15_REG_OFFSET(GC, 0, mmRLC_SPM_MC_CNTL);
if (amdgpu_sriov_is_pp_one_vf(adev))
data = RREG32_NO_KIQ(reg);
@@ -5231,6 +5233,8 @@ static void gfx_v9_0_update_spm_vmid(struct amdgpu_device 
*adev, unsigned vmid)
WREG32_SOC15_NO_KIQ(GC, 0, mmRLC_SPM_MC_CNTL, data);
else
WREG32_SOC15(GC, 0, mmRLC_SPM_MC_CNTL, data);
+
+   amdgpu_gfx_off_ctrl(adev, true);
 }
 
 static bool gfx_v9_0_check_rlcg_range(struct amdgpu_device *adev,
--
2.29.0


RE: [PATCH] drm/amdgpu: assign dpms for amdgpu_vkms_crtc_helper_funcs

2021-11-04 Thread Chen, Guchun
[Public]

You need to add a Fixes tag in the commit message, and please document the NULL 
pointer call trace as well.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Asher Song
Sent: Friday, November 5, 2021 12:13 PM
To: amd-gfx@lists.freedesktop.org
Cc: Song, Asher 
Subject: [PATCH] drm/amdgpu: assign dpms for amdgpu_vkms_crtc_helper_funcs

To avoid a NULL pointer dereference, assign dpms for amdgpu_vkms_crtc_helper_funcs.

Signed-off-by: Asher Song 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 26 +++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index 50bdc39733aa..920b6bc1a9fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -155,8 +155,32 @@ static void amdgpu_vkms_crtc_atomic_flush(struct drm_crtc 
*crtc,
crtc->state->event = NULL;
}
 }
-
+static void amdgpu_vkms_crtc_dpms(struct drm_crtc *crtc, int mode) {
+   struct drm_device *dev = crtc->dev;
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
+   unsigned type;
+
+   switch (mode) {
+   case DRM_MODE_DPMS_ON:
+   amdgpu_crtc->enabled = true;
+   /* Make sure VBLANK interrupts are still enabled */
+   type = amdgpu_display_crtc_idx_to_irq_type(adev,
+   amdgpu_crtc->crtc_id);
+   amdgpu_irq_update(adev, &adev->crtc_irq, type);
+   drm_crtc_vblank_on(crtc);
+   break;
+   case DRM_MODE_DPMS_STANDBY:
+   case DRM_MODE_DPMS_SUSPEND:
+   case DRM_MODE_DPMS_OFF:
+   drm_crtc_vblank_off(crtc);
+   amdgpu_crtc->enabled = false;
+   break;
+   }
+}
 static const struct drm_crtc_helper_funcs amdgpu_vkms_crtc_helper_funcs = {
+   .dpms = amdgpu_vkms_crtc_dpms,
.atomic_flush   = amdgpu_vkms_crtc_atomic_flush,
.atomic_enable  = amdgpu_vkms_crtc_atomic_enable,
.atomic_disable = amdgpu_vkms_crtc_atomic_disable,
--
2.25.1


RE: [PATCH] drm/amdgpu: assign dpms for amdgpu_vkms_crtc_helper_funcs

2021-11-05 Thread Chen, Guchun
[Public]

Acked-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Asher Song
Sent: Friday, November 5, 2021 5:41 PM
To: amd-gfx@lists.freedesktop.org
Cc: Song, Asher 
Subject: [PATCH] drm/amdgpu: assign dpms for amdgpu_vkms_crtc_helper_funcs

In drm_helper_disable_unused_functions(), when crtc->enabled is false the 
helper falls back to calling crtc_funcs->dpms (no .disable hook is set), which 
is a NULL pointer here. To avoid this, assign dpms for 
amdgpu_vkms_crtc_helper_funcs.

 Call Trace:
  __drm_helper_disable_unused_functions+0xac/0xe0 [drm_kms_helper]
  drm_helper_disable_unused_functions+0x38/0x60 [drm_kms_helper]
  amdgpu_fbdev_init+0xf6/0x100 [amdgpu]
  amdgpu_device_init+0x13d4/0x1f10 [amdgpu]

Fixes: ba5317109d0ce ("drm/amdgpu: create amdgpu_vkms (v4)")
Signed-off-by: Asher Song 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 26 
 1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index 50bdc39733aa..9cfe479c4c97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -156,7 +156,33 @@ static void amdgpu_vkms_crtc_atomic_flush(struct drm_crtc 
*crtc,
}
 }
 
+static void amdgpu_vkms_crtc_dpms(struct drm_crtc *crtc, int mode) {
+   struct drm_device *dev = crtc->dev;
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
+   unsigned type;
+
+   switch (mode) {
+   case DRM_MODE_DPMS_ON:
+   amdgpu_crtc->enabled = true;
+   /* Make sure VBLANK interrupts are still enabled */
+   type = amdgpu_display_crtc_idx_to_irq_type(adev,
+   amdgpu_crtc->crtc_id);
+   amdgpu_irq_update(adev, &adev->crtc_irq, type);
+   drm_crtc_vblank_on(crtc);
+   break;
+   case DRM_MODE_DPMS_STANDBY:
+   case DRM_MODE_DPMS_SUSPEND:
+   case DRM_MODE_DPMS_OFF:
+   drm_crtc_vblank_off(crtc);
+   amdgpu_crtc->enabled = false;
+   break;
+   }
+}
+
 static const struct drm_crtc_helper_funcs amdgpu_vkms_crtc_helper_funcs = {
+   .dpms = amdgpu_vkms_crtc_dpms,
.atomic_flush   = amdgpu_vkms_crtc_atomic_flush,
.atomic_enable  = amdgpu_vkms_crtc_atomic_enable,
.atomic_disable = amdgpu_vkms_crtc_atomic_disable,
--
2.25.1


RE: [PATCH] drm/amdgpu: add error print when failing to add IP block(v2)

2021-11-10 Thread Chen, Guchun
[Public]

Thanks Lijo and Christian for your review. This patch has already been pushed 
with Alex's RB :(

Regards,
Guchun

-Original Message-
From: Lazar, Lijo  
Sent: Thursday, November 11, 2021 3:22 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Deucher, 
Alexander ; Koenig, Christian 
; Pan, Xinhui 
Subject: Re: [PATCH] drm/amdgpu: add error print when failing to add IP 
block(v2)



On 11/11/2021 9:41 AM, Guchun Chen wrote:
> Driver initialization is driven by IP version from IP discovery table. 
> So add error print when failing to add ip block during driver 
> initialization, this will be more friendly to user to know which IP 
> version is not correct.
> 
> [   40.467361] [drm] host supports REQ_INIT_DATA handshake
> [   40.474076] [drm] add ip block number 0 
> [   40.474090] [drm] add ip block number 1 
> [   40.474101] [drm] add ip block number 2 
> [   40.474103] [drm] add ip block number 3 
> [   40.474114] [drm] add ip block number 4 
> [   40.474119] [drm] add ip block number 5 
> [   40.474134] [drm] add ip block number 6 
> [   40.474143] [drm] add ip block number 7 
> [   40.474147] amdgpu :00:08.0: amdgpu: Fatal error during GPU init
> [   40.474545] amdgpu :00:08.0: amdgpu: amdgpu: finishing device.
> 
> v2: use dev_err to multi-GPU system
> 
> Signed-off-by: Guchun Chen 
> Reviewed-by: Alex Deucher 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 36 +++
>   1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index ff70bc233489..4e3669407518 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -587,6 +587,9 @@ static int amdgpu_discovery_set_common_ip_blocks(struct 
> amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &nv_common_ip_block);
>   break;
>   default:
> + dev_err(adev->dev,
> + "Failed to add common ip block(GC_HWIP:0x%x)\n",
> + adev->ip_versions[GC_HWIP][0]);

If not submitted yet, a minor modification to the message (if that sounds 
appropriate): "Found unsupported IP version" or "IP version is not supported 
yet". No need for a v3.

Thanks,
Lijo

>   return -EINVAL;
>   }
>   return 0;
> @@ -619,6 +622,9 @@ static int amdgpu_discovery_set_gmc_ip_blocks(struct 
> amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &gmc_v10_0_ip_block);
>   break;
>   default:
> + dev_err(adev->dev,
> + "Failed to add gmc ip block(GC_HWIP:0x%x)\n",
> + adev->ip_versions[GC_HWIP][0]);
>   return -EINVAL;
>   }
>   return 0;
> @@ -648,6 +654,9 @@ static int amdgpu_discovery_set_ih_ip_blocks(struct 
> amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &navi10_ih_ip_block);
>   break;
>   default:
> + dev_err(adev->dev,
> + "Failed to add ih ip block(OSSSYS_HWIP:0x%x)\n",
> + adev->ip_versions[OSSSYS_HWIP][0]);
>   return -EINVAL;
>   }
>   return 0;
> @@ -688,6 +697,9 @@ static int amdgpu_discovery_set_psp_ip_blocks(struct 
> amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &psp_v13_0_ip_block);
>   break;
>   default:
> + dev_err(adev->dev,
> + "Failed to add psp ip block(MP0_HWIP:0x%x)\n",
> + adev->ip_versions[MP0_HWIP][0]);
>   return -EINVAL;
>   }
>   return 0;
> @@ -726,6 +738,9 @@ static int amdgpu_discovery_set_smu_ip_blocks(struct 
> amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &smu_v13_0_ip_block);
>   break;
>   default:
> + dev_err(adev->dev,
> + "Failed to add smu ip block(MP1_HWIP:0x%x)\n",
> + adev->ip_versions[MP1_HWIP][0]);
>   return -EINVAL;
>   }
>   return 0;
> @@ -753,6 +768,9 @@ static int amdgpu_discovery_set_display_ip_blocks(struct 
> amdgpu_device *adev)
>   amdgpu_device_ip_block_add(adev, &dm_ip_block);
>   break;
>   default:
> + dev_err(adev->dev,
> + "Failed to add dm ip block(DCE_HWIP:0x%x)\n",
> + adev->ip_versions[DCE_HWIP][0]);
>   return 

RE: [PATCH] drm/amdgpu: support new mode-1 reset interface (v2)

2021-11-16 Thread Chen, Guchun
[Public]

A coding style problem.

Per kernel coding style, braces {} are needed on the if branch after 
"if (smu_version < 0x00440700)" as well, since the else branch uses them.

if (smu_version < 0x00440700)
> + ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
> + else {
> + /* fatal error triggered by ras, PMFW supports the flag
> +from 68.44.0 */
> + if ((smu_version >= 0x00442c00) && ras &&
> + atomic_read(&ras->in_recovery))
> + fatal_err = 1;
> +
> + param |= (fatal_err << 16);
> + ret = smu_cmn_send_smc_msg_with_param(smu,
> + SMU_MSG_GfxDeviceDriverReset, param, 
> NULL);
> + }
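
With balanced braces per kernel coding style, the block would read like this 
(a reflow of the quoted hunk, not a functional change):

	if (smu_version < 0x00440700) {
		ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
	} else {
		/* fatal error triggered by ras, PMFW supports the flag
		 * from 68.44.0
		 */
		if ((smu_version >= 0x00442c00) && ras &&
		    atomic_read(&ras->in_recovery))
			fatal_err = 1;

		param |= (fatal_err << 16);
		ret = smu_cmn_send_smc_msg_with_param(smu,
				SMU_MSG_GfxDeviceDriverReset, param, NULL);
	}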

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Lazar, Lijo
Sent: Tuesday, November 16, 2021 6:41 PM
To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Zhang, 
Hawking ; Clements, John ; Yang, 
Stanley ; Quan, Evan ; Wang, 
Yang(Kevin) 
Subject: Re: [PATCH] drm/amdgpu: support new mode-1 reset interface (v2)



On 11/16/2021 3:58 PM, Tao Zhou wrote:
> If the GPU reset is triggered by a RAS fatal error, tell the SMU so in the 
> mode-1 reset message.
> 
> v2: move mode-1 reset function to aldebaran_ppt.c since it's aldebaran 
> specific currently.
> 
> Signed-off-by: Tao Zhou 

Reviewed-by: Lijo Lazar 

Thanks,
Lijo

> ---
>   drivers/gpu/drm/amd/pm/inc/smu_v13_0.h|  3 +-
>   .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 36 ++-
>   .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 21 ---
>   3 files changed, 37 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h 
> b/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h
> index e5d3b0d1a032..bbc608c990b0 100644
> --- a/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h
> +++ b/drivers/gpu/drm/amd/pm/inc/smu_v13_0.h
> @@ -29,6 +29,8 @@
>   #define SMU13_DRIVER_IF_VERSION_YELLOW_CARP 0x04
>   #define SMU13_DRIVER_IF_VERSION_ALDE 0x07
>   
> +#define SMU13_MODE1_RESET_WAIT_TIME_IN_MS 500  //500ms
> +
>   /* MP Apertures */
>   #define MP0_Public  0x0380
>   #define MP0_SRAM0x0390
> @@ -216,7 +218,6 @@ int smu_v13_0_baco_set_state(struct smu_context *smu, 
> enum smu_baco_state state)
>   int smu_v13_0_baco_enter(struct smu_context *smu);
>   int smu_v13_0_baco_exit(struct smu_context *smu);
>   
> -int smu_v13_0_mode1_reset(struct smu_context *smu);
>   int smu_v13_0_mode2_reset(struct smu_context *smu);
>   
>   int smu_v13_0_get_dpm_ultimate_freq(struct smu_context *smu, enum 
> smu_clk_type clk_type, diff --git 
> a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> index 59a7d276541d..e50d4491aa96 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
> @@ -1765,6 +1765,40 @@ static ssize_t aldebaran_get_gpu_metrics(struct 
> smu_context *smu,
>   return sizeof(struct gpu_metrics_v1_3);
>   }
>   
> +static int aldebaran_mode1_reset(struct smu_context *smu) {
> + u32 smu_version, fatal_err, param;
> + int ret = 0;
> + struct amdgpu_device *adev = smu->adev;
> + struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
> +
> + fatal_err = 0;
> + param = SMU_RESET_MODE_1;
> +
> + /*
> + * PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07
> + */
> + smu_cmn_get_smc_version(smu, NULL, &smu_version);
> + if (smu_version < 0x00440700)
> + ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
> + else {
> + /* fatal error triggered by ras, PMFW supports the flag
> +from 68.44.0 */
> + if ((smu_version >= 0x00442c00) && ras &&
> + atomic_read(&ras->in_recovery))
> + fatal_err = 1;
> +
> + param |= (fatal_err << 16);
> + ret = smu_cmn_send_smc_msg_with_param(smu,
> + SMU_MSG_GfxDeviceDriverReset, param, 
> NULL);
> + }
> +
> + if (!ret)
> + msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
> +
> + return ret;
> +}
> +
>   static int aldebaran_mode2_reset(struct smu_context *smu)
>   {
>   u32 smu_version;
> @@ -1925,7 +1959,7 @@ static const struct pptable_funcs aldebaran_ppt_funcs = 
> {
>   .get_gpu_metrics = aldebaran_get_gpu_metrics,
>   .mode1_reset_is_support = aldebaran_is_mode1_reset_supported,
>   .mode2_reset_is_support = aldebaran_is_mode2_reset_supported,
> - .mode1_reset = smu_v13_0_mode1_reset,
> + .mode1_reset = aldebaran_mode1_reset,
>   .set_mp1_state = aldebaran_set_mp1_state,
>   .mode2_reset = aldebaran_mode2_reset,
>   .wait_for_event = smu_v13_0_wait_for_event, diff --git 
> a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> index 35145db6eedf..4d96099a9bb1 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
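
As a side note on the encoding used in aldebaran_mode1_reset() above: the
message argument packs the reset mode into the low bits and the fatal-error
flag into bit 16. A minimal sketch, with a hypothetical helper name:

	/* Hypothetical helper mirroring aldebaran_mode1_reset()'s encoding. */
	static u32 make_mode1_reset_param(u32 mode, bool fatal_err)
	{
		return mode | ((fatal_err ? 1U : 0U) << 16);
	}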

RE: [PATCH] drm/amdgpu: update the domain flags for dumb buffer creation

2021-11-18 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Thursday, November 18, 2021 4:27 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Quan, Evan 
; Koenig, Christian 
Subject: [PATCH] drm/amdgpu: update the domain flags for dumb buffer creation

After switching to the generic framebuffer framework, we rely on the
->dumb_create routine for frame buffer creation. However, the domain flags
currently used are not optimal. Add the contiguous flag to directly allocate
the scanout BO as one linear buffer.

Fixes: 844612e1149d ("drm/amdgpu: use generic fb helpers instead of setting up 
AMD own's.")

Signed-off-by: Evan Quan 
Reviewed-by: Christian König 
Change-Id: I403bf7a0b265c564b5f3a3343999670e5eb87ca6
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d07b6aebc449..189e32ee7a6e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -883,7 +883,8 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
struct drm_gem_object *gobj;
uint32_t handle;
u64 flags = AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
-   AMDGPU_GEM_CREATE_CPU_GTT_USWC;
+   AMDGPU_GEM_CREATE_CPU_GTT_USWC |
+   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
u32 domain;
int r;
 
--
2.29.0


RE: [PATCH 1/2] drm/amdgpu: fix vkms hrtimer settings

2021-11-22 Thread Chen, Guchun
[Public]

Series is:
Reviewed-by: Guchun Chen 

+Alex to comment this series as well.

Regards,
Guchun

-Original Message-
From: Cui, Flora  
Sent: Monday, November 22, 2021 5:04 PM
To: amd-gfx@lists.freedesktop.org; Chen, Guchun 
Cc: Cui, Flora 
Subject: [PATCH 1/2] drm/amdgpu: fix vkms hrtimer settings

otherwise adev->mode_info.crtcs[] is NULL

Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 38 ++++++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.h |  5 ++--
 2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index ce982afeff91..6c62c45e3e3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -16,6 +16,8 @@
 #include "ivsrcid/ivsrcid_vislands30.h"
 #include "amdgpu_vkms.h"
 #include "amdgpu_display.h"
+#include "atom.h"
+#include "amdgpu_irq.h"
 
 /**
  * DOC: amdgpu_vkms
@@ -41,14 +43,13 @@ static const u32 amdgpu_vkms_formats[] = {
 
 static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer *timer) 
 {
-   struct amdgpu_vkms_output *output = container_of(timer,
-struct 
amdgpu_vkms_output,
-vblank_hrtimer);
-   struct drm_crtc *crtc = &output->crtc;
+   struct amdgpu_crtc *amdgpu_crtc = container_of(timer, struct 
amdgpu_crtc, vblank_timer);
+   struct drm_crtc *crtc = &amdgpu_crtc->base;
+   struct amdgpu_vkms_output *output = 
+drm_crtc_to_amdgpu_vkms_output(crtc);
u64 ret_overrun;
bool ret;
 
-   ret_overrun = hrtimer_forward_now(&output->vblank_hrtimer,
+   ret_overrun = hrtimer_forward_now(&amdgpu_crtc->vblank_timer,
  output->period_ns);
WARN_ON(ret_overrun != 1);
 
@@ -65,22 +66,21 @@ static int amdgpu_vkms_enable_vblank(struct drm_crtc *crtc)
unsigned int pipe = drm_crtc_index(crtc);
struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
struct amdgpu_vkms_output *out = drm_crtc_to_amdgpu_vkms_output(crtc);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
 
drm_calc_timestamping_constants(crtc, &crtc->mode);
 
-   hrtimer_init(&out->vblank_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-   out->vblank_hrtimer.function = &amdgpu_vkms_vblank_simulate;
out->period_ns = ktime_set(0, vblank->framedur_ns);
-   hrtimer_start(&out->vblank_hrtimer, out->period_ns, HRTIMER_MODE_REL);
+   hrtimer_start(&amdgpu_crtc->vblank_timer, out->period_ns, 
+HRTIMER_MODE_REL);
 
return 0;
 }
 
 static void amdgpu_vkms_disable_vblank(struct drm_crtc *crtc)  {
-   struct amdgpu_vkms_output *out = drm_crtc_to_amdgpu_vkms_output(crtc);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
 
-   hrtimer_cancel(&out->vblank_hrtimer);
+   hrtimer_cancel(&amdgpu_crtc->vblank_timer);
 }
 
 static bool amdgpu_vkms_get_vblank_timestamp(struct drm_crtc *crtc, @@ -92,13 
+92,14 @@ static bool amdgpu_vkms_get_vblank_timestamp(struct drm_crtc *crtc,
unsigned int pipe = crtc->index;
struct amdgpu_vkms_output *output = 
drm_crtc_to_amdgpu_vkms_output(crtc);
struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
 
if (!READ_ONCE(vblank->enabled)) {
*vblank_time = ktime_get();
return true;
}
 
-   *vblank_time = READ_ONCE(output->vblank_hrtimer.node.expires);
+   *vblank_time = READ_ONCE(amdgpu_crtc->vblank_timer.node.expires);
 
if (WARN_ON(*vblank_time == vblank->time))
return true;
@@ -165,6 +166,8 @@ static const struct drm_crtc_helper_funcs 
amdgpu_vkms_crtc_helper_funcs = {  static int amdgpu_vkms_crtc_init(struct 
drm_device *dev, struct drm_crtc *crtc,
  struct drm_plane *primary, struct drm_plane *cursor)  
{
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
int ret;
 
ret = drm_crtc_init_with_planes(dev, crtc, primary, cursor, @@ -176,6 
+179,17 @@ static int amdgpu_vkms_crtc_init(struct drm_device *dev, struct 
drm_crtc *crtc,
 
drm_crtc_helper_add(crtc, &amdgpu_vkms_crtc_helper_funcs);
 
+   amdgpu_crtc->crtc_id = drm_crtc_index(crtc);
+   adev->mode_info.crtcs[drm_crtc_index(crtc)] = amdgpu_crtc;
+
+   amdgpu_crtc->pll_id = ATOM_PPLL_INVALID;
+   amdgpu_crtc->encoder = NULL;
+   amdgpu_crtc->connector = NULL;
+   amdgpu_crtc->vsync_timer_enabled = AMDGPU_IRQ_STATE_DISABLE;
+
+   hrtimer_init(&amdgpu_crtc->vblank_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
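
For context, the simulated vblank relies on the standard periodic-hrtimer
pattern: the callback forwards the timer by one period from "now" and returns
HRTIMER_RESTART. A minimal self-contained sketch (the names and the 16 ms
period are illustrative, not taken from the patch):

	#include <linux/hrtimer.h>
	#include <linux/ktime.h>

	static enum hrtimer_restart demo_vblank_tick(struct hrtimer *t)
	{
		/* Re-arm one period ahead; the return value counts elapsed periods. */
		hrtimer_forward_now(t, ms_to_ktime(16));
		return HRTIMER_RESTART;
	}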

RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-23 Thread Chen, Guchun
[Public]

Hi Jane/Alex,

Adding a check of new IP in this case looks good to me.

Regards,
Guchun

From: Jian, Jane 
Sent: Wednesday, November 24, 2021 10:54 AM
To: Deucher, Alexander ; Chen, Guchun 
; Chen, JingWen 
Cc: amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID


[Public]

Hi Guchun,

Per Alex's suggestion, we would better add a check for new vcn0 IP version, 
which is a version only owned by sriov and a way that I originally did, how do 
you think?

Thanks,
Jane

From: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Sent: Tuesday, November 23, 2021 11:03 PM
To: Jian, Jane mailto:jane.j...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Chen, 
Guchun mailto:guchun.c...@amd.com>>; Chen, JingWen 
mailto:jingwen.ch...@amd.com>>
Subject: Re: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID


[Public]

Can we just add a check for the new IP version in that case?  This looks really 
hacky.

Alex


From: Jane Jian mailto:jane.j...@amd.com>>
Sent: Tuesday, November 23, 2021 6:34 AM
To: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>; Deucher, 
Alexander mailto:alexander.deuc...@amd.com>>; Chen, 
Guchun mailto:guchun.c...@amd.com>>; Chen, JingWen 
mailto:jingwen.ch...@amd.com>>
Cc: Jian, Jane mailto:jane.j...@amd.com>>
Subject: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init 
for SIENNA_CICHLID

[WHY]
For SR-IOV, odd-numbered VFs modify the VCN0 engine IP revision (due to the
multimedia bandwidth feature), so it no longer matches the original VCN0
revision.

[HOW]
Skip the IP revision match case and keep using the ASIC type for the check.

Signed-off-by: Jane Jian mailto:jane.j...@amd.com>>
Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 6 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..0a91e53f520c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1334,7 +1334,10 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 return r;
 }

-   r = amdgpu_discovery_set_mm_ip_blocks(adev);
+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev))
+   r = amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
+   else
+   r = amdgpu_discovery_set_mm_ip_blocks(adev);
 if (r)
 return r;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 4f7c70845785..87f56b61be53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -86,6 +86,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 for (i = 0; i < adev->vcn.num_vcn_inst; i++)
 atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);

+   if (adev->asic_type == CHIP_SIENNA_CICHLID && amdgpu_sriov_vf(adev)) {
+   fw_name = FIRMWARE_SIENNA_CICHLID;
+   goto next;
+   }
 switch (adev->ip_versions[UVD_HWIP][0]) {
 case IP_VERSION(1, 0, 0):
 case IP_VERSION(1, 0, 1):
@@ -168,6 +172,8 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 return -EINVAL;
 }

+next:
+
 r = request_firmware(&adev->vcn.fw, fw_name, adev->dev);
 if (r) {
 dev_err(adev->dev, "amdgpu_vcn: Can't load firmware \"%s\"\n",
--
2.17.1


RE: [PATCH] drm/amdgpu: fix byteorder error in amdgpu discovery

2021-11-23 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Yang Wang
Sent: Wednesday, November 24, 2021 12:37 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Lazar, Lijo 
; Wang, Yang(Kevin) ; Zhang, 
Hawking 
Subject: [PATCH] drm/amdgpu: fix byteorder error in amdgpu discovery

Fix some byteorder issues in amdgpu discovery.
Without the conversions, this results in runtime errors on big-endian
systems (e.g. MIPS).

Signed-off-by: Yang Wang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 4e3669407518..503995c7ff6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -248,8 +248,8 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
 
offset = offsetof(struct binary_header, binary_checksum) +
sizeof(bhdr->binary_checksum);
-   size = bhdr->binary_size - offset;
-   checksum = bhdr->binary_checksum;
+   size = le16_to_cpu(bhdr->binary_size) - offset;
+   checksum = le16_to_cpu(bhdr->binary_checksum);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
  size, checksum)) {
@@ -270,7 +270,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
}
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ihdr->size, checksum)) {
+ le16_to_cpu(ihdr->size), 
checksum)) {
DRM_ERROR("invalid ip discovery data table checksum\n");
r = -EINVAL;
goto out;
@@ -282,7 +282,7 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
ghdr = (struct gpu_info_header *)(adev->mman.discovery_bin + offset);
 
if (!amdgpu_discovery_verify_checksum(adev->mman.discovery_bin + offset,
- ghdr->size, checksum)) {
+ le32_to_cpu(ghdr->size), 
checksum)) {
DRM_ERROR("invalid gc data table checksum\n");
r = -EINVAL;
goto out;
@@ -489,10 +489,10 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
*adev)
le16_to_cpu(bhdr->table_list[HARVEST_INFO].offset));
 
for (i = 0; i < 32; i++) {
-   if (le32_to_cpu(harvest_info->list[i].hw_id) == 0)
+   if (le16_to_cpu(harvest_info->list[i].hw_id) == 0)
break;
 
-   switch (le32_to_cpu(harvest_info->list[i].hw_id)) {
+   switch (le16_to_cpu(harvest_info->list[i].hw_id)) {
case VCN_HWID:
vcn_harvest_count++;
if (harvest_info->list[i].number_instance == 0)
-- 
2.25.1
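
For background, the discovery binary stores multi-byte fields in little-endian
order, so every access has to go through le16_to_cpu()/le32_to_cpu() to stay
correct on big-endian hosts. A minimal sketch of the idiom (struct and field
names are illustrative):

	#include <linux/types.h>
	#include <asm/byteorder.h>

	struct demo_header {
		__le16 size;		/* stored little-endian regardless of host */
		__le32 checksum;
	};

	static u32 demo_read_size(const struct demo_header *h)
	{
		/* Reading h->size directly would be wrong on big-endian (e.g. MIPS). */
		return le16_to_cpu(h->size);
	}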


RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-24 Thread Chen, Guchun
[Public]

It's better to move 'case IP_VERSION(3, 0, 192)' after IP_VERSION(3, 0, 192)?

case IP_VERSION(3, 1, 1):
case IP_VERSION(3, 0, 2):
+ case IP_VERSION(3, 0, 192):
amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
if (!amdgpu_sriov_vf(adev))
amdgpu_device_ip_block_add(adev, &jpeg_v3_0_ip_block);
break;

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Wednesday, November 24, 2021 10:23 PM
To: Jian, Jane 
Cc: Deucher, Alexander ; Chen, Guchun 
; Chen, JingWen ; amd-gfx list 

Subject: Re: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID

On Wed, Nov 24, 2021 at 9:20 AM Jane Jian  wrote:
>
> [WHY]
> for sriov odd# vf will modify vcn0 engine ip revision(due to 
> multimedia bandwidth feature), which will be mismatched with original 
> vcn0 revision
>
> [HOW]
> add new version check for vcn0 disabled revision(3, 0, 192), typically 
> modified under sriov mode
>
> Signed-off-by: Jane Jian 

Reviewed-by: Alex Deucher 

> Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 503995c7ff6c..3f9b7b0bab3c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -923,6 +923,7 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
> amdgpu_device *adev)
> amdgpu_device_ip_block_add(adev, 
> &jpeg_v3_0_ip_block);
> break;
> case IP_VERSION(3, 0, 33):
> +   case IP_VERSION(3, 0, 192):
> amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
> break;
> default:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 4f7c70845785..585961c2f5f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -135,6 +135,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
> break;
> case IP_VERSION(3, 0, 0):
> case IP_VERSION(3, 0, 64):
> +   case IP_VERSION(3, 0, 192):
> if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 0))
> fw_name = FIRMWARE_SIENNA_CICHLID;
> else
> --
> 2.17.1
>


RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-24 Thread Chen, Guchun
[Public]

A typo.

It's better to move 'case IP_VERSION(3, 0, 192)' after IP_VERSION(3, 0, 2)?

case IP_VERSION(3, 1, 1):
case IP_VERSION(3, 0, 2):
+ case IP_VERSION(3, 0, 192):
amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
if (!amdgpu_sriov_vf(adev))
amdgpu_device_ip_block_add(adev, &jpeg_v3_0_ip_block);
break;
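
For reference, IP_VERSION() just packs major/minor/revision into one integer,
which is why a revision like 192 can be matched as its own case label; to my
understanding the upstream definition is equivalent to:

	#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))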

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Chen, Guchun
Sent: Thursday, November 25, 2021 10:19 AM
To: Alex Deucher ; Jian, Jane 
Cc: Deucher, Alexander ; Chen, JingWen 
; amd-gfx list 
Subject: RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID

[Public]

It's better to move 'case IP_VERSION(3, 0, 192)' after IP_VERSION(3, 0, 192)?

case IP_VERSION(3, 1, 1):
case IP_VERSION(3, 0, 2):
+ case IP_VERSION(3, 0, 192):
amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
if (!amdgpu_sriov_vf(adev))
amdgpu_device_ip_block_add(adev, &jpeg_v3_0_ip_block);
break;

Regards,
Guchun

-Original Message-
From: Alex Deucher 
Sent: Wednesday, November 24, 2021 10:23 PM
To: Jian, Jane 
Cc: Deucher, Alexander ; Chen, Guchun 
; Chen, JingWen ; amd-gfx list 

Subject: Re: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip 
init for SIENNA_CICHLID

On Wed, Nov 24, 2021 at 9:20 AM Jane Jian  wrote:
>
> [WHY]
> for sriov odd# vf will modify vcn0 engine ip revision(due to 
> multimedia bandwidth feature), which will be mismatched with original
> vcn0 revision
>
> [HOW]
> add new version check for vcn0 disabled revision(3, 0, 192), typically 
> modified under sriov mode
>
> Signed-off-by: Jane Jian 

Reviewed-by: Alex Deucher 

> Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 503995c7ff6c..3f9b7b0bab3c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -923,6 +923,7 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
> amdgpu_device *adev)
> amdgpu_device_ip_block_add(adev, 
> &jpeg_v3_0_ip_block);
> break;
> case IP_VERSION(3, 0, 33):
> +   case IP_VERSION(3, 0, 192):
> amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
> break;
> default:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 4f7c70845785..585961c2f5f2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -135,6 +135,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
> break;
> case IP_VERSION(3, 0, 0):
> case IP_VERSION(3, 0, 64):
> +   case IP_VERSION(3, 0, 192):
> if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 0))
> fw_name = FIRMWARE_SIENNA_CICHLID;
> else
> --
> 2.17.1
>


RE: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init for SIENNA_CICHLID

2021-11-24 Thread Chen, Guchun
[Public]

I guess you need to add this IP version in nv_query_video_codecs as well.

With above clarified/fixed, this patch is:

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Jane Jian  
Sent: Thursday, November 25, 2021 11:15 AM
To: Deucher, Alexander ; Chen, Guchun 
; Chen, JingWen 
Cc: amd-gfx@lists.freedesktop.org; Jian, Jane 
Subject: [PATCH] drm/amdgpu/sriov/vcn: skip ip revision check case to ip init 
for SIENNA_CICHLID

[WHY]
For SR-IOV, odd-numbered VFs modify the VCN0 engine IP revision (due to the
multimedia bandwidth feature), so it no longer matches the original VCN0
revision.

[HOW]
Add a new version check for the VCN0-disabled revision (3, 0, 192), which is
typically set under SR-IOV.

Signed-off-by: Jane Jian 
Change-Id: I1ace32acbf3a13c0baac958508da1324ec387a58
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 503995c7ff6c..f6fae79203ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -918,6 +918,7 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct 
amdgpu_device *adev)
case IP_VERSION(3, 0, 64):
case IP_VERSION(3, 1, 1):
case IP_VERSION(3, 0, 2):
+   case IP_VERSION(3, 0, 192):
amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
if (!amdgpu_sriov_vf(adev))
amdgpu_device_ip_block_add(adev, 
&jpeg_v3_0_ip_block); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 4f7c70845785..585961c2f5f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -135,6 +135,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
break;
case IP_VERSION(3, 0, 0):
case IP_VERSION(3, 0, 64):
+   case IP_VERSION(3, 0, 192):
if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 0))
fw_name = FIRMWARE_SIENNA_CICHLID;
else
--
2.17.1


RE: [PATCH] drm/amd/pm: Add warning for unexpected PG requests

2021-11-25 Thread Chen, Guchun
[Public]

Use dev_warn to be mGPU friendly?

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Lijo Lazar
Sent: Thursday, November 25, 2021 7:51 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Limonciello, Mario 
; Zhang, Hawking 
Subject: [PATCH] drm/amd/pm: Add warning for unexpected PG requests

Ideally power gate/ungate requests shouldn't come when smu block is 
uninitialized. Add a WARN message to check the origins if such a thing ever 
happens.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index e156add7b560..e0f8ab8be975 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -277,8 +277,11 @@ static int smu_dpm_set_power_gate(void *handle,
struct smu_context *smu = handle;
int ret = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled) {
+   WARN(true, "SMU uninitialized but power %s requested for %u!\n",
+gate ? "gate" : "ungate", block_type);
return -EOPNOTSUPP;
+   }
 
switch (block_type) {
/*
--
2.25.1


RE: [PATCH] drm/amd/pm: Add warning for unexpected PG requests

2021-11-25 Thread Chen, Guchun
[Public]

Thanks for clarification, Lijo.

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Lazar, Lijo  
Sent: Thursday, November 25, 2021 9:32 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Limonciello, Mario 
; Zhang, Hawking 
Subject: Re: [PATCH] drm/amd/pm: Add warning for unexpected PG requests



On 11/25/2021 6:52 PM, Chen, Guchun wrote:
> [Public]
> 
> Use dev_warn to be mGPU friendly?

The intention is to also get a backtrace along with the message. There are
multiple paths to this function.

Thanks,
Lijo

> 
> Regards,
> Guchun
> 
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Lijo Lazar
> Sent: Thursday, November 25, 2021 7:51 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Limonciello, Mario 
> ; Zhang, Hawking 
> Subject: [PATCH] drm/amd/pm: Add warning for unexpected PG requests
> 
> Ideally power gate/ungate requests shouldn't come when smu block is 
> uninitialized. Add a WARN message to check the origins if such a thing ever 
> happens.
> 
> Signed-off-by: Lijo Lazar 
> ---
>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 5 -
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index e156add7b560..e0f8ab8be975 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -277,8 +277,11 @@ static int smu_dpm_set_power_gate(void *handle,
>   struct smu_context *smu = handle;
>   int ret = 0;
>   
> - if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
> + if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled) {
> + WARN(true, "SMU uninitialized but power %s requested for %u!\n",
> +  gate ? "gate" : "ungate", block_type);
>   return -EOPNOTSUPP;
> + }
>   
>   switch (block_type) {
>   /*
> --
> 2.25.1
> 


RE: [PATCH v2] drm/amd/pm: Add warning for unexpected PG requests

2021-11-25 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Lijo Lazar
Sent: Friday, November 26, 2021 1:25 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Limonciello, Mario 
; Zhang, Hawking 
Subject: [PATCH v2] drm/amd/pm: Add warning for unexpected PG requests

v1: Ideally power gate/ungate requests shouldn't come when smu block is 
uninitialized. Add a WARN message to check the origins if such a thing ever 
happens.

v2: Use dev_WARN to log device info (Felix/Guchun).

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index e156add7b560..ea99afb38d2b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -277,8 +277,12 @@ static int smu_dpm_set_power_gate(void *handle,
struct smu_context *smu = handle;
int ret = 0;
 
-   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled)
+   if (!smu->pm_enabled || !smu->adev->pm.dpm_enabled) {
+   dev_WARN(smu->adev->dev,
+"SMU uninitialized but power %s requested for %u!\n",
+gate ? "gate" : "ungate", block_type);
return -EOPNOTSUPP;
+   }
 
switch (block_type) {
/*
--
2.25.1
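
For completeness, dev_WARN() addresses both points raised in review: like
dev_warn() it prefixes the message with the device identity (mGPU friendly),
and like WARN() it also emits a backtrace, which helps since multiple paths
reach this function. A sketch of the contrast at a hypothetical call site:

	/* dev_warn(): device-prefixed message, no backtrace. */
	dev_warn(adev->dev, "unexpected PG request for block %u\n", block_type);

	/* dev_WARN(): device-prefixed message plus a WARN-style backtrace. */
	dev_WARN(adev->dev, "unexpected PG request for block %u\n", block_type);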


RE: [PATCH] drm/amdgpu: Use MAX_HWIP instead of HW_ID_MAX

2021-11-25 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Lijo Lazar
Sent: Friday, November 26, 2021 2:43 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Hawking 

Subject: [PATCH] drm/amdgpu: Use MAX_HWIP instead of HW_ID_MAX

HW_ID_MAX considers HWID of all IPs, far more than what amdgpu uses.
amdgpu tracks only the IPs defined by amd_hw_ip_block_type whose max is 
MAX_HWIP.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index b85b67a88a3d..c5cfe2926ca1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1096,7 +1096,7 @@ struct amdgpu_device {
pci_channel_state_t pci_channel_state;
 
struct amdgpu_reset_control *reset_cntl;
-   uint32_t
ip_versions[HW_ID_MAX][HWIP_MAX_INSTANCE];
+   uint32_t
ip_versions[MAX_HWIP][HWIP_MAX_INSTANCE];
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
--
2.25.1


RE: [PATCH V2 02/17] drm/amd/pm: do not expose power implementation details to amdgpu_pm.c

2021-11-30 Thread Chen, Guchun
[Public]

Two nit-picks.

1. It's better to drop the trailing "return" in
amdgpu_dpm_get_current_power_state.

2. When the function pointer is NULL, some functions return 0 while others
return -EOPNOTSUPP (illustrated below). Is there a reason for the difference?
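
Condensed from the diff below, the two patterns in question look like:

	/* A missing hook treated as "not supported": */
	if (!pp_funcs->get_pp_num_states)
		return -EOPNOTSUPP;

	/* A missing hook treated as a silent fallback: */
	if (pp_funcs->get_performance_level)
		level = pp_funcs->get_performance_level(adev->powerplay.pp_handle);
	else
		level = adev->pm.dpm.forced_level;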

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Tuesday, November 30, 2021 3:43 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Lazar, Lijo 
; Feng, Kenneth ; Koenig, Christian 
; Quan, Evan 
Subject: [PATCH V2 02/17] drm/amd/pm: do not expose power implementation 
details to amdgpu_pm.c

amdgpu_pm.c holds all the user sysfs/hwmon interfaces. It's another client
of our power APIs, so it's not proper for it to reach into power
implementation details.

Signed-off-by: Evan Quan 
Change-Id: I397853ddb13eacfce841366de2a623535422df9a
---
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 458 ++-
 drivers/gpu/drm/amd/pm/amdgpu_pm.c| 519 --
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   | 160 +++
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |   3 -
 4 files changed, 709 insertions(+), 431 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 9b332c8a0079..3c59f16c7a6f 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -1453,7 +1453,9 @@ static void amdgpu_dpm_change_power_state_locked(struct 
amdgpu_device *adev)
if (equal)
return;
 
-   amdgpu_dpm_set_power_state(adev);
+   if (adev->powerplay.pp_funcs->set_power_state)
+   
adev->powerplay.pp_funcs->set_power_state(adev->powerplay.pp_handle);
+
amdgpu_dpm_post_set_power_state(adev);
 
adev->pm.dpm.current_active_crtcs = adev->pm.dpm.new_active_crtcs;
@@ -1709,3 +1711,457 @@ int amdgpu_dpm_get_ecc_info(struct amdgpu_device *adev,
 
return smu_get_ecc_info(&adev->smu, umc_ecc);
 }
+
+struct amd_vce_state *amdgpu_dpm_get_vce_clock_state(struct amdgpu_device 
*adev,
+uint32_t idx)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (!pp_funcs->get_vce_clock_state)
+   return NULL;
+
+   return pp_funcs->get_vce_clock_state(adev->powerplay.pp_handle,
+idx);
+}
+
+void amdgpu_dpm_get_current_power_state(struct amdgpu_device *adev,
+   enum amd_pm_state_type *state)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (!pp_funcs->get_current_power_state) {
+   *state = adev->pm.dpm.user_state;
+   return;
+   }
+
+   *state = pp_funcs->get_current_power_state(adev->powerplay.pp_handle);
+   if (*state < POWER_STATE_TYPE_DEFAULT ||
+   *state > POWER_STATE_TYPE_INTERNAL_3DPERF)
+   *state = adev->pm.dpm.user_state;
+
+   return;
+}
+
+void amdgpu_dpm_set_power_state(struct amdgpu_device *adev,
+   enum amd_pm_state_type state)
+{
+   adev->pm.dpm.user_state = state;
+
+   if (adev->powerplay.pp_funcs->dispatch_tasks)
+   amdgpu_dpm_dispatch_task(adev, AMD_PP_TASK_ENABLE_USER_STATE, 
&state);
+   else
+   amdgpu_pm_compute_clocks(adev);
+}
+
+enum amd_dpm_forced_level amdgpu_dpm_get_performance_level(struct 
amdgpu_device *adev)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+   enum amd_dpm_forced_level level;
+
+   if (pp_funcs->get_performance_level)
+   level = 
pp_funcs->get_performance_level(adev->powerplay.pp_handle);
+   else
+   level = adev->pm.dpm.forced_level;
+
+   return level;
+}
+
+int amdgpu_dpm_force_performance_level(struct amdgpu_device *adev,
+  enum amd_dpm_forced_level level)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (pp_funcs->force_performance_level) {
+   if (adev->pm.dpm.thermal_active)
+   return -EINVAL;
+
+   if (pp_funcs->force_performance_level(adev->powerplay.pp_handle,
+ level))
+   return -EINVAL;
+   }
+
+   adev->pm.dpm.forced_level = level;
+
+   return 0;
+}
+
+int amdgpu_dpm_get_pp_num_states(struct amdgpu_device *adev,
+struct pp_states_info *states)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+
+   if (!pp_funcs->get_pp_num_states)
+   return -EOPNOTSUPP;
+
+   return pp_funcs->get_pp_num_states(adev->powerplay.pp_handle, states);
+}
+
+int amdgpu_dpm_dispatch_task(struct amdgpu_device *adev,
+ enum amd_pp_task task_id,
+ enum amd_pm_state_type *user_state)
+{
+   const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;

RE: [PATCH] drm/amdgpu: handle SRIOV VCN revision parsing

2021-12-01 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, December 2, 2021 5:36 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu: handle SRIOV VCN revision parsing

For SR-IOV, the IP discovery revision number encodes additional information.  
Handle that case here.

v2: drop additional IP versions

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 17 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   |  2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/nv.c   |  2 --
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index ea00090b3fb3..552031950518 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -379,8 +379,21 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
*adev)
  ip->major, ip->minor,
  ip->revision);
 
-   if (le16_to_cpu(ip->hw_id) == VCN_HWID)
+   if (le16_to_cpu(ip->hw_id) == VCN_HWID) {
+   if (amdgpu_sriov_vf(adev)) {
+   /* SR-IOV modifies each VCN’s revision 
(uint8)
+* Bit [5:0]: original revision value
+* Bit [7:6]: en/decode capability:
+* 0b00 : VCN function normally
+* 0b10 : encode is disabled
+* 0b01 : decode is disabled
+*/
+   
adev->vcn.sriov_config[adev->vcn.num_vcn_inst] =
+   (ip->revision & 0xc0) >> 6;
+   ip->revision &= ~0xc0;
+   }
adev->vcn.num_vcn_inst++;
+   }
if (le16_to_cpu(ip->hw_id) == SDMA0_HWID ||
le16_to_cpu(ip->hw_id) == SDMA1_HWID ||
le16_to_cpu(ip->hw_id) == SDMA2_HWID || @@ -917,10 
+930,8 @@ static int amdgpu_discovery_set_mm_ip_blocks(struct amdgpu_device 
*adev)
break;
case IP_VERSION(3, 0, 0):
case IP_VERSION(3, 0, 16):
-   case IP_VERSION(3, 0, 64):
case IP_VERSION(3, 1, 1):
case IP_VERSION(3, 0, 2):
-   case IP_VERSION(3, 0, 192):
amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);
if (!amdgpu_sriov_vf(adev))
amdgpu_device_ip_block_add(adev, 
&jpeg_v3_0_ip_block); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 585961c2f5f2..2658414c503d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -134,8 +134,6 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
adev->vcn.indirect_sram = true;
break;
case IP_VERSION(3, 0, 0):
-   case IP_VERSION(3, 0, 64):
-   case IP_VERSION(3, 0, 192):
if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 0))
fw_name = FIRMWARE_SIENNA_CICHLID;
else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index bfa27ea94804..938a5ead3f20 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -235,6 +235,7 @@ struct amdgpu_vcn {
 
uint8_t num_vcn_inst;
struct amdgpu_vcn_inst   inst[AMDGPU_MAX_VCN_INSTANCES];
+   uint8_t  sriov_config[AMDGPU_MAX_VCN_INSTANCES];
struct amdgpu_vcn_reginternal;
struct mutex vcn_pg_lock;
struct mutexvcn1_jpeg1_workaround;
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c 
index 2ec1ffb36b1f..7088528079c6 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -182,8 +182,6 @@ static int nv_query_video_codecs(struct amdgpu_device 
*adev, bool encode,  {
switch (adev->ip_versions[UVD_HWIP][0]) {
case IP_VERSION(3, 0, 0):
-   case IP_VERSION(3, 0, 64):
-   case IP_VERSION(3, 0, 192):
if (amdgpu_sriov_vf(adev)) {
if (encode)
*codecs = &sriov_sc_video_codecs_encode;
--
2.31.1


RE: [PATCH] drm/amdgpu: fix drm_plane alloc in amdgpu_vkms

2021-12-07 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Cui, Flora  
Sent: Tuesday, December 7, 2021 3:12 PM
To: Chen, Guchun ; Yuan, Perry ; Shi, 
Leslie ; amd-gfx@lists.freedesktop.org
Cc: Cui, Flora 
Subject: [PATCH] drm/amdgpu: fix drm_plane alloc in amdgpu_vkms

otherwise the drm_plane is not released

Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index af3a2f8c12b4..0bf697b72ad0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -392,19 +392,14 @@ static struct drm_plane *amdgpu_vkms_plane_init(struct 
drm_device *dev,
struct drm_plane *plane;
int ret;
 
-   plane = kzalloc(sizeof(*plane), GFP_KERNEL);
-   if (!plane)
-   return ERR_PTR(-ENOMEM);
-
-   ret = drm_universal_plane_init(dev, plane, 1 << index,
-  &amdgpu_vkms_plane_funcs,
-  amdgpu_vkms_formats,
-  ARRAY_SIZE(amdgpu_vkms_formats),
-  NULL, type, NULL);
-   if (ret) {
-   kfree(plane);
-   return ERR_PTR(ret);
-   }
+   plane = __drmm_universal_plane_alloc(dev, sizeof(*plane), 0, 1 << index,
+  &amdgpu_vkms_plane_funcs,
+  amdgpu_vkms_formats,
+  ARRAY_SIZE(amdgpu_vkms_formats),
+  NULL, type, NULL);
+
+   if (IS_ERR(plane))
+   return plane;
 
drm_plane_helper_add(plane, &amdgpu_vkms_primary_helper_funcs);
 
-- 
2.25.1
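
Design note: the drmm_-prefixed helpers tie the allocation's lifetime to the
drm_device, which is what removes both the error-path kfree() and the leak
this patch fixes. The same idea for a plain allocation, as a sketch (assuming
a valid struct drm_device *dev and an illustrative struct):

	struct demo_state *state;

	/* Freed automatically when the drm_device is released. */
	state = drmm_kzalloc(dev, sizeof(*state), GFP_KERNEL);
	if (!state)
		return -ENOMEM;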


RE: [PATCH] drm/amdgpu: fix incorrect VCN revision in SRIOV

2021-12-08 Thread Chen, Guchun
[Public]

Hi Leslie,

Can we move the revision handling in this patch into
amdgpu_discovery_get_vcn_version? Then all revision handling stays in
amdgpu_discovery.c.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Leslie Shi
Sent: Wednesday, December 8, 2021 4:46 PM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: fix incorrect VCN revision in SRIOV

The guest OS will set up VCN instance 1, which is actually disabled, as an
enabled instance. This causes a VCN IB ring test failure during modprobe.

Fixes: 36b7d5646476 ("drm/amdgpu: handle SRIOV VCN revision parsing")
Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 2658414c503d..2323815ac32d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -293,6 +293,9 @@ bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev, 
enum vcn_ring_type t
if (amdgpu_discovery_get_vcn_version(adev, vcn_instance, &major, 
&minor, &revision) != 0)
return true;
 
+   if (amdgpu_sriov_vf(adev))
+   revision |= adev->vcn.sriov_config[vcn_instance] << 6;
+
if ((type == VCN_ENCODE_RING) && (revision & 
VCN_BLOCK_ENCODE_DISABLE_MASK)) {
ret = true;
} else if ((type == VCN_DECODE_RING) && (revision & 
VCN_BLOCK_DECODE_DISABLE_MASK)) {
-- 
2.25.1


RE: [PATCH] drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()

2021-12-09 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Leslie Shi
Sent: Wednesday, December 8, 2021 4:46 PM
To: amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/amdgpu: add modifiers in amdgpu_vkms_plane_init()

Fix following warning in SRIOV during modprobe:

amdgpu :00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 
amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index af3a2f8c12b4..03a13771a9f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -390,6 +390,7 @@ static struct drm_plane *amdgpu_vkms_plane_init(struct 
drm_device *dev,
int index)
 {
struct drm_plane *plane;
+   uint64_t modifiers[] = {DRM_FORMAT_MOD_LINEAR, 
+DRM_FORMAT_MOD_INVALID};
int ret;
 
plane = kzalloc(sizeof(*plane), GFP_KERNEL); @@ -400,7 +401,7 @@ static 
struct drm_plane *amdgpu_vkms_plane_init(struct drm_device *dev,
   &amdgpu_vkms_plane_funcs,
   amdgpu_vkms_formats,
   ARRAY_SIZE(amdgpu_vkms_formats),
-  NULL, type, NULL);
+  modifiers, type, NULL);
if (ret) {
kfree(plane);
return ERR_PTR(ret);
--
2.25.1


RE: [PATCH v3] drm/amdgpu: fix incorrect VCN revision in SRIOV

2021-12-09 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Shi, Leslie  
Sent: Thursday, December 9, 2021 4:27 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Shi, Leslie 
Subject: [PATCH v3] drm/amdgpu: fix incorrect VCN revision in SRIOV

The guest OS will set up VCN instance 1, which is actually disabled, as an
enabled instance and run initialization work on it, causing VCN IB ring test
failures on the disabled instance during modprobe:

amdgpu :00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1 amdgpu 
:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on 
vcn_dec_0 (-110).
amdgpu :00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed 
on vcn_enc_0.0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test 
failed (-110).

v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to vcn_config
v3: modify VCN's revision in SR-IOV and bare-metal

Fixes: 36b7d5646476 ("drm/amdgpu: handle SRIOV VCN revision parsing")
Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 29 ++-  
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h |  2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 15 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h   |  2 +-
 4 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 552031950518..f31bc0187394 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -380,18 +380,15 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
*adev)
  ip->revision);
 
if (le16_to_cpu(ip->hw_id) == VCN_HWID) {
-   if (amdgpu_sriov_vf(adev)) {
-   /* SR-IOV modifies each VCN’s 
revision (uint8)
-* Bit [5:0]: original revision value
-* Bit [7:6]: en/decode capability:
-* 0b00 : VCN function normally
-* 0b10 : encode is disabled
-* 0b01 : decode is disabled
-*/
-   
adev->vcn.sriov_config[adev->vcn.num_vcn_inst] =
-   (ip->revision & 0xc0) >> 6;
-   ip->revision &= ~0xc0;
-   }
+   /* Bit [5:0]: original revision value
+* Bit [7:6]: en/decode capability:
+* 0b00 : VCN function normally
+* 0b10 : encode is disabled
+* 0b01 : decode is disabled
+*/
+   adev->vcn.vcn_config[adev->vcn.num_vcn_inst] =
+   ip->revision & 0xc0;
+   ip->revision &= ~0xc0;
adev->vcn.num_vcn_inst++;
}
if (le16_to_cpu(ip->hw_id) == SDMA0_HWID || @@ -485,14 
+482,6 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int 
hw_id, int n
return -EINVAL;
 }
 
-
-int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
vcn_instance,
-int *major, int *minor, int *revision)
-{
-   return amdgpu_discovery_get_ip_version(adev, VCN_HWID,
-  vcn_instance, major, minor, 
revision);
-}
-
 void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)  {
struct binary_header *bhdr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
index 0ea029e3b850..14537cec19db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
@@ -33,8 +33,6 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev); 
 int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int 
number_instance,
 int *major, int *minor, int *revision);
 
-int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
vcn_instance,
-int *major, int *minor, int *revision);
 int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev);  int 
amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 2658414c503d..38036cbf6203 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c

RE: [PATCH v3] drm/amdgpu: fix incorrect VCN revision in SRIOV

2021-12-09 Thread Chen, Guchun
[Public]

Hi Lijo,

The check is not necessary. It's guarded by the for loop in the caller.

for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
...
if (amdgpu_vcn_is_disabled_vcn(adev, VCN_ENCODE_RING, i)) {
..
}

Regards,
Guchun

-Original Message-
From: Lazar, Lijo  
Sent: Thursday, December 9, 2021 4:53 PM
To: Shi, Leslie ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun 
Subject: Re: [PATCH v3] drm/amdgpu: fix incorrect VCN revision in SRIOV



On 12/9/2021 1:56 PM, Leslie Shi wrote:
> Guest OS will setup VCN instance 1 which is disabled as an enabled 
> instance and execute initialization work on it, but this causes VCN ib 
> ring test failure on the disabled VCN instance during modprobe:
> 
> amdgpu :00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 
> 1 amdgpu :00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test 
> failed on vcn_dec_0 (-110).
> amdgpu :00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test 
> failed on vcn_enc_0.0 (-110).
> [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test 
> failed (-110).
> 
> v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to 
> vcn_config
> v3: modify VCN's revision in SR-IOV and bare-metal
> 
> Fixes: 36b7d5646476 ("drm/amdgpu: handle SRIOV VCN revision parsing")
> Signed-off-by: Leslie Shi 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 29 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h |  2 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 15 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h   |  2 +-
>   4 files changed, 14 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 552031950518..f31bc0187394 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -380,18 +380,15 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
> *adev)
> ip->revision);
>   
>   if (le16_to_cpu(ip->hw_id) == VCN_HWID) {
> - if (amdgpu_sriov_vf(adev)) {
> - /* SR-IOV modifies each VCN’s revision 
> (uint8)
> -  * Bit [5:0]: original revision value
> -  * Bit [7:6]: en/decode capability:
> -  * 0b00 : VCN function normally
> -  * 0b10 : encode is disabled
> -  * 0b01 : decode is disabled
> -  */
> - 
> adev->vcn.sriov_config[adev->vcn.num_vcn_inst] =
> - (ip->revision & 0xc0) >> 6;
> - ip->revision &= ~0xc0;
> - }
> + /* Bit [5:0]: original revision value
> +  * Bit [7:6]: en/decode capability:
> +  * 0b00 : VCN function normally
> +  * 0b10 : encode is disabled
> +  * 0b01 : decode is disabled
> +  */
> + adev->vcn.vcn_config[adev->vcn.num_vcn_inst] =
> + ip->revision & 0xc0;
> + ip->revision &= ~0xc0;
>   adev->vcn.num_vcn_inst++;
>   }
>   if (le16_to_cpu(ip->hw_id) == SDMA0_HWID || @@ -485,14 
> +482,6 @@ 
> int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, 
> int n
>   return -EINVAL;
>   }
>   
> -
> -int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
> vcn_instance,
> -  int *major, int *minor, int *revision)
> -{
> - return amdgpu_discovery_get_ip_version(adev, VCN_HWID,
> -vcn_instance, major, minor, 
> revision);
> -}
> -
>   void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
>   {
>   struct binary_header *bhdr;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
> index 0ea029e3b850..14537cec19db 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
> @@ -33,8 +33,6 @@ void amdgpu_discovery_harvest_ip(struct amdgpu_device 
> *adev);
>

RE: [PATCH v2] drm/amdgpu: fix incorrect VCN revision in SRIOV

2021-12-09 Thread Chen, Guchun
[Public]

Re: We can probably just drop the conditional here and just clear the high bits 
for everything.

It's addressed in v3 by Leslie.

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Friday, December 10, 2021 12:02 AM
To: Shi, Leslie 
Cc: Lazar, Lijo ; amd-gfx list 
; Chen, Guchun 
Subject: Re: [PATCH v2] drm/amdgpu: fix incorrect VCN revision in SRIOV

On Thu, Dec 9, 2021 at 12:18 AM Leslie Shi  wrote:
>
> Guest OS will setup VCN instance 1 which is disabled as an enabled 
> instance and execute initialization work on it, but this causes VCN ib 
> ring test failure on the disabled VCN instance during modprobe:
>
> amdgpu :00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 
> 1 amdgpu :00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test 
> failed on vcn_dec_0 (-110).
> amdgpu :00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test 
> failed on vcn_enc_0.0 (-110).
> [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test 
> failed (-110).
>
> v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to 
> vcn_config
>
> Fixes: 36b7d5646476 ("drm/amdgpu: handle SRIOV VCN revision parsing")
> Signed-off-by: Leslie Shi 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 13 +++--  
> drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h |  2 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   | 15 ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h   |  2 +-
>  4 files changed, 8 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 552031950518..53ff1bbe8bd6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -380,6 +380,9 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
> *adev)
>   ip->revision);
>
> if (le16_to_cpu(ip->hw_id) == VCN_HWID) {
> +   adev->vcn.vcn_config[adev->vcn.num_vcn_inst] =
> +   ip->revision & 0xc0;
> +
> if (amdgpu_sriov_vf(adev)) {

We can probably just drop the conditional here and just clear the high bits for 
everything.

Alex

> /* SR-IOV modifies each VCN’s 
> revision (uint8)
>  * Bit [5:0]: original 
> revision value @@ -388,8 +391,6 @@ int amdgpu_discovery_reg_base_init(struct 
> amdgpu_device *adev)
>  * 0b10 : encode is disabled
>  * 0b01 : decode is disabled
>  */
> -   
> adev->vcn.sriov_config[adev->vcn.num_vcn_inst] =
> -   (ip->revision & 0xc0) >> 6;
> ip->revision &= ~0xc0;
> }
> adev->vcn.num_vcn_inst++; @@ -485,14 
> +486,6 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int 
> hw_id, int n
> return -EINVAL;
>  }
>
> -
> -int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
> vcn_instance,
> -int *major, int *minor, int *revision)
> -{
> -   return amdgpu_discovery_get_ip_version(adev, VCN_HWID,
> -  vcn_instance, major, minor, 
> revision);
> -}
> -
>  void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)  {
> struct binary_header *bhdr;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
> index 0ea029e3b850..14537cec19db 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
> @@ -33,8 +33,6 @@ void amdgpu_discovery_harvest_ip(struct 
> amdgpu_device *adev);  int amdgpu_discovery_get_ip_version(struct 
> amdgpu_device *adev, int hw_id, int number_instance,
>  int *major, int *minor, int 
> *revision);
>
> -int amdgpu_discovery_get_vcn_version(struct amdgpu_device *adev, int 
> vcn_instance,
> -int *major, int *minor, int *revision);
>  int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev);  int 
> amdgpu_discovery_set_ip_blocks(struct amdgpu_device *adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 2658414c503d..38

RE: [PATCH] drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YC

2021-12-12 Thread Chen, Guchun
[Public]

In SMU11/SMU12, it will cache pm.fw_version unconditionally only in APU case. 
So we should apply the same code in smu_v13_0_check_fw_version?

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Mario 
Limonciello
Sent: Friday, December 10, 2021 10:29 PM
To: amd-gfx@lists.freedesktop.org
Cc: Limonciello, Mario 
Subject: [PATCH] drm/amd/pm: fix reading SMU FW version from 
amdgpu_firmware_info on YC

This value does not get cached into adev->pm.fw_version during startup for
smu13 as it does for other SMUs such as smu10.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 55421ea622fb..85dbd6a7efa9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -226,6 +226,8 @@ int smu_v13_0_check_fw_version(struct smu_context *smu)
 
dev_info(smu->adev->dev, "smu fw reported version = 0x%08x 
(%d.%d.%d)\n",
 smu_version, smu_major, smu_minor, smu_debug);
+   if (!smu->adev->pm.fw_version)
+   smu->adev->pm.fw_version = smu_version;
 
/*
 * 1. if_version mismatch is not critical as our fw is designed
--
2.25.1
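
For context, the 0x%08x value logged above is split into fields before
printing; in smu_v13_0_check_fw_version the decode is, as far as I recall,
along these lines (a sketch, not verbatim):

	smu_major = (smu_version >> 16) & 0xffff;
	smu_minor = (smu_version >>  8) & 0xff;
	smu_debug = (smu_version >>  0) & 0xff;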


RE: [PATCH V5 15/16] drm/amd/pm: revise the performance level setting APIs

2021-12-12 Thread Chen, Guchun
[Public]

A coding style nitpick.

int ret = 0;
+   uint32_t profile_mode_mask = AMD_DPM_FORCED_LEVEL_PROFILE_STANDARD |
+   AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK |
+   AMD_DPM_FORCED_LEVEL_PROFILE_MIN_MCLK |
+   AMD_DPM_FORCED_LEVEL_PROFILE_PEAK;

It's better to declare the short variable last, so please move "int ret = 0;"
after profile_mode_mask.
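
With the longer declaration first (the reverse-xmas-tree ordering commonly
preferred in kernel code), the block would read:

	uint32_t profile_mode_mask = AMD_DPM_FORCED_LEVEL_PROFILE_STANDARD |
			AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK |
			AMD_DPM_FORCED_LEVEL_PROFILE_MIN_MCLK |
			AMD_DPM_FORCED_LEVEL_PROFILE_PEAK;
	int ret = 0;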

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Monday, December 13, 2021 11:52 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Lazar, Lijo 
; Quan, Evan 
Subject: [PATCH V5 15/16] drm/amd/pm: revise the performance level setting APIs

Avoid cross callings which make lock protection enforcement on 
amdgpu_dpm_force_performance_level() impossible.

Signed-off-by: Evan Quan 
Change-Id: Ie658140f40ab906ce2ec47576a086062b61076a6
--
v1->v2:
  - drop unused enable_umd_pstate callback(Lijo)
---
 drivers/gpu/drm/amd/include/amd_shared.h  |  2 --
 drivers/gpu/drm/amd/pm/amdgpu_pm.c| 29 ---
 .../gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c| 17 ++-
 .../gpu/drm/amd/pm/powerplay/amd_powerplay.c  | 12 
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 15 --
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  1 -
 6 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/amd_shared.h 
b/drivers/gpu/drm/amd/include/amd_shared.h
index f57a1478f0fe..fb6ad56ad6f1 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -268,7 +268,6 @@ enum amd_dpm_forced_level;
  * @set_clockgating_state: enable/disable cg for the IP block
  * @set_powergating_state: enable/disable pg for the IP block
  * @get_clockgating_state: get current clockgating status
- * @enable_umd_pstate: enable UMD powerstate
  *
  * These hooks provide an interface for controlling the operational state
  * of IP blocks. After acquiring a list of IP blocks for the GPU in use, @@ 
-299,7 +298,6 @@ struct amd_ip_funcs {
int (*set_powergating_state)(void *handle,
 enum amd_powergating_state state);
void (*get_clockgating_state)(void *handle, u32 *flags);
-   int (*enable_umd_pstate)(void *handle, enum amd_dpm_forced_level 
*level);
 };
 
 
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index ce80430c0eb6..106f6ee955f4 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -301,6 +301,10 @@ static ssize_t 
amdgpu_set_power_dpm_force_performance_level(struct device *dev,
enum amd_dpm_forced_level level;
enum amd_dpm_forced_level current_level;
int ret = 0;
+   uint32_t profile_mode_mask = AMD_DPM_FORCED_LEVEL_PROFILE_STANDARD |
+   AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK |
+   AMD_DPM_FORCED_LEVEL_PROFILE_MIN_MCLK |
+   AMD_DPM_FORCED_LEVEL_PROFILE_PEAK;
 
if (amdgpu_in_reset(adev))
return -EPERM;
@@ -354,10 +358,7 @@ static ssize_t 
amdgpu_set_power_dpm_force_performance_level(struct device *dev,
}
 
/* profile_exit setting is valid only when current mode is in profile 
mode */
-   if (!(current_level & (AMD_DPM_FORCED_LEVEL_PROFILE_STANDARD |
-   AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK |
-   AMD_DPM_FORCED_LEVEL_PROFILE_MIN_MCLK |
-   AMD_DPM_FORCED_LEVEL_PROFILE_PEAK)) &&
+   if (!(current_level & profile_mode_mask) &&
(level == AMD_DPM_FORCED_LEVEL_PROFILE_EXIT)) {
pr_err("Currently not in any profile mode!\n");
pm_runtime_mark_last_busy(ddev->dev);
@@ -365,6 +366,26 @@ static ssize_t 
amdgpu_set_power_dpm_force_performance_level(struct device *dev,
return -EINVAL;
}
 
+   if (!(current_level & profile_mode_mask) &&
+ (level & profile_mode_mask)) {
+   /* enter UMD Pstate */
+   amdgpu_device_ip_set_powergating_state(adev,
+  AMD_IP_BLOCK_TYPE_GFX,
+  AMD_PG_STATE_UNGATE);
+   amdgpu_device_ip_set_clockgating_state(adev,
+  AMD_IP_BLOCK_TYPE_GFX,
+  AMD_CG_STATE_UNGATE);
+   } else if ((current_level & profile_mode_mask) &&
+   !(level & profile_mode_mask)) {
+   /* exit UMD Pstate */
+   amdgpu_device_ip_set_clockgating_state(adev,
+  AMD_IP_BLOCK_TYPE_GFX,
+  AMD_CG_STATE_GATE);
+   amdgpu_device_ip_set_powergating_state(adev,
+  AMD_IP_BLOCK_TYPE_GFX,
+  AMD_PG_STATE_GATE);
+   }

RE: [PATCH] drm/amdgpu: move smu_debug_mask to a more proper place

2021-12-12 Thread Chen, Guchun
[Public]

-   if (unlikely(smu->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
+   if (unlikely(adev->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
res && (res != -ETIME)) {
amdgpu_device_halt(smu->adev);

[Guchun] As we have set an 'adev' variable, we can replace 'smu->adev' with 
'adev' in each function directly.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Monday, December 13, 2021 1:43 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Quan, Evan 

Subject: [PATCH] drm/amdgpu: move smu_debug_mask to a more proper place

The smu_context will be invisible from outside (of power). Also, the
smu_debug_mask can then be shared across all power code instead of being tied
to one specific framework (swSMU) only.

Signed-off-by: Evan Quan 
Change-Id: I1a0e1a436a51fc520a47b3fb28cde527d4e5eb6e
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h | 8 
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 9 ++---
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e701dedce344..9ceb8f3e73de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -811,6 +811,9 @@ struct amd_powerplay {
  (rid == 0x01) || \
  (rid == 0x10
 
+/* Used to mask smu debug modes */
+#define SMU_DEBUG_HALT_ON_ERROR	0x1
+
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
 struct amdgpu_device {
@@ -959,6 +962,10 @@ struct amdgpu_device {
	struct amdgpu_pm		pm;
u32 cg_flags;
u32 pg_flags;
+   /*
+* 0 = disabled (default), otherwise enable corresponding debug mode
+*/
+	uint32_t		smu_debug_mask;
 
/* nbio */
struct amdgpu_nbio  nbio;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 9dfccb20fedd..ee1cc15c6f09 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1619,7 +1619,7 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
return 0;
 
debugfs_create_x32("amdgpu_smu_debug", 0600, root,
-  &adev->smu.smu_debug_mask);
+  &adev->smu_debug_mask);
 
ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
  &fops_ib_preempt);
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 12e67ad9a3b2..2b9b9a7ba97a 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -482,9 +482,6 @@ struct stb_context {
 
 #define WORKLOAD_POLICY_MAX 7
 
-/* Used to mask smu debug modes */
-#define SMU_DEBUG_HALT_ON_ERROR	0x1
-
 struct smu_context
 {
	struct amdgpu_device		*adev;
@@ -573,11 +570,6 @@ struct smu_context
struct smu_user_dpm_profile user_dpm_profile;
 
struct stb_context stb_context;
-
-   /*
-* 0 = disabled (default), otherwise enable corresponding debug mode
-*/
-   uint32_t smu_debug_mask;
 };
 
 struct i2c_adapter;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 43637d55fe29..b233d9d766f2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -257,6 +257,7 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
*smu,
 uint16_t msg_index,
 uint32_t param)
 {
+   struct amdgpu_device *adev = smu->adev;
u32 reg;
int res;
 
@@ -272,7 +273,7 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
*smu,
__smu_cmn_send_msg(smu, msg_index, param);
res = 0;
 Out:
-   if (unlikely(smu->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
+   if (unlikely(adev->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
res && (res != -ETIME)) {
amdgpu_device_halt(smu->adev);
[Guchun] As we have set an adev variable, we can replace smu->adev with adev
directly.

WARN_ON(1);
@@ -293,13 +294,14 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
*smu,
  */
 int smu_cmn_wait_for_response(struct smu_context *smu)  {
+   struct amdgpu_device *adev = smu->adev;
u32 reg;
int res;
 
reg = __smu_cmn_poll_stat(smu);
res = __smu_cmn_reg2errno(smu, reg);
 
-   if (unlikely(smu->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
+   if (unlikely(adev->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
  

RE: [PATCH V2] drm/amdgpu: move smu_debug_mask to a more proper place

2021-12-13 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Quan, Evan  
Sent: Monday, December 13, 2021 3:20 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun 
; Lazar, Lijo ; Quan, Evan 

Subject: [PATCH V2] drm/amdgpu: move smu_debug_mask to a more proper place

The smu_context will be invisible from outside (of power). Also, the
smu_debug_mask can then be shared across all power code instead of being tied
to one specific framework (swSMU) only.

Signed-off-by: Evan Quan 
Change-Id: I1a0e1a436a51fc520a47b3fb28cde527d4e5eb6e
--
v1->v2:
  - drop non-necessary intermediate adev(Guchun)
  - move smu_debug_mask inside struct amdgpu_pm(Lijo)
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h |  8 
 drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h |  8 
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c  | 16 +---
 4 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 9dfccb20fedd..25e2e5bf90eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1619,7 +1619,7 @@ int amdgpu_debugfs_init(struct amdgpu_device *adev)
return 0;
 
debugfs_create_x32("amdgpu_smu_debug", 0600, root,
-  &adev->smu.smu_debug_mask);
+  &adev->pm.smu_debug_mask);
 
ent = debugfs_create_file("amdgpu_preempt_ib", 0600, root, adev,
  &fops_ib_preempt);
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
index 16e3f72d31b9..c464a045000d 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
@@ -423,6 +423,9 @@ enum ip_power_state {
POWER_STATE_OFF,
 };
 
+/* Used to mask smu debug modes */
+#define SMU_DEBUG_HALT_ON_ERROR	0x1
+
 struct amdgpu_pm {
	struct mutex		mutex;
u32 current_sclk;
@@ -460,6 +463,11 @@ struct amdgpu_pm {
	struct list_head	pm_attr_list;
 
	atomic_t		pwr_state[AMD_IP_BLOCK_TYPE_NUM];
+
+   /*
+* 0 = disabled (default), otherwise enable corresponding debug mode
+*/
+	uint32_t		smu_debug_mask;
 };
 
 #define R600_SSTU_DFLT   0
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
index 12e67ad9a3b2..2b9b9a7ba97a 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_smu.h
@@ -482,9 +482,6 @@ struct stb_context {
 
 #define WORKLOAD_POLICY_MAX 7
 
-/* Used to mask smu debug modes */
-#define SMU_DEBUG_HALT_ON_ERROR	0x1
-
 struct smu_context
 {
	struct amdgpu_device		*adev;
@@ -573,11 +570,6 @@ struct smu_context
struct smu_user_dpm_profile user_dpm_profile;
 
struct stb_context stb_context;
-
-   /*
-* 0 = disabled (default), otherwise enable corresponding debug mode
-*/
-   uint32_t smu_debug_mask;
 };
 
 struct i2c_adapter;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 43637d55fe29..735e1a1e365d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -257,10 +257,11 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
*smu,
 uint16_t msg_index,
 uint32_t param)
 {
+   struct amdgpu_device *adev = smu->adev;
u32 reg;
int res;
 
-   if (smu->adev->no_hw_access)
+   if (adev->no_hw_access)
return 0;
 
reg = __smu_cmn_poll_stat(smu);
@@ -272,9 +273,9 @@ int smu_cmn_send_msg_without_waiting(struct smu_context 
*smu,
__smu_cmn_send_msg(smu, msg_index, param);
res = 0;
 Out:
-   if (unlikely(smu->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
+   if (unlikely(adev->pm.smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
res && (res != -ETIME)) {
-   amdgpu_device_halt(smu->adev);
+   amdgpu_device_halt(adev);
WARN_ON(1);
}
 
@@ -299,7 +300,7 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
reg = __smu_cmn_poll_stat(smu);
res = __smu_cmn_reg2errno(smu, reg);
 
-   if (unlikely(smu->smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) &&
+   if (unlikely(smu->adev->pm.smu_debug_mask & SMU_DEBUG_HALT_ON_ERROR) 
+&&
res && (res != -ETIME)) {
amdgpu_device_halt(smu->adev);
WARN_ON(1);
@@ -343,10 +344,11 @@ int smu_cmn_send_smc_msg_with_param(struct smu_co

RE: [PATCH v2 1/2] drm/amd/pm: fix reading SMU FW version from amdgpu_firmware_info on YC

2021-12-13 Thread Chen, Guchun
[Public]

A nitpick.

As we have defined a local variable 'adev', code like 'smu->adev' should be
replaced directly by 'adev' in the function to keep the code clean.

With above addressed, the series is:

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Mario 
Limonciello
Sent: Monday, December 13, 2021 11:09 PM
To: amd-gfx@lists.freedesktop.org
Cc: Limonciello, Mario 
Subject: [PATCH v2 1/2] drm/amd/pm: fix reading SMU FW version from 
amdgpu_firmware_info on YC

This value does not get cached into adev->pm.fw_version during startup for 
smu13 like it does for other SMU like smu12.

Signed-off-by: Mario Limonciello 
---
v1->v2:
* Run on all v13 APU to match v12 behavior  
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 55421ea622fb..7fdb63da1316 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -196,6 +196,7 @@ int smu_v13_0_check_fw_status(struct smu_context *smu)
 
 int smu_v13_0_check_fw_version(struct smu_context *smu)  {
+   struct amdgpu_device *adev = smu->adev;
uint32_t if_version = 0xff, smu_version = 0xff;
uint16_t smu_major;
uint8_t smu_minor, smu_debug;
@@ -208,6 +209,8 @@ int smu_v13_0_check_fw_version(struct smu_context *smu)
	smu_major = (smu_version >> 16) & 0xffff;
smu_minor = (smu_version >> 8) & 0xff;
smu_debug = (smu_version >> 0) & 0xff;
+   if (smu->is_apu)
+   adev->pm.fw_version = smu_version;
 
switch (smu->adev->ip_versions[MP1_HWIP][0]) {
case IP_VERSION(13, 0, 2):
--
2.25.1
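
The packed layout that the decode above relies on can be illustrated with a
small sketch (the example value below is made up, not from a real SMU):

	/* smu_version packs the fields as 0xMMMMmmdd:
	 * bits 31:16 = major, 15:8 = minor, 7:0 = debug */
	uint32_t smu_version = 0x003B2D00;	/* hypothetical value */
	uint16_t smu_major = (smu_version >> 16) & 0xffff;	/* 0x003B */
	uint8_t smu_minor = (smu_version >> 8) & 0xff;		/* 0x2D */
	uint8_t smu_debug = (smu_version >> 0) & 0xff;		/* 0x00 */

On APUs, the patch now also caches the whole packed word into
adev->pm.fw_version, so amdgpu_firmware_info can report it.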


RE: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-13 Thread Chen, Guchun
[Public]

+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;
 
if (!is_fru_eeprom_supported(adev))
return 0;

I assume the logic should be adjusted. It's better to put the asic_type check 
after is_fru_eeprom_supported.
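
I.e. something like this (a sketch of the suggested ordering):

	if (!is_fru_eeprom_supported(adev))
		return 0;

	if (adev->asic_type == CHIP_ALDEBARAN)
		offset = 0;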

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Kent Russell
Sent: Tuesday, December 14, 2021 3:34 AM
To: amd-gfx@lists.freedesktop.org
Cc: Russell, Kent 
Subject: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

This is supported, although the offset is different from VG20, so fix that with 
a variable and enable getting the product name and serial number from the FRU.

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index b3b951fe0861..124376b666fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -56,6 +56,8 @@ static bool is_fru_eeprom_supported(struct amdgpu_device 
*adev)
return true;
else
return false;
+   case CHIP_ALDEBARAN:
+   return true;
default:
return false;
}
@@ -91,6 +93,10 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
unsigned char buff[66];
u32 addrptr;
int size, len;
+   int offset = 2;
+
+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;
 
if (!is_fru_eeprom_supported(adev))
return 0;
@@ -139,7 +145,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
len = sizeof(adev->product_name) - 1;
}
/* Start at 2 due to buff using fields 0 and 1 for the address */
-   memcpy(adev->product_name, &buff[2], len);
+   memcpy(adev->product_name, &buff[offset], len);
adev->product_name[len] = '\0';
 
addrptr += size + 1;
@@ -157,7 +163,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Product Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->product_number) - 1;
}
-   memcpy(adev->product_number, &buff[2], len);
+   memcpy(adev->product_number, &buff[offset], len);
adev->product_number[len] = '\0';
 
addrptr += size + 1;
@@ -184,7 +190,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Serial Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->serial) - 1;
}
-   memcpy(adev->serial, &buff[2], len);
+   memcpy(adev->serial, &buff[offset], len);
adev->serial[len] = '\0';
 
return 0;
--
2.25.1


RE: [PATCH 1/4] drm/amdgpu: Increase potential product_name to 64 characters

2021-12-13 Thread Chen, Guchun
[Public]

How about setting a define like PRODUCT_NAME_LEN to 64, and using it in the FRU
code? In that case, if the string length of the product name needs to be bumped
later on, it will be simple.

#define PRODUCT_NAME_LEN 64

unsigned char buff[PRODUCT_NAME_LEN + 2];
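
The struct field and the buffer could then both pick the define up (a sketch of
the idea only):

	#define PRODUCT_NAME_LEN 64

	/* in struct amdgpu_device */
	char product_name[PRODUCT_NAME_LEN];

	/* in amdgpu_fru_get_product_info(); fields 0 and 1 hold the address */
	unsigned char buff[PRODUCT_NAME_LEN + 2];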

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Kent Russell
Sent: Tuesday, December 14, 2021 3:34 AM
To: amd-gfx@lists.freedesktop.org
Cc: Russell, Kent 
Subject: [PATCH 1/4] drm/amdgpu: Increase potential product_name to 64 
characters

Having seen at least 1 42-character product_name, bump the number up to 64.

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e701dedce344..1afb3066f6dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1083,7 +1083,7 @@ struct amdgpu_device {
 
/* Chip product information */
	char			product_number[16];
-	char			product_name[32];
+	char			product_name[64];
	char			serial[20];
 
atomic_tthrottling_logging_enabled;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index 7709caeb233d..b3b951fe0861 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -88,7 +88,7 @@ static int amdgpu_fru_read_eeprom(struct amdgpu_device *adev, 
uint32_t addrptr,
 
 int amdgpu_fru_get_product_info(struct amdgpu_device *adev)  {
-   unsigned char buff[34];
+   unsigned char buff[66];
u32 addrptr;
int size, len;
 
@@ -131,11 +131,11 @@ int amdgpu_fru_get_product_info(struct amdgpu_device 
*adev)
}
 
len = size;
-   /* Product name should only be 32 characters. Any more,
-* and something could be wrong. Cap it at 32 to be safe
+   /* Product name should logically be < 64 characters. Any more,
+* and something could be wrong. Cap it at 64 to be safe
 */
if (len >= sizeof(adev->product_name)) {
-   DRM_WARN("FRU Product Number is larger than 32 characters. This 
is likely a mistake");
+   DRM_WARN("FRU Product Name is larger than 64 characters. This 
is 
+likely a mistake");
len = sizeof(adev->product_name) - 1;
}
/* Start at 2 due to buff using fields 0 and 1 for the address */
--
2.25.1


RE: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

2021-12-13 Thread Chen, Guchun
[Public]

BTW, does the FRU exist on all the Aldebaran SKUs?

This patch assumes the FRU is present whenever the ASIC is Aldebaran. So if
some Aldebaran SKUs do not have a FRU, some i2c access failures will be
observed during boot up. This is not friendly to the user.

Please check the Vega20 case we talked about before; there are stricter checks
for the FRU on several Vega20 SKUs.

case CHIP_VEGA20:
/* D161 and D163 are the VG20 server SKUs */
if (strnstr(atom_ctx->vbios_version, "D161",
sizeof(atom_ctx->vbios_version)) ||
strnstr(atom_ctx->vbios_version, "D163",
sizeof(atom_ctx->vbios_version)))
return true;
else
return false;
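
A stricter Aldebaran gate could follow the same pattern (a sketch only; the
"D673" string is a placeholder, not a real Aldebaran board ID):

	case CHIP_ALDEBARAN:
		/* hypothetical: only ack the FRU on known server SKUs */
		if (strnstr(atom_ctx->vbios_version, "D673",
			    sizeof(atom_ctx->vbios_version)))
			return true;
		return false;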

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Chen, Guchun
Sent: Tuesday, December 14, 2021 11:17 AM
To: Russell, Kent ; amd-gfx@lists.freedesktop.org
Cc: Russell, Kent 
Subject: RE: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

[Public]

+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;
 
if (!is_fru_eeprom_supported(adev))
return 0;

I assume the logic should be adjusted. It's better to put the asic_type check 
after is_fru_eeprom_supported.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Kent Russell
Sent: Tuesday, December 14, 2021 3:34 AM
To: amd-gfx@lists.freedesktop.org
Cc: Russell, Kent 
Subject: [PATCH 4/4] drm/amdgpu: Access the FRU on Aldebaran

This is supported, although the offset is different from VG20, so fix that with 
a variable and enable getting the product name and serial number from the FRU.

Signed-off-by: Kent Russell 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
index b3b951fe0861..124376b666fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c
@@ -56,6 +56,8 @@ static bool is_fru_eeprom_supported(struct amdgpu_device 
*adev)
return true;
else
return false;
+   case CHIP_ALDEBARAN:
+   return true;
default:
return false;
}
@@ -91,6 +93,10 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
unsigned char buff[66];
u32 addrptr;
int size, len;
+   int offset = 2;
+
+   if (adev->asic_type == CHIP_ALDEBARAN)
+   offset = 0;
 
if (!is_fru_eeprom_supported(adev))
return 0;
@@ -139,7 +145,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
len = sizeof(adev->product_name) - 1;
}
/* Start at 2 due to buff using fields 0 and 1 for the address */
-   memcpy(adev->product_name, &buff[2], len);
+   memcpy(adev->product_name, &buff[offset], len);
adev->product_name[len] = '\0';
 
addrptr += size + 1;
@@ -157,7 +163,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Product Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->product_number) - 1;
}
-   memcpy(adev->product_number, &buff[2], len);
+   memcpy(adev->product_number, &buff[offset], len);
adev->product_number[len] = '\0';
 
addrptr += size + 1;
@@ -184,7 +190,7 @@ int amdgpu_fru_get_product_info(struct amdgpu_device *adev)
DRM_WARN("FRU Serial Number is larger than 16 characters. This 
is likely a mistake");
len = sizeof(adev->serial) - 1;
}
-   memcpy(adev->serial, &buff[2], len);
+   memcpy(adev->serial, &buff[offset], len);
adev->serial[len] = '\0';
 
return 0;
--
2.25.1


RE: [PATCH] drm/amdgpu: correct the wrong cached state for GMC on PICASSO

2021-12-13 Thread Chen, Guchun
[Public]

Acked-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Evan Quan
Sent: Tuesday, December 14, 2021 9:34 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Quan, Evan 
; Limonciello, Mario 
Subject: [PATCH] drm/amdgpu: correct the wrong cached state for GMC on PICASSO

Pair the operations done in GMC ->hw_init and ->hw_fini. That helps to maintain
a correct cached state for GMC and avoid unintended gate operations being
dropped due to a wrong cached state.

BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1828

Signed-off-by: Evan Quan 
Change-Id: I9976672a64464b86bb45eed0c25c9599d3bb4c06
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 8 
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 8 
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 7 ++-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index db2ec84f7237..c7492db3e189 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1809,6 +1809,14 @@ static int gmc_v9_0_hw_fini(void *handle)
return 0;
}
 
+   /*
+	 * Pair the operations done in gmc_v9_0_hw_init and thus maintain
+* a correct cached state for GMC. Otherwise, the "gate" again
+* operation on S3 resuming will fail due to wrong cached state.
+*/
+   if (adev->mmhub.funcs->update_power_gating)
+   adev->mmhub.funcs->update_power_gating(adev, false);
+
amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index b3bede1dc41d..1da2ec692057 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -301,10 +301,10 @@ static void mmhub_v1_0_update_power_gating(struct 
amdgpu_device *adev,
if (amdgpu_sriov_vf(adev))
return;
 
-   if (enable && adev->pg_flags & AMD_PG_SUPPORT_MMHUB) {
-   amdgpu_dpm_set_powergating_by_smu(adev, AMD_IP_BLOCK_TYPE_GMC, 
true);
-
-   }
+   if (adev->pg_flags & AMD_PG_SUPPORT_MMHUB)
+   amdgpu_dpm_set_powergating_by_smu(adev,
+ AMD_IP_BLOCK_TYPE_GMC,
+ enable);
 }
 
 static int mmhub_v1_0_gart_enable(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 3656a77baea4..9953a77cb987 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -1167,7 +1167,12 @@ static int pp_set_powergating_by_smu(void *handle,
pp_dpm_powergate_vce(handle, gate);
break;
case AMD_IP_BLOCK_TYPE_GMC:
-   pp_dpm_powergate_mmhub(handle);
+   /*
+* For now, this is only used on PICASSO.
+* And only "gate" operation is supported.
+*/
+   if (gate)
+   pp_dpm_powergate_mmhub(handle);
break;
case AMD_IP_BLOCK_TYPE_GFX:
ret = pp_dpm_powergate_gfx(handle, gate);
--
2.29.0


RE: [PATCH] drivers/amd/pm: smu13: use local variable adev

2021-12-13 Thread Chen, Guchun
[Public]

Thank you, Mario.

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Limonciello, Mario  
Sent: Tuesday, December 14, 2021 11:58 AM
To: amd-gfx@lists.freedesktop.org
Cc: Limonciello, Mario ; Chen, Guchun 

Subject: [PATCH] drivers/amd/pm: smu13: use local variable adev

Since this variable was made available by the previous commit, use it to make 
function access cleaner.

Suggested-by: Guchun Chen 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 677a246212f9..bb3f6072ed30 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -212,7 +212,7 @@ int smu_v13_0_check_fw_version(struct smu_context *smu)
if (smu->is_apu)
adev->pm.fw_version = smu_version;
 
-   switch (smu->adev->ip_versions[MP1_HWIP][0]) {
+   switch (adev->ip_versions[MP1_HWIP][0]) {
case IP_VERSION(13, 0, 2):
smu->smc_driver_if_version = SMU13_DRIVER_IF_VERSION_ALDE;
break;
@@ -221,8 +221,8 @@ int smu_v13_0_check_fw_version(struct smu_context *smu)
smu->smc_driver_if_version = 
SMU13_DRIVER_IF_VERSION_YELLOW_CARP;
break;
default:
-   dev_err(smu->adev->dev, "smu unsupported IP version: 0x%x.\n",
-   smu->adev->ip_versions[MP1_HWIP][0]);
+   dev_err(adev->dev, "smu unsupported IP version: 0x%x.\n",
+   adev->ip_versions[MP1_HWIP][0]);
smu->smc_driver_if_version = SMU13_DRIVER_IF_VERSION_INV;
break;
}
@@ -236,11 +236,11 @@ int smu_v13_0_check_fw_version(struct smu_context *smu)
 * of halt driver loading.
 */
if (if_version != smu->smc_driver_if_version) {
-   dev_info(smu->adev->dev, "smu driver if version = 0x%08x, smu 
fw if version = 0x%08x, "
+   dev_info(adev->dev, "smu driver if version = 0x%08x, smu fw if 
version = 0x%08x, "
 "smu fw version = 0x%08x (%d.%d.%d)\n",
 smu->smc_driver_if_version, if_version,
 smu_version, smu_major, smu_minor, smu_debug);
-   dev_warn(smu->adev->dev, "SMU driver if version not matched\n");
+   dev_warn(adev->dev, "SMU driver if version not matched\n");
}
 
return ret;
--
2.25.1


RE: [PATCH] drm/amdgpu: re-apply "use the new cursor in the VM code""

2021-03-22 Thread Chen, Guchun
[AMD Public Use]

Hi Christian,

I will conduct one stress test for this tomorrow. Would you mind waiting for my 
ack before submitting?

Regards,
Guchun

-Original Message-
From: Christian König  
Sent: Monday, March 22, 2021 8:41 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Das, Nirmoy 
Subject: [PATCH] drm/amdgpu: re-apply "use the new cursor in the VM code""

Now that we found the underlying problem we can re-apply this patch.

This reverts commit 867fee7f8821ff42e7308088cf0c3450ac49c17c.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 55 +-
 1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9268db1172bd..bc3951b71079 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -37,6 +37,7 @@
 #include "amdgpu_gmc.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_dma_buf.h"
+#include "amdgpu_res_cursor.h"
 
 /**
  * DOC: GPUVM
@@ -1583,7 +1584,7 @@ static int amdgpu_vm_update_ptes(struct 
amdgpu_vm_update_params *params,
  * @last: last mapped entry
  * @flags: flags for the entries
  * @offset: offset into nodes and pages_addr
- * @nodes: array of drm_mm_nodes with the MC addresses
+ * @res: ttm_resource to map
  * @pages_addr: DMA addresses to use for mapping
  * @fence: optional resulting fence
  *
@@ -1598,13 +1599,13 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
   bool unlocked, struct dma_resv *resv,
   uint64_t start, uint64_t last,
   uint64_t flags, uint64_t offset,
-  struct drm_mm_node *nodes,
+  struct ttm_resource *res,
   dma_addr_t *pages_addr,
   struct dma_fence **fence)
 {
struct amdgpu_vm_update_params params;
+   struct amdgpu_res_cursor cursor;
enum amdgpu_sync_mode sync_mode;
-   uint64_t pfn;
int r;
 
	memset(&params, 0, sizeof(params));
@@ -1622,14 +1623,6 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
else
sync_mode = AMDGPU_SYNC_EXPLICIT;
 
-   pfn = offset >> PAGE_SHIFT;
-   if (nodes) {
-   while (pfn >= nodes->size) {
-   pfn -= nodes->size;
-   ++nodes;
-   }
-   }
-
amdgpu_vm_eviction_lock(vm);
if (vm->evicting) {
r = -EBUSY;
@@ -1648,23 +1641,17 @@ static int amdgpu_vm_bo_update_mapping(struct 
amdgpu_device *adev,
if (r)
goto error_unlock;
 
-   do {
+   amdgpu_res_first(res, offset, (last - start + 1) * AMDGPU_GPU_PAGE_SIZE,
+&cursor);
+   while (cursor.remaining) {
uint64_t tmp, num_entries, addr;
 
-
-   num_entries = last - start + 1;
-   if (nodes) {
-   addr = nodes->start << PAGE_SHIFT;
-   num_entries = min((nodes->size - pfn) *
-   AMDGPU_GPU_PAGES_IN_CPU_PAGE, num_entries);
-   } else {
-   addr = 0;
-   }
-
+   num_entries = cursor.size >> AMDGPU_GPU_PAGE_SHIFT;
if (pages_addr) {
bool contiguous = true;
 
if (num_entries > AMDGPU_GPU_PAGES_IN_CPU_PAGE) {
+   uint64_t pfn = cursor.start >> PAGE_SHIFT;
uint64_t count;
 
			contiguous = pages_addr[pfn + 1] ==
@@ -1684,16 +1671,18 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
}
 
if (!contiguous) {
-   addr = pfn << PAGE_SHIFT;
+   addr = cursor.start;
params.pages_addr = pages_addr;
} else {
-   addr = pages_addr[pfn];
+   addr = pages_addr[cursor.start >> PAGE_SHIFT];
params.pages_addr = NULL;
}
 
} else if (flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)) {
-   addr += bo_adev->vm_manager.vram_base_offset;
-   addr += pfn << PAGE_SHIFT;
+   addr = bo_adev->vm_manager.vram_base_offset +
+   cursor.start;
+   } else {
+   addr = 0;
}
 
tmp = start + num_entries;
@@ -1701,14 +1690,9 @@ 
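
For readers following along, the new cursor API boils the walk down to this
pattern (a sketch distilled from the hunks above; map_chunk() is a placeholder
for whatever consumes each contiguous range):

	struct amdgpu_res_cursor cursor;

	amdgpu_res_first(res, offset, size, &cursor);
	while (cursor.remaining) {
		/* cursor.start is the current offset, cursor.size the
		 * length of the contiguous chunk */
		map_chunk(cursor.start, cursor.size);
		amdgpu_res_next(&cursor, cursor.size);
	}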

RE: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

2021-03-22 Thread Chen, Guchun
[AMD Public Use]

Thanks for your patch, Silva. The issue has been fixed by "a5c6007e20e1
drm/amd/display: fix modprobe failure on vega series".

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Gustavo A. 
R. Silva
Sent: Monday, March 22, 2021 8:51 PM
To: Lee Jones ; Wentland, Harry ; 
Li, Sun peng (Leo) ; Deucher, Alexander 
; Koenig, Christian ; 
David Airlie ; Daniel Vetter 
Cc: Gustavo A. R. Silva ; 
dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; 
linux-ker...@vger.kernel.org
Subject: [PATCH][next] drm/amd/display: Fix sizeof arguments in bw_calcs_init()

The wrong sizeof values are currently being used as arguments to kzalloc().

Fix this by using the right arguments *dceip and *vbios, correspondingly.

Addresses-Coverity-ID: 1502901 ("Wrong sizeof argument")
Fixes: fca1e079055e ("drm/amd/display/dc/calcs/dce_calcs: Remove some large 
variables from the stack")
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c 
b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
index 556ecfabc8d2..1244fcb0f446 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
@@ -2051,11 +2051,11 @@ void bw_calcs_init(struct bw_calcs_dceip *bw_dceip,
 
enum bw_calcs_version version = bw_calcs_version_from_asic_id(asic_id);
 
-   dceip = kzalloc(sizeof(dceip), GFP_KERNEL);
+   dceip = kzalloc(sizeof(*dceip), GFP_KERNEL);
if (!dceip)
return;
 
-   vbios = kzalloc(sizeof(vbios), GFP_KERNEL);
+   vbios = kzalloc(sizeof(*vbios), GFP_KERNEL);
if (!vbios) {
kfree(dceip);
return;
--
2.27.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: re-apply "use the new cursor in the VM code""

2021-03-23 Thread Chen, Guchun
[AMD Public Use]

Hi Christian,

Thanks for your patience.

Unluckily, after applying the patch below, the Vulkan CTS test on my side is
negative. The same gfxhub page fault and kernel bug along with the
amdgpu_vm_update_ptes calltrace is observed. I will send the full log to you
privately soon.

I suggest holding off on this patch until we root-cause it.

Regards,
Guchun

-Original Message-
From: Das, Nirmoy  
Sent: Tuesday, March 23, 2021 5:09 PM
To: Chen, Guchun ; Christian König 
; amd-gfx@lists.freedesktop.org
Cc: Das, Nirmoy 
Subject: Re: [PATCH] drm/amdgpu: re-apply "use the new cursor in the VM code""

I tested ./piglit run opengl results/test multiple times. Once I got a gfx
timeout error, but without a kernel freeze. I can't reproduce it any more.


Regards,

Nirmoy

On 3/22/21 2:11 PM, Chen, Guchun wrote:
> [AMD Public Use]
>
> Hi Christian,
>
> I will conduct one stress test for this tomorrow. Would you mind waiting for 
> my ack before submitting?
>
> Regards,
> Guchun
>
> -Original Message-
> From: Christian König 
> Sent: Monday, March 22, 2021 8:41 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chen, Guchun ; Das, Nirmoy 
> 
> Subject: [PATCH] drm/amdgpu: re-apply "use the new cursor in the VM code""
>
> Now that we found the underlying problem we can re-apply this patch.
>
> This reverts commit 867fee7f8821ff42e7308088cf0c3450ac49c17c.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 55 +-
>   1 file changed, 18 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 9268db1172bd..bc3951b71079 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -37,6 +37,7 @@
>   #include "amdgpu_gmc.h"
>   #include "amdgpu_xgmi.h"
>   #include "amdgpu_dma_buf.h"
> +#include "amdgpu_res_cursor.h"
>   
>   /**
>* DOC: GPUVM
> @@ -1583,7 +1584,7 @@ static int amdgpu_vm_update_ptes(struct 
> amdgpu_vm_update_params *params,
>* @last: last mapped entry
>* @flags: flags for the entries
>* @offset: offset into nodes and pages_addr
> - * @nodes: array of drm_mm_nodes with the MC addresses
> + * @res: ttm_resource to map
>* @pages_addr: DMA addresses to use for mapping
>* @fence: optional resulting fence
>*
> @@ -1598,13 +1599,13 @@ static int amdgpu_vm_bo_update_mapping(struct 
> amdgpu_device *adev,
>  bool unlocked, struct dma_resv *resv,
>  uint64_t start, uint64_t last,
>  uint64_t flags, uint64_t offset,
> -struct drm_mm_node *nodes,
> +struct ttm_resource *res,
>  dma_addr_t *pages_addr,
>  struct dma_fence **fence)
>   {
>   struct amdgpu_vm_update_params params;
> + struct amdgpu_res_cursor cursor;
>   enum amdgpu_sync_mode sync_mode;
> - uint64_t pfn;
>   int r;
>   
> 	memset(&params, 0, sizeof(params));
> @@ -1622,14 +1623,6 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
>   else
>   sync_mode = AMDGPU_SYNC_EXPLICIT;
>   
> - pfn = offset >> PAGE_SHIFT;
> - if (nodes) {
> - while (pfn >= nodes->size) {
> - pfn -= nodes->size;
> - ++nodes;
> - }
> - }
> -
>   amdgpu_vm_eviction_lock(vm);
>   if (vm->evicting) {
>   r = -EBUSY;
> @@ -1648,23 +1641,17 @@ static int amdgpu_vm_bo_update_mapping(struct 
> amdgpu_device *adev,
>   if (r)
>   goto error_unlock;
>   
> - do {
> + amdgpu_res_first(res, offset, (last - start + 1) * AMDGPU_GPU_PAGE_SIZE,
> +  &cursor);
> + while (cursor.remaining) {
>   uint64_t tmp, num_entries, addr;
>   
> -
> - num_entries = last - start + 1;
> - if (nodes) {
> - addr = nodes->start << PAGE_SHIFT;
> - num_entries = min((nodes->size - pfn) *
> - AMDGPU_GPU_PAGES_IN_CPU_PAGE, num_entries);
> - } else {
> - addr = 0;
> - }
> -
> + num_entries = cursor.size >> AMDGPU_GPU_PAGE_SHIFT;
>   if (pages_addr) {
>   bool contiguous = true;
>   
>   if (num_entries > AMDGPU_GPU

RE: [PATCH] drm/amd/amdgpu: set MP1 state to UNLOAD before reload its FW for vega20

2021-03-28 Thread Chen, Guchun
[AMD Public Use]

It's better to add the error info below to the commit message for the
audience's understanding.

[  121.642772] [drm] reserve 0x40 from 0x87fec0 for PSP TMR
[  123.801051] [drm] failed to load ucode id (24) 
[  123.801055] [drm] psp command (0x6) failed and response status is (0x0)
[  123.801214] [drm:psp_load_smu_fw [amdgpu]] *ERROR* PSP load smu failed!
[  123.801398] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
[  123.801536] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP 
block  failed -22
[  123.801632] amdgpu :04:00.0: amdgpu: GPU reset(9) failed
[  123.801691] amdgpu :07:00.0: amdgpu: GPU reset(9) failed
[  123.802899] amdgpu :04:00.0: amdgpu: GPU reset end with ret = -22

With above added, the patch is:
Reviewed-and-tested-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Chengming Gui  
Sent: Monday, March 29, 2021 11:39 AM
To: amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Quan, Evan ; Long, 
Gang ; Gui, Jack 
Subject: [PATCH] drm/amd/amdgpu: set MP1 state to UNLOAD before reload its FW 
for vega20

When resuming from GPU reset, we need to set the MP1 state to UNLOAD before
reloading the SMU FW.

Signed-off-by: Chengming Gui 
Change-Id: I54c2accab58d53a2780d10720f26a717bf1ff130
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 60dbb8c1e74d..aa16bc292a16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2148,7 +2148,8 @@ static int psp_load_smu_fw(struct psp_context *psp)
 
if ((amdgpu_in_reset(adev) &&
 ras && ras->supported &&
-adev->asic_type == CHIP_ARCTURUS) ||
+(adev->asic_type == CHIP_ARCTURUS ||
+ adev->asic_type == CHIP_VEGA20)) ||
 (adev->in_runpm &&
  adev->asic_type >= CHIP_NAVI10 &&
  adev->asic_type <= CHIP_NAVI12)) {
--
2.17.1
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Reset error code for 'no handler' case

2021-03-28 Thread Chen, Guchun
[AMD Public Use]

Reviewed-and-tested-by: Guchun Chen guchun.c...@amd.com

Regards,
Guchun

From: Lazar, Lijo 
Sent: Monday, March 29, 2021 12:04 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xu, Feifei ; 
Chen, Guchun 
Subject: [PATCH] drm/amdgpu: Reset error code for 'no handler' case


[AMD Public Use]

If the reset handler is not implemented, reset the error code before proceeding.

Fixes issue with the following trace -
[  106.508592] amdgpu :b1:00.0: amdgpu: ASIC reset failed with error, -38 
for drm dev, :b1:00.0
[  106.508972] amdgpu :b1:00.0: amdgpu: GPU reset succeeded, trying to 
resume
[  106.509116] [drm] PCIE GART of 512M enabled.
[  106.509120] [drm] PTB located at 0x0080
[  106.509136] [drm] VRAM is lost due to GPU reset!
[  106.509332] [drm] PSP is resuming...

Signed-off-by: Lijo Lazar lijo.la...@amd.com
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 319d69646a13..a501d1a4d000 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4281,7 +4281,10 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
   drm_sched_increase_karma(&job->base);

r = amdgpu_reset_prepare_hwcontext(adev, reset_context);
-  if (r != -ENOSYS)
+ /* If reset handler not implemented, continue; otherwise return */
+ if (r == -ENOSYS)
+ r = 0;
+ else
   return r;

/* Don't suspend on bare metal if we are not going to HW reset 
the ASIC */
@@ -4323,8 +4326,10 @@ int amdgpu_do_asic_reset(struct list_head 
*device_list_handle,
   tmp_adev = list_first_entry(device_list_handle, struct 
amdgpu_device,
   reset_list);
   r = amdgpu_reset_perform_reset(tmp_adev, reset_context);
-
-  if (r != -ENOSYS)
+ /* If reset handler not implemented, continue; otherwise return */
+ if (r == -ENOSYS)
+ r = 0;
+ else
   return r;

/* Reset handler not implemented, use the default method */
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 3/6] drm/amdgpu: Restore msix after FLR

2021-03-29 Thread Chen, Guchun
[AMD Public Use]

amdgpu_irq_restore_msix should be a static function?
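
I.e. keep the body as posted, but mark it static, since it is only called from
amdgpu_irq_gpu_reset_resume_helper() in the same file (sketch):

	static void amdgpu_irq_restore_msix(struct amdgpu_device *adev)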

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Emily Deng
Sent: Tuesday, March 30, 2021 12:42 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily 
Subject: [PATCH 3/6] drm/amdgpu: Restore msix after FLR

From: "Emily.Deng" 

After FLR, the msix will be cleared, so need to re-enable it.

v2:
Change name with amdgpu_irq prefix, remove #ifdef.

Signed-off-by: Emily.Deng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 03412543427a..8936589bd7f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -277,6 +277,17 @@ static bool amdgpu_msi_ok(struct amdgpu_device *adev)
return true;
 }
 
+void amdgpu_irq_restore_msix(struct amdgpu_device *adev)
+{
+   u16 ctrl;
+
+   pci_read_config_word(adev->pdev, adev->pdev->msix_cap + PCI_MSIX_FLAGS, 
&ctrl);
+   ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
+   pci_write_config_word(adev->pdev, adev->pdev->msix_cap + 
PCI_MSIX_FLAGS, ctrl);
+   ctrl |= PCI_MSIX_FLAGS_ENABLE;
+   pci_write_config_word(adev->pdev, adev->pdev->msix_cap + 
+PCI_MSIX_FLAGS, ctrl);
+}
+
 /**
  * amdgpu_irq_init - initialize interrupt handling
  *
@@ -558,6 +569,7 @@ void amdgpu_irq_gpu_reset_resume_helper(struct 
amdgpu_device *adev)  {
int i, j, k;
 
+   amdgpu_irq_restore_msix(adev);
for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
if (!adev->irq.client[i].sources)
continue;
--
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 2/2] drm/amdgpu: fix compiler warning

2021-03-30 Thread Chen, Guchun
[AMD Public Use]

Inline comments after yours.

Regards,
Guchun

-Original Message-
From: Koenig, Christian  
Sent: Tuesday, March 30, 2021 6:40 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Zhang, 
Hawking 
Subject: Re: [PATCH 2/2] drm/amdgpu: fix compiler warning

Am 30.03.21 um 12:02 schrieb Guchun Chen:
> warning: ISO C90 forbids mixed declarations and code 
> [-Wdeclaration-after-statement]
>int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);

Well there seems to be some kind of bug in the compiler if it complains about 
the code below.
[Guchun] From the Linux coding style's perspective, we should put the
declarations together, separated from the code by one blank line, right?
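
For instance, the warning disappears once all declarations sit up front, with
"r" last (a sketch based on the function below):

	struct amdgpu_device *adev = amdgpu_ttm_adev(bdev);
	struct amdgpu_ttm_tt *gtt = (void *)ttm;
	int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);
	enum dma_data_direction direction = write ?
			DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
	int r;

	/* first executable statement only after the declarations */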

>
> Signed-off-by: Guchun Chen 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 --
>   1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 722efd86718e..2a6fc0556386 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -824,7 +824,6 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_device 
> *bdev,
>   struct amdgpu_device *adev = amdgpu_ttm_adev(bdev);
>   struct amdgpu_ttm_tt *gtt = (void *)ttm;
>   int r;
> -

Better have variable like "r" and "i" declared last.
[Guchun]Will send v2 to address this if you don't have objection to this patch.

Christian.

>   int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);
>   enum dma_data_direction direction = write ?
>   DMA_BIDIRECTIONAL : DMA_TO_DEVICE; @@ -861,7 +860,6 @@ static 
> void 
> amdgpu_ttm_tt_unpin_userptr(struct ttm_device *bdev,
>   {
>   struct amdgpu_device *adev = amdgpu_ttm_adev(bdev);
>   struct amdgpu_ttm_tt *gtt = (void *)ttm;
> -
>   int write = !(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY);
>   enum dma_data_direction direction = write ?
>   DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/2] drm/amdgpu: fix NULL pointer dereference

2021-03-30 Thread Chen, Guchun
[AMD Public Use]

Thanks, Christian. I will keep a close eye on this patch after merging it.

I notice the same logic in the radeon code, in radeon_ttm_tt_unpin_userptr.
Shall I create another patch to fix it as well?
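
The analogous check there would presumably be (a sketch, untested):

	/* in radeon_ttm_tt_unpin_userptr() */
	if (!ttm->sg || !ttm->sg->sgl)
		return;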

Regards,
Guchun

-Original Message-
From: Christian König  
Sent: Tuesday, March 30, 2021 6:39 PM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Koenig, 
Christian ; Zhang, Hawking 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix NULL pointer dereference

Am 30.03.21 um 12:02 schrieb Guchun Chen:
> ttm->sg needs to be checked before accessing its child member.
>
> Call Trace:
>   amdgpu_ttm_backend_destroy+0x12/0x70 [amdgpu]
>   ttm_bo_cleanup_memtype_use+0x3a/0x60 [ttm]
>   ttm_bo_release+0x17d/0x300 [ttm]
>   amdgpu_bo_unref+0x1a/0x30 [amdgpu]
>   amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x78b/0x8b0 [amdgpu]
>   kfd_ioctl_alloc_memory_of_gpu+0x118/0x220 [amdgpu]
>   kfd_ioctl+0x222/0x400 [amdgpu]
>   ? kfd_dev_is_large_bar+0x90/0x90 [amdgpu]
>   __x64_sys_ioctl+0x8e/0xd0
>   ? __context_tracking_exit+0x52/0x90
>   do_syscall_64+0x33/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f97f264d317
> Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff 
> ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 
> 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
> RSP: 002b:7ffdb402c338 EFLAGS: 0246 ORIG_RAX: 0010
> RAX: ffda RBX: 7f97f3cc63a0 RCX: 7f97f264d317
> RDX: 7ffdb402c380 RSI: c0284b16 RDI: 0003
> RBP: 7ffdb402c380 R08: 7ffdb402c428 R09: c404
> R10: c404 R11: 0246 R12: c0284b16
> R13: 0003 R14: 7f97f3cc63a0 R15: 7f883620
>
> Signed-off-by: Guchun Chen 

Yeah I had this one on my TODO list as well.

For now the patch is Acked-by: Christian König , but 
I'm not 100% sure if this is the right fix.

Please keep an eye open if anybody complains about issues with this patch, if 
yes we need to get back to the drawing board.

Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e00263bcc88b..722efd86718e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -867,7 +867,7 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_device 
> *bdev,
>   DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
>   
>   /* double check that we don't free the table twice */
> - if (!ttm->sg->sgl)
> + if (!ttm->sg || !ttm->sg->sgl)
>   return;
>   
>   /* unmap the pages mapped to the device */
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 4/4] drm/amdgpu: indirect register access for nv12 sriov

2021-04-04 Thread Chen, Guchun
[AMD Public Use]

Hi Peng Ju,

Patch 4 breaks the driver modprobe sequence for ASICs with GFX IP v9.0. The
modification in WREG32_RLC will route to a different path for GFX v9. Please
check it.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Deng, Emily
Sent: Thursday, April 1, 2021 2:01 PM
To: Zhou, Peng Ju ; amd-gfx@lists.freedesktop.org
Cc: Zhao, Jiange 
Subject: RE: [PATCH 4/4] drm/amdgpu: indirect register access for nv12 sriov

[AMD Official Use Only - Internal Distribution Only]

[AMD Official Use Only - Internal Distribution Only]

Series Reviewed-by: Emily.Deng 

>-Original Message-
>From: amd-gfx  On Behalf Of Peng 
>Ju Zhou
>Sent: Wednesday, March 31, 2021 1:20 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Zhao, Jiange 
>Subject: [PATCH 4/4] drm/amdgpu: indirect register access for nv12 
>sriov
>
>1. expand rlcg interface for gc & mmhub indirect access
>2. add rlcg interface for no kiq
>
>Signed-off-by: Peng Ju Zhou 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h|   3 +-
> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 131 ++---
> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |   2 +-
> drivers/gpu/drm/amd/amdgpu/soc15_common.h  |  75 ++--
> 5 files changed, 150 insertions(+), 63 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 060d0ae99453..438e2f732377 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -490,7 +490,7 @@ void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device 
>*adev,
> 	    adev->gfx.rlc.funcs &&
> 	    adev->gfx.rlc.funcs->is_rlcg_access_range) {
> 		if (adev->gfx.rlc.funcs->is_rlcg_access_range(adev, reg))
>-			return adev->gfx.rlc.funcs->rlcg_wreg(adev, reg, v);
>+			return adev->gfx.rlc.funcs->rlcg_wreg(adev, reg, v, 0);
> 	} else {
> 		writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));
> 	}
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
>index aeaaae713c59..4fc2ce8ce8ab 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
>@@ -127,7 +127,8 @@ struct amdgpu_rlc_funcs {
> 	void (*reset)(struct amdgpu_device *adev);
> 	void (*start)(struct amdgpu_device *adev);
> 	void (*update_spm_vmid)(struct amdgpu_device *adev, unsigned vmid);
>-	void (*rlcg_wreg)(struct amdgpu_device *adev, u32 offset, u32 v);
>+	void (*rlcg_wreg)(struct amdgpu_device *adev, u32 offset, u32 v, u32 flag);
>+	u32 (*rlcg_rreg)(struct amdgpu_device *adev, u32 offset, u32 flag);
> 	bool (*is_rlcg_access_range)(struct amdgpu_device *adev, uint32_t reg);
> };
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>index b4fd0394cd08..85a6a10e048f 100644
>--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>@@ -177,6 +177,11 @@
> #define mmGC_THROTTLE_CTRL_Sienna_Cichlid  0x2030
> #define mmGC_THROTTLE_CTRL_Sienna_Cichlid_BASE_IDX 0
>
>+#define GFX_RLCG_GC_WRITE_OLD	(0x8 << 28)
>+#define GFX_RLCG_GC_WRITE	(0x0 << 28)
>+#define GFX_RLCG_GC_READ	(0x1 << 28)
>+#define GFX_RLCG_MMHUB_WRITE	(0x2 << 28)
>+
> MODULE_FIRMWARE("amdgpu/navi10_ce.bin");
> MODULE_FIRMWARE("amdgpu/navi10_pfp.bin");
> MODULE_FIRMWARE("amdgpu/navi10_me.bin");
>@@ -1422,38 +1427,127 @@ static const struct soc15_reg_golden golden_settings_gc_10_1_2[] =
> 	SOC15_REG_GOLDEN_VALUE(GC, 0, mmUTCL1_CTRL, 0x, 0x0080)
> };
>
>-static void gfx_v10_rlcg_wreg(struct amdgpu_device *adev, u32 offset, u32 v)
>+static bool gfx_v10_is_rlcg_rw(struct amdgpu_device *adev, u32 offset, uint32_t *flag, bool write)
>+{
>+	/* always programed by rlcg, only for gc */
>+	if (offset == SOC15_REG_OFFSET(GC, 0, mmRLC_CSIB_ADDR_HI) ||
>+	    offset == SOC15_REG_OFFSET(GC, 0, mmRLC_CSIB_ADDR_LO) ||
>+	    offset == SOC15_REG_OFFSET(GC, 0, mmRLC_CSIB_LENGTH) ||
>+	    offset == SOC15_REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL) ||
>+	    offset == SOC15_REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX) ||
>+	    offset == SOC15_REG_OFFSET(GC, 0, mmCP_ME_CNTL)) {
>+		if (!amdgpu_sriov_reg_indirect_gc(adev))
>+			*flag = GFX_RLCG_GC_WRITE_OLD;
>+		else
>+			*flag = write ? GFX_RLCG_GC_WRITE : GFX_RLCG_GC_READ;
>+
>+		return true;
>+	}
>+
>+	/* currently support gc read/write, mmhub write */
>+	if (offset >= SOC15_REG_OFFSET(GC, 0, mmSDMA0_DEC_START) &&
>+	    offset <= SOC15_REG_OFFSET(GC, 0, mmRLC_GTS_OFFSET_MSB)) {
>+		if (amdgpu_sriov_reg_indirect_gc(adev))
>+			*flag = write ? GFX_RLCG_GC_WRITE : GFX_RLCG_GC_READ;
>+		else
>+			return false;
>+	} else {
>+		if (amdgpu_sriov_reg_indirect_mmhub(adev))
>+			*flag = GFX_RLCG_MMHUB_WRITE;
>+		else
>+			return false;
>+	}
>+
>+	return true;
>+}
>+
>+static u32 gfx_v10_rlcg_rw(struct amdgpu_device *adev, u32 offset, u32 v, uint32_t flag)
> {
> 	static void *scratch_reg0;
> 	static void *scratch_reg1;
>+	static void *scratch_reg2;
>+	static void *scratch_reg3;
>

RE: [pull] amdgpu, radeon, ttm, sched drm-next-5.13

2021-04-07 Thread Chen, Guchun
[AMD Public Use]

Hi Felix and Christian,

If the regression you are talking about is the NULL pointer problem when
running KFD tests, it should be fixed by the patch below in this series.

drm/amdgpu: fix NULL pointer dereference

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Wednesday, April 7, 2021 2:57 PM
To: Kuehling, Felix ; Deucher, Alexander 
; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org; airl...@gmail.com; daniel.vet...@ffwll.ch
Subject: Re: [pull] amdgpu, radeon, ttm, sched drm-next-5.13

Am 06.04.21 um 17:42 schrieb Felix Kuehling:
> Am 2021-04-01 um 6:29 p.m. schrieb Alex Deucher:
>> Hi Dave, Daniel,
>>
>> New stuff for 5.13.  There are two small patches for ttm and 
>> scheduler that were dependencies for amdgpu changes.
>>
>> The following changes since commit 2cbcb78c9ee5520c8d836c7ff57d1b60ebe8e9b7:
>>
>>Merge tag 'amd-drm-next-5.13-2021-03-23' of
>> https://gitlab.freedesktop.org/agd5f/linux into drm-next (2021-03-26 15:53:21 +0100)
>>
>> are available in the Git repository at:
>>
>>
>> https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.13-2021-04-01
>>
>> for you to fetch changes up to ef95d2a98d642a537190d73c45ae3c308afee890:
>>
>>drm/amdgpu/display: fix warning on 32 bit in dmub (2021-04-01 
>> 17:32:32 -0400)
>>
>> 
>> amd-drm-next-5.13-2021-04-01:
>>
>> amdgpu:
>> - Re-enable GPU reset on VanGogh
>> - Enable DPM flags for SMART_SUSPEND and MAY_SKIP_RESUME
>> - Disentangle HG from vga_switcheroo
>> - S0ix fixes
>> - W=1 fixes
>> - Resource iterator fixes
>> - DMCUB updates
>> - UBSAN fixes
>> - More PM API cleanup
>> - Aldebaran updates
>> - Modifier fixes
>> - Enable VCN load balancing with asymmetric engines
>> - Rework BO structs
>> - Aldebaran reset support
>> - Initial LTTPR display work
>> - Display MALL fixes
>> - Fall back to YCbCr420 when YCbCr444 fails
>> - SR-IOV fixes
>> - Misc cleanups and fixes
>>
>> radeon:
>> - Typo fixes
>>
>> ttm:
>> - Handle cached requests (required for Aldebaran)
>>
>> scheduler:
>> - Fix runqueue selection when changing priorities (required to fix VCN
>>load balancing)
>>
>> 
>> Alex Deucher (20):
>>drm/amdgpu/display/dm: add missing parameter documentation
>>drm/amdgpu: Add additional Sienna Cichlid PCI ID
>>drm/amdgpu: add a dev_pm_ops prepare callback (v2)
>>drm/amdgpu: enable DPM_FLAG_MAY_SKIP_RESUME and 
>> DPM_FLAG_SMART_SUSPEND flags (v2)
>>drm/amdgpu: disentangle HG systems from vgaswitcheroo
>>drm/amdgpu: rework S3/S4/S0ix state handling
>>drm/amdgpu: don't evict vram on APUs for suspend to ram (v4)
>>drm/amdgpu: clean up non-DC suspend/resume handling
>>drm/amdgpu: move s0ix check into amdgpu_device_ip_suspend_phase2 (v3)
>>drm/amdgpu: re-enable suspend phase 2 for S0ix
>>drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend
>>drm/amdgpu: update comments about s0ix suspend/resume
>>drm/amdgpu: drop S0ix checks around CG/PG in suspend
>>drm/amdgpu: skip kfd suspend/resume for S0ix
>>drm/amdgpu/display: restore AUX_DPHY_TX_CONTROL for DCN2.x
>>drm/amdgpu/display: fix memory leak for dimgrey cavefish
>>drm/amdgpu/pm: mark pcie link/speed arrays as const
>>drm/amdgpu/pm: bail on sysfs/debugfs queries during platform suspend
>>drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in 
>> suspend
>>drm/amdgpu/display: fix warning on 32 bit in dmub
>>
>> Alex Sierra (2):
>>drm/amdgpu: replace per_device_list by array
>>drm/amdgpu: ih reroute for newer asics than vega20
>>
>> Alvin Lee (1):
>>drm/amd/display: Change input parameter for set_drr
>>
>> Anson Jacob (2):
>>drm/amd/display: Fix UBSAN: shift-out-of-bounds warning
>>drm/amd/display: Removing unused code from dmub_cmd.h
>>
>> Anthony Koo (2):
>>drm/amd/display: [FW Promotion] Release 0.0.57
>>drm/amd/display: [FW Promotion] Release 0.0.58
>>
>> Aric Cyr (2):
>>drm/amd/display: 3.2.128

RE: [PATCH 2/8] drm/amdgpu: Change GC register access from MMIO to RLCG

2021-04-08 Thread Chen, Guchun
[AMD Public Use]

Hi Peng Ju,

Before merging your patches, it's suggested to conduct a full test in BM mode
as well to avoid regressions, as the register access is changed.

Another problem is that the subjects of patches 2, 4 and 5 seem to be the same.
Can you please make each of them a bit more specific?

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Peng Ju Zhou
Sent: Thursday, April 8, 2021 1:33 PM
To: amd-gfx@lists.freedesktop.org
Cc: Jian, Jane 
Subject: [PATCH 2/8] drm/amdgpu: Change GC register access from MMIO to RLCG

In an SRIOV environment, the KMD should access GC registers through RLCG when 
the GC indirect access flag is enabled.

Change GC register access from MMIO to RLCG.
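
For readers new to RLCG access, a minimal sketch of the idea follows. The
helpers post_to_rlc() and wait_rlc_ack() are hypothetical; the real per-ASIC
protocol goes through RLC scratch registers and differs in detail:

static void rlcg_indirect_wreg(struct amdgpu_device *adev, u32 offset, u32 v)
{
	if (!amdgpu_sriov_vf(adev)) {
		/* bare-metal path: a plain MMIO store */
		writel(v, adev->rmmio + (offset << 2));
		return;
	}
	/*
	 * SRIOV path: the guest may not touch protected GC registers
	 * directly, so it posts the offset/value pair to the RLC
	 * firmware, which performs the write on the guest's behalf.
	 */
	post_to_rlc(adev, offset, v);	/* hypothetical helper */
	wait_rlc_ack(adev);		/* hypothetical helper */
}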

Signed-off-by: Peng Ju Zhou 
---
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  38 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c| 205 +-
 drivers/gpu/drm/amd/amdgpu/nv.c   |   2 +-
 3 files changed, 124 insertions(+), 121 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index 62aa1a6f64ed..9394dbf504de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -96,8 +96,8 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 
lock_srbm(kgd, 0, 0, 0, vmid);
 
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSH_MEM_CONFIG), sh_mem_config);
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmSH_MEM_BASES), sh_mem_bases);
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmSH_MEM_CONFIG), sh_mem_config);
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmSH_MEM_BASES), sh_mem_bases);
/* APE1 no longer exists on GFX9 */
 
unlock_srbm(kgd);
@@ -161,7 +161,7 @@ static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 
lock_srbm(kgd, mec, pipe, 0, 0);
 
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCPC_INT_CNTL),
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCPC_INT_CNTL),
CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
 
@@ -245,7 +245,7 @@ static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
/* Activate doorbell logic before triggering WPTR poll. */
data = REG_SET_FIELD(m->cp_hqd_pq_doorbell_control,
 CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 1);
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), data);
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), data);
 
if (wptr) {
/* Don't read wptr with get_user because the user
@@ -274,17 +274,17 @@ static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
guessed_wptr += m->cp_hqd_pq_wptr_lo & ~(queue_size - 1);
guessed_wptr += (uint64_t)m->cp_hqd_pq_wptr_hi << 32;
 
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_LO),
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_LO),
   lower_32_bits(guessed_wptr));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI),
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI),
   upper_32_bits(guessed_wptr));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
   lower_32_bits((uint64_t)wptr));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
   upper_32_bits((uint64_t)wptr));
pr_debug("%s setting CP_PQ_WPTR_POLL_CNTL1 to %x\n", __func__,
 (uint32_t)get_queue_mask(adev, pipe_id, queue_id));
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1),
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1),
   (uint32_t)get_queue_mask(adev, pipe_id, queue_id));
}
 
@@ -294,7 +294,7 @@ static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
 CP_HQD_EOP_RPTR, INIT_FETCHER, 1));
 
data = REG_SET_FIELD(m->cp_hqd_active, CP_HQD_ACTIVE, ACTIVE, 1);
-   WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE), data);
+   WREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE), data);
 
release_queue(kgd);
 
@@ -497,13 +497,13 @@ static bool kgd_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
uint32_t low, high;
 
acquire_queue(kgd, pipe_id, queue_id);
-   act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+   act = RREG32_RLC(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
if (act) {
low = lower_32_bits(queue_address >> 8);
high = upper_32_bits(queue_address >> 8);
 
-   

RE: [PATCH] drm/amd/pm: enable ASPM on navi1x

2021-04-08 Thread Chen, Guchun
[AMD Public Use]

* The ASPM function is not fully enabled and verified on
 * Navi yet. Temporarily skip this until ASPM enabled.
 */

This comment needs to be adjusted as well, right?
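
Something along these lines would match the new behavior (just a suggestion,
not committed text):

	/*
	 * ASPM is verified functional on navi1x; skip it only on
	 * APUs, where it does not apply.
	 */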

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Kenneth Feng
Sent: Thursday, April 8, 2021 5:33 PM
To: amd-gfx@lists.freedesktop.org
Cc: Feng, Kenneth 
Subject: [PATCH] drm/amd/pm: enable ASPM on navi1x

ASPM has been functionally verified on navi1x. It can be enabled to reduce 
power consumption without hurting performance.

Signed-off-by: Kenneth Feng 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c 
index 46d4bbabce75..5edab56c6ab0 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -601,8 +601,7 @@ static void nv_program_aspm(struct amdgpu_device *adev)
if (amdgpu_aspm != 1)
return;
 
-   if ((adev->asic_type >= CHIP_SIENNA_CICHLID) &&
-   !(adev->flags & AMD_IS_APU) &&
+   if (!(adev->flags & AMD_IS_APU) &&
(adev->nbio.funcs->program_aspm))
adev->nbio.funcs->program_aspm(adev);
 
@@ -938,8 +937,7 @@ static int nv_update_umd_stable_pstate(struct amdgpu_device *adev,
 * The ASPM function is not fully enabled and verified on
 * Navi yet. Temporarily skip this until ASPM enabled.
 */
-   if ((adev->asic_type >= CHIP_SIENNA_CICHLID) &&
-   !(adev->flags & AMD_IS_APU) &&
+   if (!(adev->flags & AMD_IS_APU) &&
(adev->nbio.funcs->enable_aspm))
adev->nbio.funcs->enable_aspm(adev, !enter);
 
--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: avoid undefined return value

2021-05-10 Thread Chen, Guchun
[AMD Public Use]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Shi, Leslie  
Sent: Monday, May 10, 2021 5:56 PM
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Chen, Guchun 
Subject: [PATCH] drm/amdgpu: avoid undefined return value

Fixes: a7c22df2fd07 ("drm/amdgpu: clean up non-DC suspend/resume handling")

Initialize r to 0 so that amdgpu_display_suspend_helper() does not return an
undefined value when no code path has assigned r.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 7d4af8fc7e97..f3b2762f6f53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -1554,7 +1554,7 @@ int amdgpu_display_suspend_helper(struct amdgpu_device *adev)
struct drm_crtc *crtc;
struct drm_connector *connector;
struct drm_connector_list_iter iter;
-   int r;
+   int r = 0;
 
/* turn off display hw */
drm_modeset_lock_all(dev);
-- 
2.25.1
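
For context, the bug pattern this fix guards against looks like the following
(hedged sketch, not the real function body; do_per_crtc_work() is a
hypothetical helper):

	int suspend_helper_sketch(int num_crtcs)
	{
		int r;			/* undefined unless assigned below */

		for (int i = 0; i < num_crtcs; i++)
			r = do_per_crtc_work(i);

		return r;		/* garbage when num_crtcs == 0 */
	}

Initializing r to 0 makes the empty-loop case return success instead of an
arbitrary stack value.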
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 1/2] drm/amdgpu: add judgement when add ip blocks

2021-05-10 Thread Chen, Guchun
[AMD Public Use]

The series look good to me.

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Gao, Likun  
Sent: Tuesday, May 11, 2021 11:52 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun 
; Song, Asher ; Gao, Likun 

Subject: [PATCH 1/2] drm/amdgpu: add judgement when add ip blocks

From: Likun GAO 

Judge whether to add a SW IP block according to the harvest info.
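
In other words (hedged usage sketch; the exact call site is an assumption),
discovery fills the mask once and every later IP block add consults it:

	/* early init, after the discovery table has been read */
	amdgpu_discovery_harvest_ip(adev);	/* sets adev->harvest_ip_mask */

	/* later: silently skipped when the VCN/JPEG block is harvested */
	amdgpu_device_ip_block_add(adev, &vcn_v3_0_ip_block);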

Signed-off-by: Likun Gao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 15 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 30 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h |  1 +
 drivers/gpu/drm/amd/amdgpu/nv.c   |  8 -
 drivers/gpu/drm/amd/include/amd_shared.h  |  6 
 6 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 10d9a8a237fd..3147c1c935c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1010,6 +1010,7 @@ struct amdgpu_device {
struct amdgpu_dfdf;
 
struct amdgpu_ip_block  ip_blocks[AMDGPU_MAX_IP_NUM];
+   uint32_tharvest_ip_mask;
int num_ip_blocks;
struct mutexmn_lock;
DECLARE_HASHTABLE(mn_hash, 7);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b0543f409039..6881015f40be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1683,6 +1683,19 @@ int amdgpu_device_ip_block_add(struct amdgpu_device *adev,
if (!ip_block_version)
return -EINVAL;
 
+   switch (ip_block_version->type) {
+   case AMD_IP_BLOCK_TYPE_VCN:
+   if (adev->harvest_ip_mask & AMD_HARVEST_IP_VCN_MASK)
+   return 0;
+   break;
+   case AMD_IP_BLOCK_TYPE_JPEG:
+   if (adev->harvest_ip_mask & AMD_HARVEST_IP_JPEG_MASK)
+   return 0;
+   break;
+   default:
+   break;
+   }
+
DRM_INFO("add ip block number %d <%s>\n", adev->num_ip_blocks,
  ip_block_version->funcs->name);
 
@@ -3111,7 +3124,6 @@ bool amdgpu_device_has_dc_support(struct amdgpu_device *adev)
return amdgpu_device_asic_has_dc_support(adev->asic_type);
 }
 
-
static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)
{
struct amdgpu_device *adev =
@@ -3274,6 +3286,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
adev->vm_manager.vm_pte_funcs = NULL;
adev->vm_manager.vm_pte_num_scheds = 0;
adev->gmc.gmc_funcs = NULL;
+   adev->harvest_ip_mask = 0x0;
adev->fence_context = dma_fence_context_alloc(AMDGPU_MAX_RINGS);
bitmap_zero(adev->gfx.pipe_reserve_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index b2dbcb4df020..99255c2f08f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -373,6 +373,36 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id,
return -EINVAL;
 }
 
+void amdgpu_discovery_harvest_ip(struct amdgpu_device *adev)
+{
+   struct binary_header *bhdr;
+   struct harvest_table *harvest_info;
+   int i;
+
+   bhdr = (struct binary_header *)adev->mman.discovery_bin;
+   harvest_info = (struct harvest_table *)(adev->mman.discovery_bin +
+   le16_to_cpu(bhdr->table_list[HARVEST_INFO].offset));
+
+   for (i = 0; i < 32; i++) {
+   if (le32_to_cpu(harvest_info->list[i].hw_id) == 0)
+   break;
+
+   switch (le32_to_cpu(harvest_info->list[i].hw_id)) {
+   case VCN_HWID:
+   adev->harvest_ip_mask |= AMD_HARVEST_IP_VCN_MASK;
+   adev->harvest_ip_mask |= AMD_HARVEST_IP_JPEG_MASK;
+   break;
+   case DMU_HWID:
+   adev->harvest_ip_mask |= AMD_HARVEST_IP_DMU_MASK;
+   break;
+   default:
+   break;
+   }
+   }
+
+   return;
+}
+
 int amdgpu_discovery_get_gfx_info(struct amdgpu_device *adev)
 {
struct binary_header *bhdr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
index 8f6183801cb3..1b1ae21b1037 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.h
+++ b/drivers/gp

RE: [PATCH 2/3] drm/amd/pm: Fix showing incorrect frequencies on aldebaran

2021-05-13 Thread Chen, Guchun
[AMD Public Use]

3 nit-picks inline.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Lijo Lazar
Sent: Thursday, May 13, 2021 5:48 PM
To: amd-gfx@lists.freedesktop.org
Cc: Wang, Kevin(Yang) ; Feng, Kenneth 
; Zhang, Hawking 
Subject: [PATCH 2/3] drm/amd/pm: Fix showing incorrect frequencies on aldebaran


Use the current and custom pstate frequencies to track the current and 
user-set min/max values in manual and determinism modes. Previously, only the 
actual_* values were used to track both the current and the user-requested 
values. Those values get reassigned whenever the user requests a new value 
through the pp_od_clk_voltage node, so incorrect values are shown when the 
user requests an invalid value or makes a partial request without committing 
it. Separating them into custom and current variables fixes such issues.
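
The bookkeeping can be pictured like this (paraphrased from the diff below;
the exact layout inside smu->pstate_table is an assumption):

	struct pstate_range {
		uint32_t min;
		uint32_t max;
	};

	struct clk_pstate {
		uint32_t min;			/* ASIC default minimum */
		uint32_t peak;			/* ASIC default maximum */
		struct pstate_range curr;	/* range currently applied to hw */
		struct pstate_range custom;	/* user request staged via
						 * pp_od_clk_voltage; copied to
						 * curr only on commit */
	};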

Signed-off-by: Lijo Lazar 
---
  .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 65 ---
  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c| 18 -
  2 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 5d04a1dfdfd8..d27ed2954705 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -78,8 +78,6 @@

  #define smnPCIE_ESM_CTRL  0x111003D0

-#define CLOCK_VALID (1 << 31)
-
  static const struct cmn2asic_msg_mapping aldebaran_message_map[SMU_MSG_MAX_COUNT] = {
	MSG_MAP(TestMessage, PPSMC_MSG_TestMessage, 0),
	MSG_MAP(GetSmuVersion,   PPSMC_MSG_GetSmuVersion,   1),
@@ -455,12 +453,18 @@ static int aldebaran_populate_umd_state_clk(struct smu_context *smu)

pstate_table->gfxclk_pstate.min = gfx_table->min;
pstate_table->gfxclk_pstate.peak = gfx_table->max;
+   pstate_table->gfxclk_pstate.curr.min = gfx_table->min;
+   pstate_table->gfxclk_pstate.curr.max = gfx_table->max;

pstate_table->uclk_pstate.min = mem_table->min;
pstate_table->uclk_pstate.peak = mem_table->max;
+   pstate_table->uclk_pstate.curr.min = mem_table->min;
+   pstate_table->uclk_pstate.curr.max = mem_table->max;

pstate_table->socclk_pstate.min = soc_table->min;
pstate_table->socclk_pstate.peak = soc_table->max;
+   pstate_table->socclk_pstate.curr.min = soc_table->min;
+   pstate_table->socclk_pstate.curr.max = soc_table->max;

if (gfx_table->count > ALDEBARAN_UMD_PSTATE_GFXCLK_LEVEL &&
    mem_table->count > ALDEBARAN_UMD_PSTATE_MCLK_LEVEL &&
@@ -669,6 +673,7 @@ static int aldebaran_print_clk_levels(struct smu_context *smu,
  {
int i, now, size = 0;
int ret = 0;
+   struct smu_umd_pstate_table *pstate_table = &smu->pstate_table;
struct pp_clock_levels_with_latency clocks;
struct smu_13_0_dpm_table *single_dpm_table;
struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
@@ -703,12 +708,8 @@ static int aldebaran_print_clk_levels(struct smu_context *smu,

display_levels = clocks.num_levels;

-   min_clk = smu->gfx_actual_hard_min_freq & CLOCK_VALID ?
- smu->gfx_actual_hard_min_freq & ~CLOCK_VALID :
- single_dpm_table->dpm_levels[0].value;
-   max_clk = smu->gfx_actual_soft_max_freq & CLOCK_VALID ?
- smu->gfx_actual_soft_max_freq & ~CLOCK_VALID :
- single_dpm_table->dpm_levels[1].value;
+   min_clk = pstate_table->gfxclk_pstate.curr.min;
+   max_clk = pstate_table->gfxclk_pstate.curr.max;

freq_values[0] = min_clk;
freq_values[1] = max_clk;
@@ -1134,9 +1135,6 @@ static int aldebaran_set_performance_level(struct smu_context *smu,
&& (level != AMD_DPM_FORCED_LEVEL_PERF_DETERMINISM))
smu_cmn_send_smc_msg(smu, SMU_MSG_DisableDeterminism, NULL);

-   /* Reset user min/max gfx clock */
-   smu->gfx_actual_hard_min_freq = 0;
-   smu->gfx_actual_soft_max_freq = 0;

switch (level) {

@@ -1163,6 +1161,7 @@ static int aldebaran_set_soft_freq_limited_range(struct smu_context *smu,
  {
struct smu_dpm_context *smu_dpm = &(smu->smu_dpm);
struct smu_13_0_dpm_context *dpm_context = smu_dpm->dpm_context;
+   struct smu_umd_pstate_table *pstate_table = &smu->pstate_table;
struct amdgpu_device *adev = smu->adev;
uint32_t min_clk;
uint32_t max_clk;
@@ -1176,14 +1175,20 @@ static int aldebaran_set_soft_freq_limited_range(struct smu_context *smu,
return -EINVAL;

if (smu_dpm->dpm_level == AMD_DPM_FORCED_LEVEL_MANUAL) {
-   min_clk = max(min, dpm_context->dpm_tables.gfx_table.min);
-   max_clk = min(max, dpm_context->dpm_tables.gfx_

RE: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization failure to prevent crash

2021-12-15 Thread Chen, Guchun
[Public]

Hi Christian,

Your question is a really good one. Unmapping MMIO at such an early phase 
comes from Andrey's patch: drm/amdgpu: Unmap all MMIO mappings. That patch 
landed half a year ago, and everything looked fine until this case.

Regards,
Guchun

-Original Message-
From: Koenig, Christian  
Sent: Wednesday, December 15, 2021 7:00 PM
To: Shi, Leslie ; Grodzovsky, Andrey 
; Pan, Xinhui ; Deucher, 
Alexander ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun 
Subject: Re: [PATCH] drm/amdgpu: add drm_dev_unplug() in GPU initialization 
failure to prevent crash

Am 15.12.21 um 09:46 schrieb Leslie Shi:
> [Why]
> In amdgpu_driver_load_kms, when amdgpu_device_init returns an error
> during driver modprobe, it will start the error handling path
> immediately and call into amdgpu_device_unmap_mmio as well to release
> the mapped VRAM. However, in the following release callback, the driver
> still visits the unmapped memory, like vcn.inst[i].fw_shared_cpu_addr
> in vcn_v3_0_sw_fini. So a kernel crash occurs.

Mhm, interesting workaround but I'm not sure that's the right thing to do.

The question is rather why we are unmapping the MMIO space so early on driver 
load failure in the first place? I mean, don't we need to clean up a bit first?

If that's really the way to go then we should at least add a comment explaining 
why it's done that way.

Regards,
Christian.

>
> [How]
> Add drm_dev_unplug() before executing amdgpu_driver_unload_kms to
> prevent such a crash.
> GPU initialization failure is allowed to happen, but a kernel crash in
> this case should never happen.
>
> Signed-off-by: Leslie Shi 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 651c7abfde03..7bf6aecdbb92 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -268,6 +268,8 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, unsigned long flags)
>   /* balance pm_runtime_get_sync in amdgpu_driver_unload_kms */
>   if (adev->rmmio && adev->runpm)
>   pm_runtime_put_noidle(dev->dev);
> +
> + drm_dev_unplug(dev);
>   amdgpu_driver_unload_kms(dev);
>   }
>   


RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

2021-12-15 Thread Chen, Guchun
[Public]

Hi Leslie,

I think we need to modify it like:

+if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+   amdgpu_device_unmap_mmio(adev);
+   drm_dev_exit(idx);
+}
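
(For reference: drm_dev_enter() returns true only while the device has not 
been unplugged, and drm_dev_exit() must be called only after a successful 
enter. With the form above, the MMIO space is therefore unmapped only while 
the device is still present.)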

Also, you need to credit Andrey with a 'Suggested-by' tag in your patch.

Regards,
Guchun

-Original Message-
From: Shi, Leslie  
Sent: Thursday, December 16, 2021 2:14 PM
To: Grodzovsky, Andrey ; Koenig, Christian 
; Pan, Xinhui ; Deucher, 
Alexander ; amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Shi, Leslie 
Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is 
unplugged to prevent crash in GPU initialization failure

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns an error during 
driver modprobe, it will start the error handling path immediately and call 
into amdgpu_device_unmap_mmio as well to release the mapped VRAM. However, in 
the following release callback, the driver still visits the unmapped memory, 
like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash 
occurs.

[How]
Call amdgpu_device_unmap_mmio() iff the device is unplugged, to prevent an 
invalid memory access in vcn_v3_0_sw_fini() when GPU initialization fails.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fb03d75880ec..d3656e7b60c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+   int idx;
+
dev_info(adev->dev, "amdgpu: finishing device.\n");
flush_delayed_work(&adev->delayed_init_work);
if (adev->mman.initialized) {
@@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
amdgpu_gart_dummy_page_fini(adev);
 
-   amdgpu_device_unmap_mmio(adev);
+   if (!drm_dev_enter(adev_to_drm(adev), &idx))
+   amdgpu_device_unmap_mmio(adev);
+   else
+   drm_dev_exit(idx);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
--
2.25.1

