Hi Andrey, I don't have any XGMI machines here; maybe you can reach out to Shaoyun for help.
On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote:
> Just a gentle ping.
>
> Andrey
> ------------------------------------------------------------------------
> *From:* Grodzovsky, Andrey
> *Sent:* 26 January 2022 10:52
> *To:* Christian König <ckoenig.leichtzumer...@gmail.com>; Koenig, Christian
> <christian.koe...@amd.com>; Lazar, Lijo <lijo.la...@amd.com>;
> dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>;
> amd-...@lists.freedesktop.org <amd-...@lists.freedesktop.org>; Chen, JingWen
> <jingwen.ch...@amd.com>
> *Cc:* Chen, Horace <horace.c...@amd.com>; Liu, Monk <monk....@amd.com>
> *Subject:* Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with
> TDRs
>
>
> JingWen - could you maybe give those patches a try on an SRIOV XGMI system? If
> you see issues, maybe you could let me connect and debug. My SRIOV XGMI system,
> which Shaoyun kindly arranged for me, is not loading the driver with my
> drm-misc-next branch even without my patches.
>
> Andrey
>
> On 2022-01-17 14:21, Andrey Grodzovsky wrote:
>>
>>
>> On 2022-01-17 2:17 p.m., Christian König wrote:
>>> On 17.01.22 at 20:14, Andrey Grodzovsky wrote:
>>>>
>>>> Ping on the question
>>>>
>>>
>>> Oh, my! That was already more than a week ago and is completely swapped out
>>> of my head again.
>>>
>>>> Andrey
>>>>
>>>> On 2022-01-05 1:11 p.m., Andrey Grodzovsky wrote:
>>>>>>> Also, what about having the reset_active or in_reset flag in the
>>>>>>> reset_domain itself?
>>>>>>
>>>>>> Offhand that sounds like a good idea.
>>>>>
>>>>>
>>>>> What then about the adev->reset_sem semaphore? Should we also move this
>>>>> to reset_domain? Both of the moves have functional
>>>>> implications only for the XGMI case, because there will be contention over
>>>>> accessing those single-instance variables from multiple devices,
>>>>> while now each device has its own copy.
>>>
>>> Since this is a rw semaphore that should be unproblematic I think. It could
>>> just be that the cache line of the lock then plays ping/pong between the
>>> CPU cores.
>>>
>>>>>
>>>>> What benefit does the centralization into reset_domain give - is it, for
>>>>> example, to prevent one device in a hive from trying to access another
>>>>> one's VRAM (shared FB memory) through MMIO while the other one goes
>>>>> through reset?
>>>
>>> I think that this is the killer argument for a centralized lock, yes.
>>
>>
>> np, I will add a patch centralizing both into the reset domain and
>> resend.
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>>
>>>>> Andrey
>>>