On Sun, Jul 30, 2023 at 10:30 PM Faith Ekstrand <fa...@gfxstrand.net> wrote:

>
> On Tue, Jul 25, 2023 at 5:48 PM Danilo Krummrich <d...@redhat.com> wrote:
>
>> On 7/25/23 18:43, Danilo Krummrich wrote:
>> > On 7/25/23 18:16, Faith Ekstrand wrote:
>> >> Thanks for the detailed write-up! That would definitely explain it. If
>> >> I remember, I'll try to do a single-threaded run or two. If your
>> >> theory is correct, there should be no real perf difference when
>> >> running single-threaded. Those runs will take a long time, though, so
>> >> I'll have to run them over night. I'll let you know in a few days once
>> >> I have the results.
>> >
>> > I can also push a separate branch where I just print out a warning
>> > whenever we run into such a condition including the time we were
>> waiting
>> > for things to complete. I can probably push something later today.
>>
>>
>> https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next-track-stalls
>>
>> It prints out the duration of every wait as well as the total wait time
>> since boot.
>>
>> - Danilo
>>
>> >
>> >>
>> >> If this theory holds, then I'm not concerned about the performance of
>> >> the API itself. It would still be good to see if we can find a way to
>> >> reduce the cross-process drag in the implementation but that's a perf
>> >> optimization we can do later.
>> >
>> >  From the kernel side I think the only thing we could really do is to
>> > temporarily run a secondary drm_gpu_scheduler instance, one for
>> VM_BINDs
>> > and one for EXECs until we got the new page table handling in place.
>> >
>> > However, the UMD could avoid such conditions more effectively, since it
>> > controls the address space. Namely, avoid re-using the same region of
>> > the address space right away in certain cases. For instance, instead of
>> > replacing a sparse region A[0x0, 0x4000000] with a larger sparse region
>> > B[0x0, 0x8000000], replace it with B'[0x4000000, 0xC000000] if possible.
>> >
>> > However, just mentioning this for completeness. The UMD surely
>> shouldn't
>> > probably even temporarily work around such a kernel limitation.
>> >
>> > Anyway, before doing any of those, let's see if the theory holds and
>> > we're actually running into such cases.
>> >
>> >>
>> >> Does it actually matter? Yes, it kinda does. No, it probably doesn't
>> >> matter for games because you're typically only running one game at a
>> >> time. From a development PoV, however, if it makes CI take longer then
>> >> that slows down development and that's not good for the users, either.
>>
>
> I've run a bunch of tests over the weekend and It's starting to look like
> the added CTS time might be from the newly enabled synchronization tests
> themselves.  We enabled timeline semaphores as well as semaphore and fence
> sharing along with the new uAPI and I did not expect those to be nearly
> that heavy-weight so I didn't think of that as a likely factor. I'm doing a
> couple more runs to confirm but what I'm seeing right now seems to indicate
> that the new API with the old feature set has about the same run time now
> that the submit overhead issue has been fixed.
>

My last two runs have come back and confirmed this theory. With the BO
fixes, all remaining slow-downs are coming from newly added tests which
turn out to be very slow tests.

~Faith


> I think this means that we can proceed under the assumption that there are
> no more serious perf issues with the new API.  I'm happy to RB/ACK the next
> version of the API and implementation patches, as long as it has the new
> NO_SHARE BO create flag and properly rejects exports of NO_SHARE BOs, even
> if not all of the dma_resv plumbing is fully baked.
>
> ~Faith
>

Reply via email to