Hi Yiwei,

This is the latest series:
https://patchwork.kernel.org/cover/11120371/

(I still need to reply to some of the feedback.)

Regards,
Kenny

On Thu, Oct 31, 2019 at 12:59 PM Yiwei Zhang <zzyi...@google.com> wrote:
>
> Hi Kenny,
>
> Thanks for the info. Do you mind forwarding the existing discussion to me or
> having me cc'ed in that thread?
>
> Best,
> Yiwei
>
> On Wed, Oct 30, 2019 at 10:23 PM Kenny Ho <y2ke...@gmail.com> wrote:
>>
>> Hi Yiwei,
>>
>> I am not sure if you are aware of it, but there is an ongoing RFC on
>> adding drm support in cgroup for the purpose of resource tracking.  One
>> of the resources is GPU memory.  It's not exactly the same as what you
>> are proposing (it doesn't track API usage, but it tracks the type of GPU
>> memory from the kmd perspective), but perhaps it would be of interest to
>> you.  There is no consensus on it at this point.
>>
>> (Sorry for being late to the discussion.  I only noticed this thread
>> when one of the emails got lucky and escaped the spam folder.)
>>
>> Regards,
>> Kenny
>>
>> On Wed, Oct 30, 2019 at 4:14 AM Yiwei Zhang <zzyi...@google.com> wrote:
>> >
>> > Hi Jerome and all folks,
>> >
>> > In addition to my last reply, I just want to get some more information
>> > regarding this on the upstream side.
>> >
>> > 1. Do you think this (standardizing a way to report GPU private
>> > allocations) is going to be useful upstream as well? It grants a lot
>> > of benefits for Android, but I'd like to get an idea of its value for
>> > the non-Android world.
>> >
>> > 2. There might be some worry that the upstream kernel driver has no
>> > idea about the API. However, to achieve good fidelity in memory
>> > reporting, we'd have to pass down certain metadata that is known only
>> > by userland. Consider this use case: on the upstream side, in
>> > freedreno for example, a memory buffer object (BO) could represent
>> > totally different things over its lifecycle, and the kmd is not aware
>> > of that. When we'd like to take memory snapshots at a certain
>> > granularity, we have to know what that buffer represents so that the
>> > snapshot can be meaningful and useful.
>> >
>> > If we just keep this Android-specific, I'd worry that some day
>> > upstream will standardize a way to report this and Android vendors
>> > will have to take extra effort to migrate over. This is one of the
>> > main reasons we'd like to do this on the upstream side.
>> >
>> > Timeline-wise, Android has explicit deadlines for the next release,
>> > and we have to push hard towards those. Any prompt responses are very
>> > much appreciated!
>> >
>> > Best regards,
>> > Yiwei
>> >
>> > On Mon, Oct 28, 2019 at 11:33 AM Yiwei Zhang <zzyi...@google.com> wrote:
>> >>
>> >> On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse <jgli...@redhat.com> wrote:
>> >>>
>> >>> On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
>> >>> > Hi folks,
>> >>> >
>> >>> > This is the plain-text version of the previous email, in case that
>> >>> > one was considered spam.
>> >>> >
>> >>> > --- Background ---
>> >>> > On downstream Android, vendors used to report GPU private memory
>> >>> > allocations with debugfs nodes in their own formats. However,
>> >>> > debugfs nodes are getting deprecated in the next Android release.
>> >>>
>> >>> Maybe explain why it is useful first?
>> >>
>> >>
>> >> Memory is precious on Android mobile platforms. Apps that use a
>> >> large amount of memory, such as games, tend to maintain a memory
>> >> table for different devices with different prediction models.
>> >> Private GPU memory allocation is currently semi-blind to the apps
>> >> and to the platform as well.
>> >>
>> >> With the data, the platform can:
>> >> (1) Do GPU memory profiling as part of the larger Android profiler
>> >> work in progress.
>> >> (2) Let the Android system health team enrich performance test
>> >> coverage.
>> >> (3) Collect field metrics to detect any regression in GPU private
>> >> memory allocations in the production population.
>> >> (4) Let shell users easily dump the allocations in a uniform way
>> >> across vendors.
>> >> (5) Feed the data to apps so that apps can do memory allocations in
>> >> a more predictable way.
>> >>
>> >>>
>> >>> >
>> >>> > --- Proposal ---
>> >>> > We are taking this chance to unify all the vendors by migrating
>> >>> > their existing debugfs nodes into a standardized sysfs node
>> >>> > structure. The platform is then able to do a bunch of useful
>> >>> > things: memory profiling, system health coverage, field metrics,
>> >>> > local shell dumps, an in-app API, etc. This proposal is better
>> >>> > served upstream, as all GPU vendors can standardize on a GPU memory
>> >>> > structure that clients can rely on and reduce fragmentation across
>> >>> > Android and Linux.
>> >>> >
>> >>> > --- Detailed design ---
>> >>> > The sysfs node structure looks like below:
>> >>> > /sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>
>> >>> > e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer", where gl_buffer is
>> >>> > a node containing the comma-separated size values: "4096,81920,...,4096".
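>> >>> >
>> >>> > As an illustration, a consumer could read one of these nodes with a
>> >>> > minimal Python sketch like the one below (the root "mali0/gpu_mem",
>> >>> > pid 606 and type "gl_buffer" are just the example values above; the
>> >>> > path layout is the proposed one, not an existing interface):
>> >>> >
>> >>> > from pathlib import Path
>> >>> >
>> >>> > def read_type_sizes(root, pid, type_name):
>> >>> >     # Each node holds comma-separated byte sizes, e.g. "4096,81920".
>> >>> >     node = Path("/sys/devices") / root / str(pid) / type_name
>> >>> >     raw = node.read_text().strip()
>> >>> >     return [int(s) for s in raw.split(",")] if raw else []
>> >>> >
>> >>> > sizes = read_type_sizes("mali0/gpu_mem", 606, "gl_buffer")
>> >>> > print(len(sizes), "allocations,", sum(sizes), "bytes total")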
>> >>>
>> >>> How does the kernel know what API the allocation is used for? With the
>> >>> open source drivers you never specify what API is creating a gem object
>> >>> (opengl, vulkan, ...) nor what purpose (transient, shader, ...).
>> >>
>> >>
>> >> Oh, is it a hard requirement that the open source drivers not
>> >> bookkeep any data from userland? I think the API is just some
>> >> additional metadata passed down.
>> >>
>> >>>
>> >>>
>> >>> > For the top-level root, vendors can choose their own names based on
>> >>> > the value of the ro.gfx.sysfs.0 property they set. (1) For the
>> >>> > multiple-GPU-driver case, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2
>> >>> > for the 2nd and 3rd KMDs. (2) It's also allowed to put a sub-dir,
>> >>> > for example "kgsl/gpu_mem" or "mali0/gpu_mem", in the
>> >>> > ro.gfx.sysfs.<channel> property if the root name under /sys/devices/
>> >>> > is already created and used for other purposes.
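>> >>> >
>> >>> > A rough sketch of how a platform tool could enumerate the roots from
>> >>> > the ro.gfx.sysfs.<channel> properties, assuming the standard Android
>> >>> > getprop command is available (the property scheme itself is just the
>> >>> > one proposed above):
>> >>> >
>> >>> > import subprocess
>> >>> >
>> >>> > def gpu_mem_roots():
>> >>> >     # Walk ro.gfx.sysfs.0, ro.gfx.sysfs.1, ... until one is unset.
>> >>> >     roots = []
>> >>> >     channel = 0
>> >>> >     while True:
>> >>> >         out = subprocess.run(["getprop", "ro.gfx.sysfs.%d" % channel],
>> >>> >                              capture_output=True, text=True)
>> >>> >         prop = out.stdout.strip()
>> >>> >         if not prop:
>> >>> >             break
>> >>> >         roots.append("/sys/devices/" + prop)
>> >>> >         channel += 1
>> >>> >     return roots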
>> >>>
>> >>> On one side you want to standardize, on the other you want to give
>> >>> complete freedom on the top-level naming scheme. I would rather see a
>> >>> consistent naming scheme (i.e. something more restrained, with little
>> >>> room for interpretation by individual drivers).
>> >>
>> >>
>> >> Thanks for commenting on this. We definitely need some suggestions on
>> >> the root directory. In the multi-GPU case on desktop, is there an
>> >> existing consumer that queries "some data" from all the GPUs? How does
>> >> the tool find all GPUs and differentiate between them? Is this already
>> >> standardized?
>> >>
>> >>> > For the 2nd level "pid", there are usually just a couple of them
>> >>> > per snapshot, since we only take snapshots of the active ones.
>> >>>
>> >>> I do not understand here: you can have any number of applications with
>> >>> GPU objects, and thus there is no bound on the number of PIDs. Please
>> >>> consider desktop too; I do not know what kind of limitations Android
>> >>> imposes.
>> >>
>> >>
>> >> We are only interested in tracking *active* GPU private allocations.
>> >> So yes, any application currently holding an active GPU context will
>> >> probably have a node here. Since we want to do profiling for specific
>> >> apps, the data has to be per-application. I don't get your concern
>> >> here. If it's about the tracking overhead, it's rare to see tons of
>> >> applications doing private GPU allocations at the same time. Could you
>> >> help elaborate a bit?
>> >>
>> >>> > For the 3rd level "type_name", the type name will be one of the GPU
>> >>> > memory object types in lower case, and the value will be a
>> >>> > comma-separated sequence of size values for all the allocations
>> >>> > under that specific type.
>> >>> >
>> >>> > We especially would like some comments on this part. For the GPU memory
>> >>> > object types, we defined 9 different types for Android:
>> >>> > (1) UNKNOWN // not accounted for in any other category
>> >>> > (2) SHADER // shader binaries
>> >>> > (3) COMMAND // allocations which have a lifetime similar to a
>> >>> > VkCommandBuffer
>> >>> > (4) VULKAN // backing for VkDeviceMemory
>> >>> > (5) GL_TEXTURE // GL Texture and RenderBuffer
>> >>> > (6) GL_BUFFER // GL Buffer
>> >>> > (7) QUERY // backing for query
>> >>> > (8) DESCRIPTOR // allocations which have a lifetime similar to a
>> >>> > VkDescriptorSet
>> >>> > (9) TRANSIENT // random transient things that the driver needs
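>> >>> >
>> >>> > As a sketch, the same set expressed as a Python enum, with the node
>> >>> > names being the types in lower case per the mapping described above:
>> >>> >
>> >>> > from enum import Enum
>> >>> >
>> >>> > class GpuMemType(Enum):
>> >>> >     UNKNOWN = "unknown"        # not accounted for elsewhere
>> >>> >     SHADER = "shader"          # shader binaries
>> >>> >     COMMAND = "command"        # lifetime ~ VkCommandBuffer
>> >>> >     VULKAN = "vulkan"          # backing for VkDeviceMemory
>> >>> >     GL_TEXTURE = "gl_texture"  # GL Texture and RenderBuffer
>> >>> >     GL_BUFFER = "gl_buffer"    # GL Buffer
>> >>> >     QUERY = "query"            # backing for query
>> >>> >     DESCRIPTOR = "descriptor"  # lifetime ~ VkDescriptorSet
>> >>> >     TRANSIENT = "transient"    # transient things the driver needs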
>> >>> >
>> >>> > We are wondering whether those type enumerations make sense to the
>> >>> > upstream side as well, or whether we should just deal with our own
>> >>> > type set. On the Android side, we'll just read the nodes named after
>> >>> > the types we defined in the sysfs node structure.
>> >>>
>> >>> See my point above about the open source driver and kernel being
>> >>> unaware of the allocation purpose and use.
>> >>>
>> >>> Cheers,
>> >>> Jérôme
>> >>>
>> >>
>> >> Many thanks for the reply!
>> >> Yiwei
>> >
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel