Hi Yiwei,

This is the latest series: https://patchwork.kernel.org/cover/11120371/
(I still need to reply to some of the feedback.)

Regards,
Kenny

On Thu, Oct 31, 2019 at 12:59 PM Yiwei Zhang <zzyi...@google.com> wrote:
>
> Hi Kenny,
>
> Thanks for the info. Do you mind forwarding the existing discussion to me
> or having me cc'ed on that thread?
>
> Best,
> Yiwei
>
> On Wed, Oct 30, 2019 at 10:23 PM Kenny Ho <y2ke...@gmail.com> wrote:
>>
>> Hi Yiwei,
>>
>> I am not sure if you are aware, but there is an ongoing RFC on adding drm
>> support in cgroup for the purpose of resource tracking. One of the
>> resources is GPU memory. It's not exactly the same as what you are
>> proposing (it doesn't track API usage, but it tracks the type of GPU
>> memory from the kmd perspective), but perhaps it would be of interest to
>> you. There is no consensus on it at this point.
>>
>> (Sorry for being late to the discussion. I only noticed this thread when
>> one of the emails got lucky and escaped the spam folder.)
>>
>> Regards,
>> Kenny
>>
>> On Wed, Oct 30, 2019 at 4:14 AM Yiwei Zhang <zzyi...@google.com> wrote:
>> >
>> > Hi Jerome and all folks,
>> >
>> > In addition to my last reply, I just want to get some more information
>> > regarding this on the upstream side.
>> >
>> > 1. Do you think this (standardizing a way to report GPU private
>> > allocations) is going to be a useful thing upstream as well? It grants
>> > a lot of benefits for Android, but I'd like to get an idea for the
>> > non-Android world.
>> >
>> > 2. There might be some worries that the upstream kernel driver has no
>> > idea about the API. However, to achieve good fidelity in memory
>> > reporting, we'd have to pass down certain metadata which is known only
>> > by the userland. Consider this use case: on the upstream side, in
>> > freedreno for example, some memory buffer object (BO) could represent
>> > totally different things during its own lifecycle, and the kmd is not
>> > aware of that. When we'd like to take memory snapshots at a certain
>> > granularity, we have to know what that buffer represents so that the
>> > snapshot can be meaningful and useful.
>> >
>> > If we just keep this Android specific, I'd worry that some day upstream
>> > standardizes a way to report this and Android vendors have to take
>> > extra effort to migrate over. This is one of the main reasons we'd like
>> > to do this on the upstream side.
>> >
>> > Timeline wise, Android has explicit deadlines for the next release and
>> > we have to push hard towards those. Any prompt responses are very much
>> > appreciated!
>> >
>> > Best regards,
>> > Yiwei
>> >
>> > On Mon, Oct 28, 2019 at 11:33 AM Yiwei Zhang <zzyi...@google.com> wrote:
>> >>
>> >> On Mon, Oct 28, 2019 at 8:26 AM Jerome Glisse <jgli...@redhat.com> wrote:
>> >>>
>> >>> On Fri, Oct 25, 2019 at 11:35:32AM -0700, Yiwei Zhang wrote:
>> >>> > Hi folks,
>> >>> >
>> >>> > This is the plain text version of the previous email in case that
>> >>> > was considered as spam.
>> >>> >
>> >>> > --- Background ---
>> >>> > On downstream Android, vendors used to report GPU private memory
>> >>> > allocations with debugfs nodes in their own formats. However,
>> >>> > debugfs nodes are getting deprecated in the next Android release.
>> >>>
>> >>> Maybe explain why it is useful first?
>> >>
>> >>
>> >> Memory is precious on Android mobile platforms. Apps using a large
>> >> amount of memory, games in particular, tend to maintain a table for
>> >> the memory on different devices with different prediction models.
>> >> Private GPU memory allocations are currently semi-blind to the apps
>> >> and to the platform as well.
>> >>
>> >> By having the data, the platform can do the following:
>> >> (1) GPU memory profiling as part of the huge Android profiler in
>> >> progress.
>> >> (2) The Android system health team can enrich the performance test
>> >> coverage.
>> >> (3) We can collect field metrics to detect any regression in GPU
>> >> private memory allocations in the production population.
>> >> (4) A shell user can easily dump the allocations in a uniform way
>> >> across vendors.
>> >> (5) The platform can feed the data to the apps so that apps can do
>> >> memory allocations in a more predictable way.
>> >>
>> >>>
>> >>> >
>> >>> > --- Proposal ---
>> >>> > We are taking the chance to unify all the vendors to migrate their
>> >>> > existing debugfs nodes into a standardized sysfs node structure.
>> >>> > Then the platform is able to do a bunch of useful things: memory
>> >>> > profiling, system health coverage, field metrics, local shell dump,
>> >>> > in-app API, etc. This proposal is better served upstream, as all GPU
>> >>> > vendors can standardize on a GPU memory structure and reduce
>> >>> > fragmentation across Android and Linux that clients can rely on.
>> >>> >
>> >>> > --- Detailed design ---
>> >>> > The sysfs node structure looks like below:
>> >>> > /sys/devices/<ro.gfx.sysfs.0>/<pid>/<type_name>
>> >>> > e.g. "/sys/devices/mali0/gpu_mem/606/gl_buffer", where gl_buffer is
>> >>> > a node holding the comma-separated size values: "4096,81920,...,4096".
>> >>>
>> >>> How does the kernel know what API the allocation is used for? With the
>> >>> open source driver you never specify what API is creating a gem object
>> >>> (opengl, vulkan, ...) nor what purpose (transient, shader, ...).
>> >>
>> >>
>> >> Oh, is this a hard requirement for the open source drivers not to
>> >> bookkeep any data from userland? I think the API is just some
>> >> additional metadata passed down.
>> >>
>> >>>
>> >>>
>> >>> > For the top level root, vendors can choose their own names based on
>> >>> > the value of ro.gfx.sysfs.0 the vendors set. (1) For the multiple
>> >>> > GPU driver cases, we can use ro.gfx.sysfs.1, ro.gfx.sysfs.2 for the
>> >>> > 2nd and 3rd KMDs. (2) It's also allowed to put some sub-dir, for
>> >>> > example "kgsl/gpu_mem" or "mali0/gpu_mem", in the
>> >>> > ro.gfx.sysfs.<channel> property if the root name under /sys/devices/
>> >>> > is already created and used for other purposes.
>> >>>
>> >>> On one side you want to standardize, on the other you want to give
>> >>> complete freedom on the top level naming scheme. I would rather see a
>> >>> consistent naming scheme (i.e. something more restrained, with little
>> >>> room for interpretation by individual drivers).
>> >>
>> >>
>> >> Thanks for commenting on this. We definitely need some suggestions on
>> >> the root directory. In the multi-gpu case on desktop, is there some
>> >> existing consumer to query "some data" from all the GPUs? How does the
>> >> tool find all GPUs and differentiate between them? Is this already
>> >> standardized?
>> >>
>> >>> > For the 2nd level "pid", there are usually just a couple of them per
>> >>> > snapshot, since we only take snapshots for the active ones.
>> >>>
>> >>> ? Do not understand here, you can have any number of applications with
>> >>> GPU objects? And thus there is no bound on the number of PIDs.
Please >> >>> consider desktop too, i do not know what kind of limitation android >> >>> impose. >> >> >> >> >> >> We are only interested in tracking *active* GPU private allocations. So >> >> yes, any >> >> application currently holding an active GPU context will probably has a >> >> node here. >> >> Since we want to do profiling for specific apps, the data has to be per >> >> application >> >> based. I don't get your concerns here. If it's about the tracking >> >> overhead, it's rare >> >> to see tons of application doing private gpu allocations at the same >> >> time. Could >> >> you help elaborate a bit? >> >> >> >>> > For the 3rd level "type_name", the type name will be one of the GPU >> >>> > memory >> >>> > object types in lower case, and the value will be a comma separated >> >>> > sequence of size values for all the allocations under that specific >> >>> > type. >> >>> > >> >>> > We especially would like some comments on this part. For the GPU memory >> >>> > object types, we defined 9 different types for Android: >> >>> > (1) UNKNOWN // not accounted for in any other category >> >>> > (2) SHADER // shader binaries >> >>> > (3) COMMAND // allocations which have a lifetime similar to a >> >>> > VkCommandBuffer >> >>> > (4) VULKAN // backing for VkDeviceMemory >> >>> > (5) GL_TEXTURE // GL Texture and RenderBuffer >> >>> > (6) GL_BUFFER // GL Buffer >> >>> > (7) QUERY // backing for query >> >>> > (8) DESCRIPTOR // allocations which have a lifetime similar to a >> >>> > VkDescriptorSet >> >>> > (9) TRANSIENT // random transient things that the driver needs >> >>> > >> >>> > We are wondering if those type enumerations make sense to the upstream >> >>> > side >> >>> > as well, or maybe we just deal with our own different type sets. Cuz >> >>> > on the >> >>> > Android side, we'll just read those nodes named after the types we >> >>> > defined >> >>> > in the sysfs node structure. >> >>> >> >>> See my above point of open source driver and kernel being unaware >> >>> of the allocation purpose and use. >> >>> >> >>> Cheers, >> >>> Jérôme >> >>> >> >> >> >> Many thanks for the reply! >> >> Yiwei >> > >> > _______________________________________________ >> > dri-devel mailing list >> > dri-devel@lists.freedesktop.org >> > https://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel