On Sun, Dec 7, 2025 at 11:49 PM Nicolas Frattaroli <[email protected]> wrote: > > On Friday, 5 December 2025 22:16:44 Central European Standard Time Chia-I Wu > wrote: > > On Fri, Dec 5, 2025 at 2:48 AM Nicolas Frattaroli > > <[email protected]> wrote: > > > > > > On Thursday, 4 December 2025 21:21:08 Central European Standard Time > > > Chia-I Wu wrote: > > > > On Wed, Dec 3, 2025 at 6:04 AM Nicolas Frattaroli > > > > <[email protected]> wrote: > > > > > > > > > > Mali GPUs have three registers that indicate which parts of the > > > > > hardware > > > > > are powered and active at any moment. These take the form of bitmaps. > > > > > In > > > > > the case of SHADER_PWRACTIVE for example, a high bit indicates that > > > > > the > > > > > shader core corresponding to that bit index is active. These bitmaps > > > > > aren't solely contiguous bits, as it's common to have holes in the > > > > > sequence of shader core indices, and the actual set of which cores are > > > > > present is defined by the "shader present" register. > > > > > > > > > > When the GPU finishes a power state transition, it fires a > > > > > GPU_IRQ_POWER_CHANGED_ALL interrupt. After such an interrupt is > > > > > received, the PWRACTIVE registers will likely contain interesting new > > > > > information. > > > > I am seeing > > > > > > > > irq/342-panthor-412 [000] ..... 934.526754: gpu_power_active: > > > > shader_bitmap=0x0 tiler_bitmap=0x0 l2_bitmap=0x0 > > > > irq/342-panthor-412 [000] ..... 936.640356: gpu_power_active: > > > > shader_bitmap=0x0 tiler_bitmap=0x0 l2_bitmap=0x0 > > > > > > > > on a gpu-bound test. It does not look like collecting samples on > > > > GPU_IRQ_POWER_CHANGED_ALL gives too much info. > > > > > > On what GPU and SoC is that? If it's MT8196 then I wouldn't be > > > surprised if it just broke that hardware register, considering > > > what it did to the SHADER_PRESENT register. > > Indeed I was on mt8196. > > I don't have much faith in the Mali integration of that SoC being > representative of how the Mali hardware is supposed to work. The > SHADER_PRESENT thing is just the tip of the iceberg, I've also > noticed while developing mtk-mfg-pmdomain that it seemingly messes > with the Mali GPU's internal MCU from the GPUEB depending on the > commands you send it, and can get it into a broken state with > enough luck. > > Check if the registers ever read anything but 0, e.g. by dumping > them from sysfs like this: > > --- > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c > b/drivers/gpu/drm/panthor/panthor_drv.c > index d1d4c50da5bf..b0e67dc17c92 100644 > --- a/drivers/gpu/drm/panthor/panthor_drv.c > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > @@ -1678,8 +1678,69 @@ static ssize_t profiling_store(struct device *dev, > > static DEVICE_ATTR_RW(profiling); > > +static ssize_t print_active_bitmask(char *buf, ssize_t len, u64 present, u64 > active) > +{ > + unsigned int i = 0; > + u64 bit; > + > + while (present) { > + bit = BIT(i); > + if (present & bit) { > + present &= ~bit; > + len += sysfs_emit_at(buf, len, "%s", (active & bit) ? > "1" : "0"); > + } else { > + len += sysfs_emit_at(buf, len, "_"); > + } > + i++; > + } > + > + return len; > +} > + > +static ssize_t power_active_show(struct device *dev, struct device_attribute > *attr, > + char *buf) > +{ > + struct panthor_device *ptdev = dev_get_drvdata(dev); > + ssize_t len = 0; > + u64 present; > + int ret; > + > + if (pm_runtime_suspended(ptdev->base.dev)) > + return sysfs_emit(buf, "Shader:\t0\nTiler:\t0\nL2:\t0\n"); > + > + ret = pm_runtime_resume_and_get(ptdev->base.dev); > + if (ret) > + return ret; > + > + len += sysfs_emit_at(buf, len, "Shader:\t"); > + len += print_active_bitmask(buf, len, gpu_read64(ptdev, > GPU_SHADER_PRESENT), > + gpu_read64(ptdev, SHADER_PWRACTIVE)); > + len += sysfs_emit_at(buf, len, "\n"); > + > + present = gpu_read64(ptdev, GPU_TILER_PRESENT); > + if (present == 0x1) /* "Implementation defined", just try to dump all > */ > + present = U64_MAX; > + len += sysfs_emit_at(buf, len, "Tiler:\t"); > + len += print_active_bitmask(buf, len, present, gpu_read64(ptdev, > TILER_PWRACTIVE)); > + len += sysfs_emit_at(buf, len, "\n"); > + > + present = gpu_read64(ptdev, GPU_L2_PRESENT); > + if (present == 0x1) /* "Implementation defined", just try to dump all > */ > + present = U64_MAX; > + len += sysfs_emit_at(buf, len, "L2:\t"); > + len += print_active_bitmask(buf, len, present, gpu_read64(ptdev, > L2_PWRACTIVE)); > + len += sysfs_emit_at(buf, len, "\n"); > + > + pm_runtime_put(ptdev->base.dev); > + > + return len; > +} > + > +static DEVICE_ATTR_RO(power_active); > + > static struct attribute *panthor_attrs[] = { > &dev_attr_profiling.attr, > + &dev_attr_power_active.attr, > NULL, > }; > --- > > If they always read 0 regardless of whether you're running a GPU > workload or not, then it's just not properly wired up. They can be non-zero. > > > > > > > On RK3588 (v10), GPU_IRQ_POWER_CHANGED_ALL reliably fires when > > > there is new information available in those registers. I haven't > > > tried on MT8196 (v13) yet because that still doesn't boot with > > > mainline so testing anything is a pain. > > > > > > I don't have any v12 or v11 hardware to test with. From what I > > > understand, there's no open enough platform to do v11 testing on, > > > just the Pixel 8 and Pixel 9. I could look into the Cix SoC for v12 > > > though some day, but I don't own one at the moment. > > > > > > > > > > > I think they are more useful to be collected periodically, such that > > > > we know that in the past X seconds, Y out of a total of Z samples > > > > indicates activities. That's best done in userspace, and panthor's > > > > role should be to provide an uapi such as > > > > https://lore.kernel.org/all/[email protected]/. > > > > > > This wouldn't give you information on the time a power transition has > > > completed, which is one of the motivations. A periodically collected > > > PWRACTIVE would just be roughly correlated to how busy the GPU is, > > > which isn't very useful additional information as the performance > > > counters themselves are likely a better source of that kind of info. > > {SHADER,TILER,L2}_READY might be more appropriate if you want to trace > > power transitions? > > Depends, the documentation I have access to isn't explicit about > what "READY" means. Is a busy core non-ready? Is there ever a case > where a significant number of cores are READY but not PWRACTIVE? > > I can answer the first question with some more poking on RK3588, > but for the latter a simple experiment on one piece of hardware > isn't going to answer it. Plus, the core being active will probably > be more interesting than it either sitting idle but powered or > actually doing work. >From what I can see, *_READY are non-zero when powered and *_PWRACTIVE are non-zero when powered and busy on mt8196.
If you want to generate a trace event upon GPU_IRQ_POWER_CHANGED_ALL, *_READY seems more appropriate at least on mt8196. If you want to track busyness with *_PWRACTIVE, you probably need to sample periodically. > > > > > > > > > What I need to do is restrict this to <= v13 in the next revision > > > however, because v14 reworks this stuff. > > > > > > Kind regards, > > > Nicolas Frattaroli > > > > > > > > > > > >
