On 10.01.23 at 16:28, Marek Olšák wrote:
On Wed, Jan 4, 2023 at 9:51 AM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
On 04.01.23 at 00:08, Marek Olšák wrote:
I see about the access now, but did you even look at the patch?
I did look at the patch, but I haven't fully understood yet what
you are trying to do here.
First and foremost, it returns the evicted size of VRAM and visible
VRAM, and returns visible VRAM usage. It should be obvious which stat
includes the size of another.
Because what the patch does isn't even exposed to common drm
code, such as the preferred domain and visible VRAM placement, so
it can't be in fdinfo right now.
Or do you even know what fdinfo contains? Because it contains
nothing useful. It only has VRAM and GTT usage, which we already
have in the INFO ioctl, so it has nothing that we need. We mainly
need the eviction information and visible VRAM information now.
Everything else is a bonus.
Well the main question is what are you trying to get from that
information? The eviction list for example is completely
meaningless to userspace, that stuff is only temporary and will be
cleared on the next CS again.
I don't know what you mean. The returned eviction stats look correct
and are stable (they don't change much). You can suggest changes if
you think some numbers are not reported correctly.
What we could expose is the VRAM over-commit value, i.e. the total
size of the BOs which were supposed to be in VRAM but are in GTT now.
I think that's what you are looking for here, right?
The VRAM overcommit value is "evicted_vram".
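A minimal sketch of the over-commit value as described above: the sum
of the sizes of all BOs whose preferred domain is VRAM but which
currently sit in GTT. The BufferObject class, its field names and the
evicted_vram_bytes() helper are hypothetical and only model the idea;
they are not the kernel's data structures or the patch's code.

```python
from dataclasses import dataclass

VRAM = "vram"
GTT = "gtt"


@dataclass
class BufferObject:
    """Hypothetical model of a BO; not an amdgpu kernel structure."""
    size: int               # size in bytes
    preferred_domain: str   # where userspace asked the BO to live
    current_domain: str     # where the BO actually resides right now


def evicted_vram_bytes(bos):
    """Sum the sizes of BOs that prefer VRAM but currently reside in GTT."""
    return sum(
        bo.size
        for bo in bos
        if bo.preferred_domain == VRAM and bo.current_domain == GTT
    )


# Made-up example data: only the second BO counts as evicted from VRAM.
bos = [
    BufferObject(size=4096, preferred_domain=VRAM, current_domain=VRAM),
    BufferObject(size=8192, preferred_domain=VRAM, current_domain=GTT),
    BufferObject(size=1024, preferred_domain=GTT, current_domain=GTT),
]
print(evicted_vram_bytes(bos))
```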
Also, it's undesirable to open and parse a text file if we can
just call an ioctl.
Well I see the reasoning for that, but I also see why other
drivers do a lot of the stuff we have as IOCTL as separate files
in sysfs, fdinfo or debugfs.
Especially repeating all the static information which was already
available under sysfs in the INFO IOCTL was a design mistake as
far as I can see. Just compare what AMDGPU and the KFD code are
doing to what, for example, i915 is doing.
Same for things like debug information about a process. The fdinfo
stuff can be queried from external tools (gdb, gputop, umr etc...)
as well, which makes that interface preferable.
Nothing uses fdinfo in Mesa. No driver uses sysfs in Mesa except drm
shims, noop drivers, and Intel for perf metrics. sysfs itself is an
unusable mess for the PCIe query and is missing information.
I'm not against exposing more stuff through sysfs and fdinfo for
tools, but I don't see any reason why drivers should use it (other
than for slowing down queries and initialization).
That's what I'm asking: Is this for some tool or to make some driver
decision based on it?
If you just want the numbers for display, then I think it would be
better to put this into fdinfo together with the other existing stuff
there.
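For the display-only case, a tool would read the text under
/proc/<pid>/fdinfo/<fd> and pick out the memory keys. The sketch below
assumes the "drm-memory-<region>: <value> [KiB|MiB]" key format from the
common DRM usage-stats documentation; the parse_fdinfo_memory() helper
and the sample values are made up for illustration.

```python
def parse_fdinfo_memory(text):
    """Return {region: bytes} for each 'drm-memory-<region>' line in text."""
    units = {"": 1, "KiB": 1024, "MiB": 1024 * 1024}
    prefix = "drm-memory-"
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.startswith(prefix):
            continue  # not a memory usage key
        fields = value.split()
        if not fields or not fields[0].isdigit():
            continue  # malformed value, skip rather than guess
        unit = fields[1] if len(fields) > 1 else ""
        if unit not in units:
            continue  # unknown unit, skip rather than guess
        stats[key[len(prefix):]] = int(fields[0]) * units[unit]
    return stats


# Made-up sample in the assumed format; real files contain more keys.
SAMPLE = (
    "pos:\t0\n"
    "flags:\t02100002\n"
    "drm-driver:\tamdgpu\n"
    "drm-memory-vram:\t2048 KiB\n"
    "drm-memory-gtt:\t512 KiB\n"
    "drm-memory-cpu:\t0 KiB\n"
)

print(parse_fdinfo_memory(SAMPLE))
```

An external tool would pass the contents of the fdinfo file instead of
SAMPLE, which is the open-and-parse cost that an ioctl avoids.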
If you want to make allocation decisions based on this, then we should
have that as an IOCTL or, even better, as an mmap() page shared between
kernel and userspace. But in this case I would also calculate the
numbers completely differently.
See we have at least the following things in the kernel:
1. The eviction list in the VM.
Those are the BOs which are currently evicted and which we try to
move back in on the next CS.
2. The VRAM over-commit value.
In other words how much more VRAM than available has the
application tried to allocate?
3. The visible VRAM usage by this application.
The end goal is that the eviction list will go away, i.e. we will always
have stable allocations based on the allocations of other applications
and not constantly swap things in and out.
If you expose the eviction list to userspace now, we will be stuck with
this interface forever.
Christian.
Marek