On 20.06.23 14:35, Michael Ellerman wrote:
David Hildenbrand <da...@redhat.com> writes:
On 09.06.23 08:08, Aneesh Kumar K.V wrote:
Certain devices can possess non-standard memory capacities, not constrained
to multiples of 1GB. Provide a kernel parameter so that we can map the
device memory completely on memory hotplug.
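For readers not following the patch: I'm not quoting its actual interface
here, but as a rough illustration -- parameter name, variable and validation
below are all made up -- such a boot parameter is typically wired up via
early_param() and memparse():

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/log2.h>
#include <linux/memory.h>

/* Illustrative only; the real parameter lives in the patch. */
static unsigned long cmdline_memory_block_size __initdata;

static int __init parse_memory_block_size(char *p)
{
	if (!p)
		return -EINVAL;

	/* Accept values like "256M" or "1G". */
	cmdline_memory_block_size = memparse(p, NULL);

	/* Only power-of-two multiples of the section size make sense. */
	if (!is_power_of_2(cmdline_memory_block_size) ||
	    cmdline_memory_block_size < MIN_MEMORY_BLOCK_SIZE)
		return -EINVAL;

	return 0;
}
early_param("memory_block_size", parse_memory_block_size);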
So, the unfortunate thing is that these devices would have worked out of
the box before the memory block size was increased from 256 MiB to 1 GiB
in these setups. Now, one has to fine-tune the memory block size. The
only other arch that I know of that supports setting the memory block
size is x86, for special (large) UV systems -- and at least in the past
128 MiB vs. 2 GiB memory blocks made a performance difference during
boot (maybe no longer today, who knows).
Obviously, having fewer tunables and getting stuff simply working out of
the box is preferable.
Two questions:
1) Isn't there a way to improve auto-detection to fall back to 256 MiB in
these setups, to avoid specifying these parameters? (see the sketch below)
2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On
x86-64, experiments (with direct map fragmentation) showed that the
effective performance boost is pretty insignificant, so I wonder how big
the 1 GiB direct map performance improvement is.
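To make (1) a bit more concrete, I'm thinking of something along these
lines -- purely a sketch, and device_memory_ranges_aligned() is a made-up
helper standing in for "walk whatever firmware/device-tree information
describes the device memory and check its alignment":

#include <linux/sizes.h>

unsigned long memory_block_size_bytes(void)
{
	/*
	 * Only use 1 GiB memory blocks if all hot(un)pluggable device
	 * memory ranges are 1 GiB aligned; otherwise fall back to the
	 * old 256 MiB so the device memory can be mapped completely.
	 */
	if (device_memory_ranges_aligned(SZ_1G))
		return SZ_1G;

	return SZ_256M;
}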
The other issue is simply the number of sysfs entries.
With 64TB of memory and a 256MB block size you end up with ~250,000
directories in /sys/devices/system/memory.
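Just to put exact numbers on it:

	64 TiB / 256 MiB = 2^46 / 2^28 = 2^18 = 262,144 memory blocks
	64 TiB / 1 GiB   = 2^46 / 2^30 = 2^16 =  65,536 memory blocks

so ~250,000 is about right, and 1 GiB blocks only cut it by a factor of four.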
Yes, and so far on other archs we only optimize for that on UV x86
systems (with a default of 2 GiB). And that was added before we started
to speed up memory device lookups significantly using a radix tree IIRC.
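The lookup speedup I mean is nowadays an xarray in drivers/base/memory.c;
simplified and from memory, the idea is just:

#include <linux/device.h>
#include <linux/memory.h>
#include <linux/xarray.h>

/* Memory block devices indexed by block id, so a lookup is a single
 * xa_load() instead of scanning all memory block devices. */
static DEFINE_XARRAY(memory_blocks);

static struct memory_block *find_memory_block_by_id(unsigned long block_id)
{
	struct memory_block *mem;

	mem = xa_load(&memory_blocks, block_id);
	if (mem)
		get_device(&mem->dev);
	return mem;
}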
It's worth noting that there was a discussion on:
(a) not creating these device sysfs entries (when configured on the
cmdline); often, nobody really ends up using them to online/offline
memory blocks, and then the only real user left is lsmem.
(b) exposing logical devices (e.g., a DIMM) that can only be
offlined/removed as a whole, instead of their individual memory blocks
(when configured on the cmdline). But for PPC64 that won't help.
But (a) gets more tricky if device drivers (and things like dax/kmem)
rely on user-space memory onlining/offlining.
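Concretely, "user-space memory onlining" means something (udev rules, a
daemon, dax/kmem tooling) writing to the per-block sysfs state files --
exactly the directories we'd stop creating with (a). A minimal example
(the block number is arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Equivalent to: echo online > .../memory42/state */
	const char *path = "/sys/devices/system/memory/memory42/state";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, "online", 6) != 6)
		perror("write");
	close(fd);
	return 0;
}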
--
Cheers,
David / dhildenb