Following some feedback from users of the OG11 branch, I think I need to
withdraw this patch for now.
The memory pinned via the mlock call does not give the expected
performance boost. I had not expected it to do much in my test setup,
given that the machine has a lot of RAM and my benchmarks are small,
but others have tried larger workloads on a variety of machines and
architectures.
It seems that it isn't enough for the memory to be pinned; it has to be
pinned using the CUDA API to get the performance boost. I had not done
this because it was hard to reconcile with the code's abstraction
layers, and in any case the implementation was supposed to be device
independent, but it seems we need a separate pinning mechanism for each
device type.
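
For comparison, the device-side pinning amounts to registering the host
range with the CUDA driver. A minimal sketch using the
cuMemHostRegister/cuMemHostUnregister driver API entry points (the
wrapper names are mine, and error handling is elided):

  /* Sketch only: page-lock an existing host range via the CUDA driver
     so the device can use fast DMA transfers to/from it; mlock alone
     page-locks the memory but does not register it with the driver.  */
  #include <cuda.h>
  #include <stddef.h>

  static int
  cuda_pin (void *addr, size_t size)
  {
    return cuMemHostRegister (addr, size, 0) == CUDA_SUCCESS;
  }

  static void
  cuda_unpin (void *addr)
  {
    cuMemHostUnregister (addr);
  }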
I will resubmit this patch with some kind of CUDA/plugin hook soonish,
keeping the existing implementation for other device types. I don't know
how that will handle heterogeneous systems, but those ought to be rare.
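
The hook would presumably be a pair of per-device callbacks, with the
mlock implementation as the device-independent default. A purely
hypothetical shape, just to illustrate; none of these names are in the
patch:

  /* Hypothetical: a plugin-supplied pin/unpin pair; names invented
     for illustration only.  */
  #include <stddef.h>

  struct gomp_page_locker
  {
    int (*pin) (void *addr, size_t size);  /* e.g. wraps cuMemHostRegister.  */
    void (*unpin) (void *addr);            /* e.g. wraps cuMemHostUnregister.  */
  };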
I don't think libmemkind will resolve this performance issue, although
it can certainly be used for host-side implementations of low-latency
memories, etc.
Andrew
On 13/01/2022 13:53, Andrew Stubbs wrote:
On 05/01/2022 17:07, Andrew Stubbs wrote:
I don't believe 64KB will be anything like enough for any real HPC
application. Is it really worth optimizing for this case?
Anyway, I'm working on an implementation that uses mmap instead of
malloc for pinned allocations. I figure that will simplify the unpin
algorithm (because it'll just be munmap) and optimize for the large
allocations I imagine HPC applications will use. It won't fix the
ulimit issue.
Here's my new patch.
This version is intended to apply on top of the latest version of my
low-latency allocator patch, although the dependency is mostly textual.
Pinned memory is allocated via mmap + mlock, and allocation fails
(returns NULL) if the lock fails and there's no fallback configured.
This means that large allocations will now be page-aligned, and
therefore pin the smallest number of pages for the size requested, and
that the memory will be unpinned automatically when freed via munmap or
moved via mremap.
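
In outline, the allocation path is something like the following sketch
(simplified; the interaction with the allocator's fallback traits is
omitted):

  #include <sys/mman.h>
  #include <stddef.h>

  static void *
  pinned_alloc (size_t size)
  {
    /* Anonymous mapping: page-aligned, and munmap later unpins and
       frees it in one step.  */
    void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
      return NULL;

    /* If locking fails (e.g. RLIMIT_MEMLOCK exceeded) and no fallback
       is configured, fail the allocation outright.  */
    if (mlock (addr, size) != 0)
      {
        munmap (addr, size);
        return NULL;
      }
    return addr;
  }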
Obviously this is not ideal for allocations much smaller than one page.
If that turns out to be a problem in the real world then we can add a
special case fairly straightforwardly, incurring the extra page-tracking
expense in those cases only, or perhaps implement our own pinned-memory
heap (something like what was already proposed for low-latency memory).
Also new is a realloc implementation that works better when reallocation
fails. This is confirmed by the new testcases.
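
The key property when resizing mmap-backed memory is that a failed
mremap leaves the old mapping, and its mlock status, untouched, so the
caller's pointer stays valid and pinned. A minimal sketch, not the
patch itself (it assumes the caller tracks the old size;
MREMAP_MAYMOVE needs _GNU_SOURCE):

  #define _GNU_SOURCE
  #include <sys/mman.h>
  #include <stddef.h>

  static void *
  pinned_realloc (void *addr, size_t old_size, size_t new_size)
  {
    /* mremap preserves mlock'd status across a resize or move, and
       fails without touching the old mapping if the lock cannot be
       extended, mirroring realloc's failure contract.  */
    void *new_addr = mremap (addr, old_size, new_size, MREMAP_MAYMOVE);
    return new_addr == MAP_FAILED ? NULL : new_addr;
  }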
OK for stage 1?
Thanks
Andrew