I changed it to use 128-byte alignment to match the GPU cache-line size.

Committed to OG12.

Andrew

On 11/01/2023 18:05, Andrew Stubbs wrote:
This patch fixes a runtime issue I encountered with the AMD GCN Unified Shared Memory implementation.

We were using regular malloc'd memory configured into USM mode, but there were random intermittent crashes. I can't be completely sure, but my best guess is that the HSA driver is using malloc internally from the same heap, and therefore using memory on the same page as the offload kernel. What I do know is that I could make the crashes go away by simply padding the USM allocations before and after.

With this patch USM allocations are now completely separated from the system heap. The custom allocator is probably less optimal in some use-cases, but it does have the advantage that all the metadata is stored in a side-table that won't ever cause any pages to migrate back to main-memory unnecessarily. It's still possible for the user program to use USM memory in a way that causes it to thrash, and this might have been the ultimate cause of the crashes, but there's not much we can do about that here.

I've broken the allocator out into a new file because I anticipate it being needed in more than one place, but I didn't put full data-isolation on it yet.

I'll rebase, merge, and repost all of the OpenMP memory patches sometime soonish.

Andrew
libgomp, amdgcn: Switch USM to 128-byte alignment

This should improve cache-line utilization on the AMD GPUs somewhat.

libgomp/ChangeLog:

        * usm-allocator.c (ALIGN): Use 128-byte alignment.

diff --git a/libgomp/usm-allocator.c b/libgomp/usm-allocator.c
index c45109169ca..68c1ebafec2 100644
--- a/libgomp/usm-allocator.c
+++ b/libgomp/usm-allocator.c
@@ -57,7 +57,8 @@ static int usm_lock = 0;
 static struct usm_splay_tree_s usm_allocations = { NULL };
 static struct usm_splay_tree_s usm_free_space = { NULL };
 
-#define ALIGN(VAR) (((VAR) + 7) & ~7)    /* 8-byte granularity.  */
+/* 128-byte granularity means GPU cache-line aligned.  */
+#define ALIGN(VAR) (((VAR) + 127) & ~127)
 
 /* Coalesce contiguous free space into one entry.  This considers the entries
    either side of the root node only, so it should be called each time a new
