The discussion is restricted to AArch64. Question: On arm64, publicationBarrier in mallocgc is implemented as DMB ST. What is the invariant that requires it to execute at its current position?
Specifically:

- Must it execute before the allocated object becomes visible to another P/M?
- Must it execute before GC metadata becomes visible?
- Or is it required for maintaining the tri-color invariant under concurrent GC?

My reasoning (please correct me if I am wrong):

The comment in runtime/stubs.go says that the purpose of publicationBarrier is to ensure that other processors observe the fully initialized object before it becomes reachable from GC. If that is the case, it seems that as long as:

1) the allocated object is not yet accessible by another goroutine, and
2) the goroutine that performs the allocation is neither preempted nor rescheduled onto another P/M (through chanrecv or other operations),

then the barrier might be deferrable. Under this reasoning, it appears possible that a single DMB ST could be shared across multiple consecutive mallocgc calls. However, I am unsure whether this reasoning overlooks some GC or scheduler invariants, and that is what I would like to understand.

---

Background:

The current order in mallocgc (simplified) is:

```go
alloc
publicationBarrier // DMB ST
update GC metadata
```

According to measurements in issue comment https://github.com/golang/go/issues/63640#issuecomment-3661284210, the barrier can account for ~35–40% of mallocgc time on arm64 microbenchmarks.

I experimented with amortizing the barrier across multiple consecutive allocations (i.e., sharing the DMB ST). The design is omitted here to keep the question concise.
Microbenchmark results show a mixed performance impact:

```
goos: linux
goarch: arm64
pkg: runtime
                     │ default.txt │             batch.txt              │
                     │   sec/op    │   sec/op     vs base               │
Malloc8-64             22.11n ± 0%   21.82n ± 0%   -1.31% (p=0.000 n=10)
Malloc16-64            38.79n ± 0%   33.76n ± 0%  -12.98% (p=0.000 n=10)
MallocTypeInfo8-64     28.49n ± 0%   31.37n ± 0%  +10.11% (p=0.000 n=10)
MallocTypeInfo16-64    38.19n ± 0%   39.57n ± 0%   +3.61% (p=0.000 n=10)
MallocLargeStruct-64   417.9n ± 1%   400.8n ± 1%   -4.10% (p=0.000 n=10)
geomean                52.27n        51.62n        -1.24%
```

However, my main concern is correctness: I would like to understand the exact memory-ordering guarantee enforced by this barrier on AArch64.

Thanks.
