In an attempt to reduce the time pprof says is spent in sync.Pool (internal workloads, sorry) I modified Get and getSlow to skip locking the per-P shared pools if the pools are likely to be empty. This yields promising results, but I'm not sure the approach is sound since the check I do is inherently racy.
As a (artificial and contrived) benchmark, I'm using this: func BenchmarkPoolUnderflow(b *testing.B) { var p Pool b.RunParallel(func(pb *testing.PB) { for pb.Next() { p.Put(1) p.Get() p.Get() } }) } This is meant to simulate a pool in which more or objects are Get() than are Put() (it wouldn't make much sense to simulate a pool in which we only get, so to keep things simple I opted for a 2:1 ratio) The change I applied to Get and getSlow is the following. Starting from the current pattern of: l := ... # per-P poolLocal l.Lock() last := len(l.shared) - 1 if last >= 0 { x = l.shared[last] l.shared = l.shared[:last] } l.Unlock() I add a check (not protected by the mutex, that is the expensive op we're trying to skip if it's not necessary) to see if the pool is likely to be non-empty: l := ... # per-P poolLocal if len(l.shared) > 0 { # the racy check for non-emptiness l.Lock() last := len(l.shared) - 1 if last >= 0 { x = l.shared[last] l.shared = l.shared[:last] } l.Unlock() } I know I should not call this a benign race, but in this case I don't see how this can lead to problems. If the racy check gets it right, then it's almost a net win. If if it gets it wrong, either we do what we do now (i.e. we lock, just to find an empty pool), or we skip an otherwise non-empty pool - thereby failing to immediately return an otherwise reusable object (note that 1. there is a per-P shared pool for each P, so I'd estimate the chances of this happening on all of them to be pretty low and 2. the Pool documentation explicitly say that Get is allowed to treat the pool as empty). Also note that the race detector is already explicitly disabled in all sync.Pool methods. The reason I'm asking is to understand whether my reasoning is sound and, regardless, if anybody has suggestions about how to do this in a better way. The current results (of the approach above, plus some refactoring to recover some lost performance on the other benchmarks) on my laptop are the following: name old time/op new time/op delta Pool-4 14.5ns ± 3% 14.2ns ± 2% -1.64% (p=0.023 n=9+10) PoolOverflow-4 1.99µs ±12% 1.78µs ± 1% -10.62% (p=0.000 n=10+8) PoolUnderflow-4 152ns ± 6% 30ns ± 1% -80.00% (p=0.000 n=10+8) (the first two benchmarks are already part of sync.Pool, the last one is the one I described above) Any feedback is welcome. If this is deemed safe I'm going to submit a CL. Carlo -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.