In an attempt to reduce the time pprof says is spent in sync.Pool (internal 
workloads, sorry) I modified Get and getSlow to skip locking the per-P 
shared pools if the pools are likely to be empty. This yields promising 
results, but I'm not sure the approach is sound since the check I do is 
inherently racy.

As an (artificial and contrived) benchmark, I'm using this:

func BenchmarkPoolUnderflow(b *testing.B) {
  var p Pool
  b.RunParallel(func(pb *testing.PB) {
    for pb.Next() {
      p.Put(1)
      p.Get()
      p.Get()
    }
  })
}

This is meant to simulate a pool in which more objects are Get() than are 
Put(). (It wouldn't make much sense to simulate a pool in which we only 
Get, so to keep things simple I opted for a 2:1 Get:Put ratio.)

The change I applied to Get and getSlow is the following. Starting from the 
current pattern of:

l := ... // per-P poolLocal
l.Lock()
last := len(l.shared) - 1
if last >= 0 {
  x = l.shared[last]
  l.shared = l.shared[:last]
}
l.Unlock()

I add a check, not protected by the mutex (the lock/unlock being the 
expensive operation we're trying to skip when it isn't necessary), to see 
if the pool is likely to be non-empty:

l := ... // per-P poolLocal
if len(l.shared) > 0 { // the racy check for non-emptiness
  l.Lock()
  last := len(l.shared) - 1
  if last >= 0 {
    x = l.shared[last]
    l.shared = l.shared[:last]
  }
  l.Unlock()
}
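
The getSlow change is analogous: the same unprotected length check goes in 
front of each per-P lock in the steal loop. Roughly (just a sketch; size, 
local, pid and indexLocal are the ones getSlow already uses):

for i := 0; i < int(size); i++ {
  l := indexLocal(local, (pid+i+1)%int(size))
  if len(l.shared) == 0 { // racy check: skip pools that look empty
    continue
  }
  l.Lock()
  last := len(l.shared) - 1
  if last >= 0 {
    x = l.shared[last]
    l.shared = l.shared[:last]
    l.Unlock()
    break
  }
  l.Unlock()
}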

I know I should not call this a benign race, but in this case I don't see 
how it can lead to problems. If the racy check gets it right, it's 
essentially a net win. If it gets it wrong, either we do what we do now 
(i.e. we take the lock only to find an empty pool), or we skip an 
otherwise non-empty pool, thereby failing to immediately return a reusable 
object. Note that 1. there is a shared pool for each P, so I'd estimate 
the chances of this happening on all of them at once to be pretty low, and 
2. the Pool documentation explicitly says that Get is allowed to treat the 
pool as empty. Also note that the race detector is already explicitly 
disabled in all sync.Pool methods.
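
To make point 2 concrete: every caller already has to cope with Get 
behaving as if the pool were empty, typically by supplying New (or by 
checking for a nil return), so the worst a wrong guess can do is cost one 
extra allocation, never correctness. A generic example of such a caller 
(not from my workload, just an illustration):

import (
  "bytes"
  "sync"
)

var bufPool = sync.Pool{
  // New runs whenever Get finds the shared pools empty
  // (or, with this change, merely believes they are).
  New: func() interface{} { return new(bytes.Buffer) },
}

func render() []byte {
  buf := bufPool.Get().(*bytes.Buffer) // may be reused or freshly allocated
  defer bufPool.Put(buf)
  buf.Reset()
  buf.WriteString("hello")
  return append([]byte(nil), buf.Bytes()...)
}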

I'm asking to find out whether my reasoning is sound and, regardless, 
whether anybody has suggestions for a better way to do this.

The current results (of the approach above, plus some refactoring to 
recover some lost performance on the other benchmarks) on my laptop are the 
following:

    name             old time/op  new time/op  delta
    Pool-4           14.5ns ± 3%  14.2ns ± 2%   -1.64%  (p=0.023 n=9+10)
    PoolOverflow-4   1.99µs ±12%  1.78µs ± 1%  -10.62%  (p=0.000 n=10+8)
    PoolUnderflow-4   152ns ± 6%    30ns ± 1%  -80.00%  (p=0.000 n=10+8)

(the first two benchmarks are already part of sync.Pool, the last one is 
the one I described above)

Any feedback is welcome. If this is deemed safe I'm going to submit a CL.

Carlo

