I'm trying to show that an optimization technique for ring buffers is effective. One such technique is using a bitmask instead of modulo to compute the wrap-around. However, in my environment, modulo is slightly faster in a test where 1 billion items are enqueued and dequeued by a single goroutine. What do you think could be the cause?
Full code: https://go.dev/play/p/H933oqrhPI-

Environment:
* go version go1.21.4 darwin/arm64
* Apple M1 Pro

RingBuffer with modulo:
```
type RingBuffer0 struct {
	writeIdx uint64
	readIdx  uint64
	buffers  []any
	size     uint64
}

func NewRingBuffer0(size uint64) *RingBuffer0 {
	rb := &RingBuffer0{}
	rb.init(size)
	return rb
}

func (rb *RingBuffer0) init(size uint64) {
	rb.buffers = make([]any, size)
	rb.size = size
}

func (rb *RingBuffer0) Enqueue(item any) error {
	// Indices only ever grow; the buffer is full when they differ by size.
	if rb.writeIdx-rb.readIdx == rb.size {
		return ErrBufferFull
	}
	rb.buffers[rb.writeIdx%rb.size] = item
	rb.writeIdx++
	return nil
}

func (rb *RingBuffer0) Dequeue() (any, error) {
	if rb.writeIdx == rb.readIdx {
		return nil, ErrBufferEmpty
	}
	item := rb.buffers[rb.readIdx%rb.size]
	rb.readIdx++
	return item, nil
}
```

RingBuffer with bitmask: change each modulo calculation to the code below
* rb.buffers[rb.writeIdx&(rb.size-1)] = item
* item := rb.buffers[rb.readIdx&(rb.size-1)]

Test:
```
func TestSingle(rb RingBuffer) {
	start := time.Now()
	total := 500000
	for i := 0; i < total; i++ {
		for j := 0; j < 1000; j++ {
			rb.Enqueue(j)
		}
		for j := 0; j < 1000; j++ {
			rb.Dequeue()
		}
	}
	end := time.Now()
	count := total * 2000
	duration := end.Sub(start).Milliseconds()
	fmt.Printf("%d ops in %d ms\n", count, duration)
}
```
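
For reference, the bitmask form only matches the modulo form when the buffer size is a power of two. Below is a minimal sketch of that equivalence check. The helper names wrapMod, wrapMask and isPowerOfTwo are hypothetical and not part of the playground code, and the sizes 1000 and 1024 are arbitrary examples, not the sizes used in my test.

```go
package main

import "fmt"

// wrapMod is a hypothetical helper mirroring rb.writeIdx % rb.size.
func wrapMod(idx, size uint64) uint64 { return idx % size }

// wrapMask is a hypothetical helper mirroring rb.writeIdx & (rb.size - 1).
// It agrees with wrapMod only when size is a power of two.
func wrapMask(idx, size uint64) uint64 { return idx & (size - 1) }

// isPowerOfTwo reports whether size has exactly one bit set.
func isPowerOfTwo(size uint64) bool { return size != 0 && size&(size-1) == 0 }

func main() {
	// Example sizes chosen for illustration only.
	for _, size := range []uint64{1000, 1024} {
		mismatches := 0
		// Compare both wrap-around forms over a few laps of the buffer.
		for idx := uint64(0); idx < 4*size; idx++ {
			if wrapMod(idx, size) != wrapMask(idx, size) {
				mismatches++
			}
		}
		fmt.Printf("size=%d powerOfTwo=%v mismatches=%d\n",
			size, isPowerOfTwo(size), mismatches)
	}
}
```

If a check like this reports mismatches for the size used in a benchmark, the two ring buffers are not doing the same work, so their timings would not be directly comparable.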