Configuring the UAR as IO-mapped noticeably reduces maximum throughput. If the UAR is configured as a write-combining register, a write memory barrier is needed when ringing a doorbell. rte_wmb() is mostly effective when the burst size is comparatively small.
Personally, I don't think the flag is a good interface choice. I'm also not convinced that it depends on the burst size. What guarantees that the MMIO write was flushed, even for larger bursts? It comes after a set of writes that were flushed before the doorbell update, and there is no guarantee that the application will immediately have more data to trigger a flush of this write.