> Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing > spinlock use. > > On Thu, Nov 24, 2011 at 08:12:25AM +0000, Shi Xuelin-B29237 wrote: > > Hi Ira, > > > > Thanks for your review. > > > > After second thought, I think your scenario may not occur. > > Because the cookie 20 we query must be returned by fsl_dma_tx_submit(...) in > practice. > > We never query a cookie not returned by fsl_dma_tx_submit(...). > > > > I agree about this part. > > > When we call fsl_tx_status(20), the chan->common.cookie is definitely wrote > > as > 20 and cpu2 could not read as 19. > > > > This is what I don't agree about. However, I'm not an expert on CPU cache vs. > memory accesses in an multi-processor system. The section titled "CACHE > COHERENCY" in Documentation/memory-barriers.txt leads me to believe that the > scenario I described is possible.
For Freescale PowerPC, the chip automatically takes care of cache coherency. Even if this is a concern, spinlock can't address it. > > What happens if CPU1's write of chan->common.cookie only goes into CPU1's > cache. It never makes it to main memory before CPU2 fetches the old value of > 19. > > I don't think you should see any performance impact from the smp_mb() > operation. Smp_mb() do have impact on performance if it's in the hot path. While it might be safer having it, I doubt it is really necessary. If the CPU1 doesn't have the updated last_used, it's shouldn't have known there is a cookie 20 existed either. - Leo > > Thanks, > Ira > > > -----Original Message----- > > From: Ira W. Snyder [mailto:i...@ovro.caltech.edu] > > Sent: 2011年11月23日 2:59 > > To: Shi Xuelin-B29237 > > Cc: dan.j.willi...@intel.com; Li Yang-R58472; z...@zh-kernel.org; > > vinod.k...@intel.com; linuxppc-dev@lists.ozlabs.org; > > linux-ker...@vger.kernel.org > > Subject: Re: [PATCH][RFC] fsldma: fix performance degradation by optimizing > spinlock use. > > > > On Tue, Nov 22, 2011 at 12:55:05PM +0800, b29...@freescale.com wrote: > > > From: Forrest Shi <b29...@freescale.com> > > > > > > dma status check function fsl_tx_status is heavily called in > > > a tight loop and the desc lock in fsl_tx_status contended by > > > the dma status update function. this caused the dma performance > > > degrades much. > > > > > > this patch releases the lock in the fsl_tx_status function. > > > I believe it has no neglect impact on the following call of > > > dma_async_is_complete(...). > > > > > > we can see below three conditions will be identified as success > > > a) x < complete < use > > > b) x < complete+N < use+N > > > c) x < complete < use+N > > > here complete is the completed_cookie, use is the last_used > > > cookie, x is the querying cookie, N is MAX cookie > > > > > > when chan->completed_cookie is being read, the last_used may > > > be incresed. Anyway it has no neglect impact on the dma status > > > decision. > > > > > > Signed-off-by: Forrest Shi <xuelin....@freescale.com> > > > --- > > > drivers/dma/fsldma.c | 5 ----- > > > 1 files changed, 0 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index > > > 8a78154..1dca56f 100644 > > > --- a/drivers/dma/fsldma.c > > > +++ b/drivers/dma/fsldma.c > > > @@ -986,15 +986,10 @@ static enum dma_status fsl_tx_status(struct > dma_chan *dchan, > > > struct fsldma_chan *chan = to_fsl_chan(dchan); > > > dma_cookie_t last_complete; > > > dma_cookie_t last_used; > > > - unsigned long flags; > > > - > > > - spin_lock_irqsave(&chan->desc_lock, flags); > > > > > > > This will cause a bug. See below for a detailed explanation. You need this > > instead: > > > > /* > > * On an SMP system, we must ensure that this CPU has seen the > > * memory accesses performed by another CPU under the > > * chan->desc_lock spinlock. > > */ > > smp_mb(); > > > last_complete = chan->completed_cookie; > > > last_used = dchan->cookie; > > > > > > - spin_unlock_irqrestore(&chan->desc_lock, flags); > > > - > > > dma_set_tx_state(txstate, last_complete, last_used, 0); > > > return dma_async_is_complete(cookie, last_complete, last_used); } > > > > Facts: > > - dchan->cookie is the same member as chan->common.cookie (same memory > > location) > > - chan->common.cookie is the "last allocated cookie for a pending > > transaction" > > - chan->completed_cookie is the "last completed transaction" > > > > I have replaced "dchan->cookie" with "chan->common.cookie" in the below > explanation, to keep everything referenced from the same structure. > > > > Variable usage before your change. Everything is used locked. > > - RW chan->common.cookie (fsl_dma_tx_submit) > > - R chan->common.cookie (fsl_tx_status) > > - R chan->completed_cookie (fsl_tx_status) > > - W chan->completed_cookie (dma_do_tasklet) > > > > Variable usage after your change: > > - RW chan->common.cookie LOCKED > > - R chan->common.cookie NO LOCK > > - R chan->completed_cookie NO LOCK > > - W chan->completed_cookie LOCKED > > > > What if we assume that you have a 2 CPU system (such as a P2020). After your > changes, one possible sequence is: > > > > === CPU1 - allocate + submit descriptor: fsl_dma_tx_submit() === > > spin_lock_irqsave > > descriptor->cookie = 20 (x in your example) > > chan->common.cookie = 20 (used in your example) > > spin_unlock_irqrestore > > > > === CPU2 - immediately calls fsl_tx_status() === > > chan->common.cookie == 19 > > chan->completed_cookie == 19 > > descriptor->cookie == 20 > > > > Since we don't have locks anymore, CPU2 may not have seen the write to > > chan->common.cookie yet. > > > > Also assume that the DMA hardware has not started processing the > > transaction yet. Therefore dma_do_tasklet() has not been called, and > > chan->completed_cookie has not been updated. > > > > In this case, dma_async_is_complete() (on CPU2) returns DMA_SUCCESS, even > though the DMA operation has not succeeded. The DMA operation has not even > started yet! > > > > The smp_mb() fixes this, since it forces CPU2 to have seen all memory > > operations > that happened before CPU1 released the spinlock. Spinlocks are implicit SMP > memory barriers. > > > > Therefore, the above example becomes: > > smp_mb(); > > chan->common.cookie == 20 > > chan->completed_cookie == 19 > > descriptor->cookie == 20 > > > > Then dma_async_is_complete() returns DMA_IN_PROGRESS, which is correct. > > > > Thanks, > > Ira > > > > > > _______________________________________________ > > Linuxppc-dev mailing list > > Linuxppc-dev@lists.ozlabs.org > > https://lists.ozlabs.org/listinfo/linuxppc-dev _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev