> On 03/02/2012 04:39 PM, Marek Vasut wrote:
> >> ensure that transmit and receive buffers are cache-line aligned
> >>
> >>     invalidate cache after each packet received
> >>     flush cache before transmitting
> >>
> >> Signed-off-by: Eric Nelson <eric.nel...@boundarydevices.com>
> >> ---
> >>  drivers/net/fec_mxc.c |  248 ++++++++++++++++++++++++++++++++++++-------------
> >>  drivers/net/fec_mxc.h |   19 +----
> >>  2 files changed, 184 insertions(+), 83 deletions(-)
> >>
> >> diff --git a/drivers/net/fec_mxc.c b/drivers/net/fec_mxc.c
> >> index 1fdd071..f72304b 100644
> >> --- a/drivers/net/fec_mxc.c
> >> +++ b/drivers/net/fec_mxc.c
> >>
> >> <snip>
> >>  /*
> >>   * This routine transmits one frame.  This routine only accepts
> >> @@ -650,15 +708,21 @@ static int fec_send(struct eth_device *dev, volatile void *packet, int length)
> >>  	}
> >>
> >>  	/*
> >> -	 * Setup the transmit buffer
> >> -	 * Note: We are always using the first buffer for transmission,
> >> -	 * the second will be empty and only used to stop the DMA engine
> >> +	 * Setup the transmit buffer. We are always using the first buffer for
> >> +	 * transmission, the second will be empty and only used to stop the DMA
> >> +	 * engine. We also flush the packet to RAM here to avoid cache trouble.
> >>  	 */
> >>  #ifdef	CONFIG_FEC_MXC_SWAP_PACKET
> >>  	swap_packet((uint32_t *)packet, length);
> >>  #endif
> >> -	writew(length, &fec->tbd_base[fec->tbd_index].data_length);
> >> -	writel((uint32_t)packet, &fec->tbd_base[fec->tbd_index].data_pointer);
> >> +
> >> +	addr = (uint32_t)packet;
> >> +	size = roundup(length, CONFIG_FEC_DATA_ALIGNMENT);
> >> +	flush_dcache_range(addr, addr + size);
> > 
> > What if the size of the packet isn't aligned to a cacheline boundary? Won't
> > it choke here?
> 
> Here's a quandary...
> 
> flush_dcache_range() can potentially force writes into unintended areas,
> which could conceivably include areas owned by the ethernet receiver, if
> the source object isn't aligned to a cache-line boundary and sized in
> cache-line multiples.
> 
> In practice, it appears that net/net.c only transmits from one of two
> places:

You can also invalidate memory and lose some information, though I'm not quite
sure that can happen with this driver.

>       - a dedicated transmit buffer (NetTxPacket), which is guaranteed
>       to be aligned to PKTALIGN (32).
>       - a receive buffer (ICMP and ARP replies)
> 
> The networking API certainly allows for transmits from arbitrary
> areas in memory, but I'm not seeing where or how this can be invoked.

OK
> 
> Because there's no way to query how the memory for a packet is
> allocated, the only way around this I can see is to memcpy to
> a dedicated transmit buffer within fec_mxc.c, which would defeat
> any benefit of cache.

Indeed ... a bounce buffer. But such a bounce buffer should be implemented close
to the place where the misalignment originates. If the net core can align
packets properly everywhere, we're safe and happy.
> 
> AFAICT, all of the other calls to 'flush_dcache_range()' are okay because
> they're dealing with buffers allocated by the driver.
> 

I'd love to see someone else review this. It is very important to get this
right.

> >> +
> >> +	fec->tbd_base[fec->tbd_index].data_length = length;
> >> +	fec->tbd_base[fec->tbd_index].data_pointer = addr;
> >> +
> >>  	/*
> >>  	 * update BD's status now
> >>  	 * This block:
> >>
> >> @@ -667,9 +731,18 @@ static int fec_send(struct eth_device *dev, volatile void *packet, int length)
> >>  	 * - might be the last BD in the list, so the address counter should
> >>  	 *   wrap (-> keep the WRAP flag)
> >>  	 */
> >> -	status = readw(&fec->tbd_base[fec->tbd_index].status) & FEC_TBD_WRAP;
> >> +	status = fec->tbd_base[fec->tbd_index].status & FEC_TBD_WRAP;
> >>  	status |= FEC_TBD_LAST | FEC_TBD_TC | FEC_TBD_READY;
> >> -	writew(status, &fec->tbd_base[fec->tbd_index].status);
> >> +	fec->tbd_base[fec->tbd_index].status = status;
> >> +
> >> +	/*
> >> +	 * Flush data cache. This code flushes both TX descriptors to RAM.
> >> +	 * After this code, the descriptors will be safely in RAM and we
> >> +	 * can start DMA.
> >> +	 */
> >> +	size = roundup(2 * sizeof(struct fec_bd), CONFIG_FEC_DATA_ALIGNMENT);
> >> +	addr = (uint32_t)fec->tbd_base;
> >> +	flush_dcache_range(addr, addr + size);
> > 
> > Same concern here and everywhere else ... I believe aligning it like this
> > is quite unsafe :-(
> 
> This one looks safe because you've controlled the allocation of tbd_base.
> 
> >> <snip>
> >>
> >> @@ -751,6 +846,13 @@ static int fec_recv(struct eth_device *dev)
> >>  			frame = (struct nbuf *)readl(&rbd->data_pointer);
> >>  			frame_length = readw(&rbd->data_length) - 4;
> >>  			/*
> >> +			 * Invalidate data cache over the buffer
> >> +			 */
> >> +			addr = (uint32_t)frame;
> >> +			size = roundup(frame_length, CONFIG_FEC_DATA_ALIGNMENT);
> >> +			invalidate_dcache_range(addr, addr + size);
> > 
> > Ditto here, the frame length might not be aligned properly, or will it be?
> > The network stack must be properly analyzed here.
> 
> The hardware won't return an unaligned value here, so this should be good.

Are you sure? Can't you receive a frame aligned only to an 8-byte boundary?
> 
> >> <snip>
> >>
> >> @@ -765,11 +867,27 @@ static int fec_recv(struct eth_device *dev)
> >>  					(ulong)rbd->data_pointer,
> >>  					bd_status);
> >>  		}
> >> +
> >>  		/*
> >> -		 * free the current buffer, restart the engine
> >> -		 * and move forward to the next buffer
> >> +		 * Free the current buffer, restart the engine and move forward
> >> +		 * to the next buffer. Here we check if the whole cacheline of
> >> +		 * descriptors was already processed and if so, we mark it free
> >> +		 * as a whole.
> >>  		 */
> >> -		fec_rbd_clean(fec->rbd_index == (FEC_RBD_NUM - 1) ? 1 : 0, rbd);
> >> +		size = (CONFIG_FEC_DATA_ALIGNMENT / sizeof(struct fec_bd)) - 1;
> >> +		if ((fec->rbd_index & size) == size) {
> >> +			i = fec->rbd_index - size;
> >> +			addr = (uint32_t)&fec->rbd_base[i];
> >> +			for (; i <= fec->rbd_index + size; i++) {
> >> +				if (i == (FEC_RBD_NUM - 1))
> >> +					fec_rbd_clean(1, &fec->rbd_base[i]);
> >> +				else
> >> +					fec_rbd_clean(0, &fec->rbd_base[i]);
> >> +			}
> >> +			flush_dcache_range(addr,
> >> +				addr + CONFIG_FEC_DATA_ALIGNMENT);
> >> +		}
> >> +
> >> +
> > 
> > This was the worst part. I don't quite remember how and why I made those
> > alignment decisions (well, it's obvious here: you flush the rxdesc once a
> > whole cacheline of them is done), but some of the pieces are far less
> > obvious now that I read the code again.
> 
> I'm not grokking this one either. Definitely needs fresher eyes than I have
> at the moment.

Indeed, I'm thinking the same thing. Let's sleep on this!

Thanks for your good work on putting this into shape!

M
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot