On Wed, 2013-10-02 at 17:16 +0300, Claudiu Manoil wrote: > On 10/1/2013 9:50 PM, Scott Wood wrote: > > On Tue, 2013-10-01 at 14:38 +0300, Claudiu Manoil wrote: > >> > >> On 10/1/2013 2:22 AM, Scott Wood wrote: > >>> On Mon, 2013-09-30 at 12:44 +0300, Claudiu Manoil wrote: > >>>> +static RTXBD rtx __aligned(8); > >>>> +#define RXBD(i) rtx.rxbd[i] > >>>> +#define TXBD(i) rtx.txbd[i] > >>>> +#define GET_BD_STAT(T, i) be16_to_cpu((__force __be16)T##BD(i).status) > >>>> +#define SET_BD_STAT(T, i, v) T##BD(i).status = (__force > >>>> __u16)cpu_to_be16(v) > >>>> +#define GET_BD_BLEN(T, i) be16_to_cpu((__force __be16)T##BD(i).length) > >>>> +#define SET_BD_BLEN(T, i, v) T##BD(i).length = (__force > >>>> __u16)cpu_to_be16(v) > >>>> +#define GET_BD_BPTR(T, i) be32_to_cpu((__force __be32)T##BD(i).bufptr) > >>>> +#define SET_BD_BPTR(T, i, v) T##BD(i).bufptr = (__force > >>>> __u32)cpu_to_be32(v) > >>> > >>> Why the forcing? Can't you declare the data with __be16/__be32? > >> > >> The __force annotation is obviously needed to suppress the sparse > >> warnings due to BD data declaration type not being __bexx, but the > >> generic uintxx_t, as you noticed as well. > >> Now, why I didn't use __bexx instead? The main reason would be > >> maintainability: the DMA descriptors may not be in big endian format > >> exclusively. The eTSEC hw may actually have an option to interpret > >> the DMA descriptors in little endian format. > > > > May have or does have? > > > > If it does have such a feature, do you plan to use it? Usually I have > > not seen such features used for (e.g.) little-endian PCI devices on big > > endian hosts. > > > I has that option, but I don't really plan to use it, clearly not for > big endian hosts. The "may have" was for future little endian hosts. > But I think this option is not really needed by the uboot driver > so doing the byte swapping in software should be ok (i.e. performance > wise).
If we don't plan to use it even on future little endian hosts, then it's no big deal to use __beNN. > >>> What's wrong with: > >>> > >>> for (t = 0; in_be16(&rtx.rxbd[rx_idx].status) & RXBD_EMPTY; t++) { > >>> > >>> Or if you don't want to use I/O accessors on the DMA descriptors (Is > >>> synchronization ensured some other way? We had problems with this in > >>> the Linux driver before...): > >>> > >> > >> Synchronization here is is insured by declaring the RTXBD structure > >> type as "volatile" (see RTXBD declaration, a couple of lines above). > > > > That does not achieve hardware synchronization, and even the effects on > > the compiler are questionable due to volatile's vague semantics. > > > >> The existing code has been working this way for quite a while on the > >> mpc85xx platforms, > > > > It was "working" for a while in Linux as well, until we encountered a > > workload where it didn't (though granted, there was no volatile in that > > case). See Linux commit 3b6330ce2a3e1f152f79a6203f73d23356e243a7 > > > > Good point, I guess it would be safer too use some memory barriers > around accesses to BD fields in the uboot driver too. However some > portable barriers would be needed, eieio() doesn't have an equivalent > for ARM. > > > FWIW, I see some other places in U-Boot's TSEC driver that use > > out_be32() on the descriptors (e.g. startup_tsec after "Point to the > > buffer descriptors"). > > > > That's only to program the tbase and rbase registers with the physical > address of the Tx/Rx BD rings' base. Oops. :-P > >> Also, there doesn't seem to be other net drivers using I/O accessors > >> for the BD rings. > > > > I picked some random examples, and the first driver in Linux in which I > > could quickly find the BD rings uses I/O accessors > > (drivers/net/ethernet/realtek/8319too.c). I then checked its U-Boot > > eqivalent (drivers/net/rtl8139.c) and it also uses I/O accessors for the > > descriptors. > > > > I actually meant accessing buffer descriptor fields (like status, > length, dma address). As you can see in this example from linux > 8319too.c's Rx routine, there's no I/O accessor involved: > > /* read size+status of next frame from DMA ring buffer */ > rx_status = le32_to_cpu (*(__le32 *) (rx_ring + ring_offset)); > > However the driver does use a rmb() just before this line. Yeah, it doesn't help when both types of accesses show up when searching for the ring, and accessors exist with both possible argument orderings. Especially when a driver has custom accessors. It's OK to use explicit synchronization rather than I/O accessors, if you're careful to ensure that it's correct. It looks like drivers/net/fec_mxc.c in U-Boot is an example that uses I/O accessors on ring data, e.g.: writew(length, &fec->tbd_base[fec->tbd_index].data_length); -Scott _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot