> Date: Tue, 31 Aug 2010 18:49:06 -0600
> From: Theo de Raadt <[email protected]>
>
> > > Date: Tue, 31 Aug 2010 22:30:42 +0100
> > > From: Stuart Henderson <[email protected]>
> > >
> > > running with this seems to help.
> >
> > Can you try this diff instead?
> >
> > Index: if_vr.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/pci/if_vr.c,v
> > retrieving revision 1.105
> > diff -u -p -r1.105 if_vr.c
> > --- if_vr.c	19 May 2010 15:27:35 -0000	1.105
> > +++ if_vr.c	31 Aug 2010 21:57:55 -0000
> > @@ -1562,6 +1562,10 @@ vr_alloc_mbuf(struct vr_softc *sc, struc
> >  	d = r->vr_ptr;
> >  	d->vr_data = htole32(r->vr_map->dm_segs[0].ds_addr);
> >  	d->vr_ctl = htole32(VR_RXCTL | VR_RXLEN);
> > +
> > +	bus_dmamap_sync(sc->sc_dmat, sc->sc_listmap, 0,
> > +	    sc->sc_listmap->dm_mapsize, BUS_DMASYNC_PREWRITE);
> > +
> >  	d->vr_status = htole32(VR_RXSTAT);
> >
> >  	bus_dmamap_sync(sc->sc_dmat, sc->sc_listmap, 0,
>
> #define	bus_dmamap_sync(t, p, o, l, ops)				\
>     (void)((t)->_dmamap_sync ?						\
> 	(*(t)->_dmamap_sync)((t), (p), (o), (l), (ops)) : (void)0)
>
> I expect it to still fail in exactly the same way, with gcc
> re-organizing the writes.
Well, sthen@'s tests suggest otherwise, but you're right in that this
may not be future proof.

> Only a global function call will fix it,

Yes, and we should fix our bus_dmamap_sync implementations to be real
functions, or at least include a proper memory barrier that is
respected by the compiler.

> or, making the pointer volatile so that operations on it are
> ordered.

Unfortunately that isn't good enough.  Let me explain:

The vr(4) Rx ring descriptors have a bit in them that indicates
whether the descriptor is "owned" by software (us) or hardware (the
chip).  You set up the descriptor and then flip that bit to make the
descriptor available to the hardware.  It is important that the
hardware doesn't see the flipped bit before it sees the updated
descriptor.

On strictly in-order and cache-coherent architectures (like i386) it
is enough to make sure that the store that flips the ownership bit
happens after the stores to the other parts of the descriptor.  The
hardware will observe the stores in the same order, and there are no
issues with the hardware not seeing parts of the descriptor that are
still cached.  As long as we make sure the compiler doesn't screw us
by reordering the instructions that issue the stores, we're fine.

On architectures that can reorder stores, things get more difficult.
The only way we can make sure that the store to the "ownership" bit
happens after the stores that set up the descriptor is by putting a
memory barrier in between those stores.  Inserting a bus_dmamap_sync()
call in between is one of the few portable ways to achieve this.
Obviously bus_dmamap_sync() must be written in a way that prevents the
compiler from moving load or store instructions past it.  At that
point marking the pointer to the descriptor volatile is pointless.

On architectures that are not cache coherent, there also is an issue.
If the descriptor spans multiple cache lines, the hardware will see
the descriptor updates in the order the lines are flushed.
Since the vr(4) descriptors have the "ownership" bit in the first word
of the descriptor, this means that it is possible (even likely) that
the hardware sees the "ownership" bit being flipped before it sees the
fully updated descriptor.  The only way to avoid this issue is to
flush the cache twice using bus_dmamap_sync().  Again, making the
pointer to the descriptor volatile doesn't help a thing.

It is this reasoning that made me come up with the alternative diff I
sent st...@.
