From: Alexander Duyck
> Sent: 01 February 2016 18:18
> >> 1) Unaligned accesses
> >
> > Remember that even if you do a 'realignment copy' of the IP header,
> > at some point the 'userdata' of the packet has to be accessed.
> > Mostly this will be with memcpy() and you want that copy to be aligned.
> 
> The problem is, aligned by what?  For x86 anything other than 16 byte
> alignment will require some realignment anyway since internally it is
> essentially doing some SSE style copy based on rep movsb.

That isn't 'SSE' based, there is probably some kind of barrel shifter
associated with the data cache that allows cache-line reads from
the 'wrong' address.

> I'm sure it
> is similar for other architectures that have moved on from 32b to 64b
> in that they want to move by the largest available data unit and 2
> byte alignment vs 4 byte doesn't make that much difference anymore.

There are still plenty of 32bit systems out there.
Including SoC systems with cpus that fault on misaligned transfers on
the same silicon as ethernet MACs that can't write 4n+2 aligned frames.
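A minimal sketch of that failure mode follows. It assumes a MAC that can only DMA to 4n boundaries. The 14-byte Ethernet header then puts the IP header at 4n+2, so a plain 32-bit load of an address field can trap on a strict-alignment CPU. The helper names are hypothetical. Building the value from bytes is one safe fallback, similar in spirit to the kernel's get_unaligned_be32().

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_ETH_HLEN 14  /* Ethernet header length in bytes */

/* Hypothetical helper: misalignment (mod 4) of the IP header when the
 * MAC starts writing the frame at byte offset frame_off of a 4-byte
 * aligned buffer. 0 means the IP header is 4-byte aligned. */
static size_t ip_hdr_misalign(size_t frame_off)
{
    return (frame_off + FRAME_ETH_HLEN) % 4;
}

/* Hypothetical helper: read a big-endian (network order) 32-bit field
 * one byte at a time, so it is safe at any address, including 4n+2. */
static uint32_t load_be32_bytewise(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

With frame_off = 0, ip_hdr_misalign() returns 2, which is the case being complained about. A MAC that can write at offset 2 makes it 0. On a CPU with efficient unaligned loads a compiler may merge the byte loads into one load plus a byte swap. On a strict-alignment CPU the four byte loads remain, and that is the software cost.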

> The thing you have to keep in mind is that your standard TCP frame has
> 66 bytes of headers (14 Ethernet, 20 IP, 20 TCP, 12 TCP options).  This
> means that you are crossing over into the next cache line by either 2
> or 4 bytes depending on NET_IP_ALIGN.  On a system with an 8 byte long
> it doesn't really make a difference one way or the other since it is
> still unaligned in terms of the fastest possible memcpy.

Unless you align the frame on an 8n+6 boundary.
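The offsets discussed above can be checked with a little arithmetic. The sketch below makes three assumptions: a 64-byte cache line, an 8-byte aligned receive buffer, and the 66-byte header stack quoted above. frame_off is where the frame starts. The cases are 0, 2 (NET_IP_ALIGN, whose generic kernel default is 2) or 6 (the 8n+6 case). The helper names are made up for illustration.

```c
#include <stddef.h>

/* Header sizes quoted above: 14 Ethernet + 20 IP + 20 TCP + 12 TCP options. */
#define HDR_ETH     14
#define HDR_IP      20
#define HDR_TCP     20
#define HDR_TCP_OPT 12
#define HDRS_TOTAL  (HDR_ETH + HDR_IP + HDR_TCP + HDR_TCP_OPT)  /* 66 */
#define CACHE_LINE  64  /* assumed cache line size */

/* Offsets of the IP header and TCP payload from the aligned buffer start. */
static size_t ip_off(size_t frame_off)      { return frame_off + HDR_ETH; }
static size_t payload_off(size_t frame_off) { return frame_off + HDRS_TOTAL; }

/* Header bytes that spill into the second cache line (assumes the
 * headers start in the first line and cross exactly one boundary). */
static size_t spill(size_t frame_off)
{
    return payload_off(frame_off) % CACHE_LINE;
}

/* Natural alignment of an offset, capped at 8 (a long on 64-bit). */
static size_t align_of(size_t off)
{
    size_t a = 8;
    while (a > 1 && off % a != 0)
        a /= 2;
    return a;
}
```

The three cases work out as follows:

- Offset 0: the IP header is only 2-byte aligned, the payload is 2-byte aligned, and 2 bytes spill into the next line.
- Offset 2: the IP header is aligned, the payload is only 4-byte aligned, and 4 bytes spill.
- Offset 6: the IP header stays 4-byte aligned and the payload becomes 8-byte aligned, at the cost of 8 bytes of spill.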

> > I really can't believe just how many ethernet chips are now being designed
> > that can't write received frames to '4n+2' boundaries.
> > It isn't even a new problem. Sun fixed the sbus 'DMA' part so that it would
> > do sbus burst transfers to 4n+2 aligned buffers a long time ago...
> 
> The problem here is motivation.  It makes sense that Sun would want
> their headers to be IP aligned since a Sparc processor still needs
> that.  Most NIC vendors don't really care about the more obscure
> architectures, and even 32 bit is falling out of favor.  The market
> share is just too small and in many cases customers don't run Linux
> OSes on them anyway so even driver support is minimal.  They mostly
> care about x86_64, PowerPC, and 64 bit ARM.  And all 3 of those
> architectures have efficient unaligned accesses.

With the possible exception of the x86 optimised 'rep movsb' on
very recent intel cpus, unaligned transfers are very likely
slower than aligned ones - just a lot faster than aligning the
copy in software.

My suspicion is that many hardware engineers are just not aware
of the problem. Or just leave it as a 'software problem'.

It is the same family of 'problem' that adds more and more
protocol-specific ones' complement checksum offloading.

        David.

