On Mon, Aug 23, 2010 at 12:16:34PM -0700, Pyun YongHyeon wrote:
> On Mon, Aug 23, 2010 at 09:04:02PM +0200, Andre Oppermann wrote:
> > On 23.08.2010 19:52, Pyun YongHyeon wrote:
> > >On Mon, Aug 23, 2010 at 12:18:01PM +0200, Andre Oppermann wrote:
> > >>On 23.08.2010 11:26, Adrian Chadd wrote:
> > >>>On 23 August 2010 06:27, Pyun YongHyeon<pyu...@gmail.com> wrote:
> > >>>
> > >>>>I recall there was SIOCSIFCAP ioctl handling bug in bce(4) on 8.0 so
> > >>>>it might also disable IFCAP_TSO4/IFCAP_TXCSUM/IFCAP_RXCSUM when you
> > >>>>disabled RX checksum offloading. But I can't explain how checksum
> > >>>>offloading could be related with the growth of 4k jumbo buffers.
> > >>>
> > >>>Neither can I!
> > >>>
> > >>>I'm trying to come up with a reproduction method that doesn't involve
> > >>>"put box on the internet, push clients through it, wait."
> > >>
> > >>Network drivers use 2k sized mbuf clusters on receive. So the problem
> > >>doesn't seem to be RX related.
> > >>
> > >
> > >bce(4) is special in this regard. The controller would allocate
> > >jumbo cluster on RX if jumbo frame is used. If header splitting is
> > >used, driver will use normal mbuf clusters.
> >
> > Didn't know that.
> >
> > >>The function that is called on a socket write is sosend_generic() which
> > >>makes use of m_getm2(). This function allocates mbuf chains with the
> > >>tightest packing it can achieve. It will make use of 4k (page size) mbufs
> > >>as much as it can. This is where they come from.
> > >>
> > >>It seems the 4k clusters do not get freed back to the pool after they've
> > >>been sent by the NIC and dropped from the socket buffer after the ACK has
> > >>arrived. The leak must occur in one of these two places. The socket
> > >>buffer is unlikely as it would affect not just you but everyone else too.
> > >>Thus the mbuf freeing after DMA/tx in the bce(4) driver is the prime
> > >>suspect.
> > >>
> > >
> > >I know bce(4) has a couple of bugs in TX path (wrong dma tag, lack of
> > >bus_dmamap_sync(9) etc) but this is the same code path with/without
> > >TX checksum offloading. This is one of the reasons why I still do not
> > >understand what's really happening here. TX checksum offloading may
> > >introduce additional frame processing time to fill internal FIFO to
> > >compute checksum before transmitting the frame to wire such that it
> > >can change timing of TX path. This timing change might trigger the
> > >TX path bug. It's just vague guessing though.
> >
> > Had a chat with clau...@openbsd and he said that the bce(4) DMA engine
> > can only access the first 1GB of physical RAM and has to use bounce
> > buffers all the time. Maybe this is related.
> >
>
> Really? I don't remember I saw such a DMA address space limitation
> in data sheet. And I don't think Broadcom made such a horrible
> thing for controllers targeted for servers. The only limitation I
> know is BCM5708 is not able to handle DMA addresses greater than
> 40bits so bce(4) limits the DMA address space in DMA tag creation.

Ugh, FreeBSD bce(4) != OpenBSD bce(4). I was talking about the old Broadcom
BCM4401, and that chip has such a stupid limit. The NetXtreme II chips are
called bnx(4) in OpenBSD. Sorry. Yes, the DMA engine of the NetXtreme II
can address more than 4G.
-- 
:wq Claudio
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
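
The address-limit arithmetic discussed in this thread can be sketched roughly as below. This is only an illustrative Python model, not code from either driver: the names dma_limit and needs_bounce are hypothetical, and expressing the "first 1GB" limit as 30 address bits is an assumption made for the example. The 40-bit figure is the BCM5708 limit mentioned above.

```python
# Sketch only: models a DMA address-width limit as plain arithmetic.
# All names are hypothetical; none of this is taken from bce(4) or bnx(4).

GIB = 1 << 30


def dma_limit(addr_bits):
    """Highest physical address reachable with addr_bits of DMA addressing."""
    return (1 << addr_bits) - 1


def needs_bounce(phys_addr, length, addr_bits):
    """True if any byte of [phys_addr, phys_addr + length) is above the limit."""
    if length <= 0:
        raise ValueError("length must be positive")
    last_byte = phys_addr + length - 1
    return last_byte > dma_limit(addr_bits)


if __name__ == "__main__":
    # A 4k buffer located just above the 4 GiB mark.
    addr = 4 * GIB + 0x1000
    for name, bits in (("1 GiB limit (assumed 30 bits)", 30),
                       ("40-bit limit", 40)):
        print(name, "-> bounce buffer needed:", needs_bounce(addr, 4096, bits))
```

Under these assumptions, a device limited to the first 1GB would have to bounce any buffer placed higher in physical memory, while a 40-bit device only hits the limit above 1 TiB, which fits the point that the NetXtreme II is not affected in the same way.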