01/11/2020 10:12, Morten Brørup:
> One thing has always puzzled me:
> Why do we use 64 bits to indicate which memory pool
> an mbuf belongs to?
> The portid only uses 16 bits and an indirection index.
> Why don't we use the same kind of indirection index for mbuf pools?
I wonder what the cost of the indirection would be. Probably negligible.
I think it is a good proposal...
... for next year, after a deprecation notice.
(I have appended some rough sketches of this and a few other points
at the end of this mail.)

> I can easily imagine using one mbuf pool (or perhaps a few pools)
> per CPU socket (or per physical memory bus closest to an attached NIC),
> but not more than 256 mbuf memory pools in total.
> So, let's introduce an mbufpoolid like the portid,
> and cut this mbuf field down from 64 to 8 bits.
>
> If we also cut down m->pkt_len from 32 to 24 bits,

Who is using packets larger than 64k? Are 16 bits enough?

> we can get the 8 bit mbuf pool index into the first cache line
> at no additional cost.

I like the idea.
It means we don't need to move the pool pointer now,
i.e. it does not have to replace the timestamp field.

> In other words: This would free up another 64 bit field in the mbuf structure!

That would be great!

> And even though the m->next pointer for scattered packets resides
> in the second cache line, the libraries and applications know
> that m->next is NULL when m->nb_segs is 1.
> This means that my suggestion would make touching
> the second cache line unnecessary (in simple cases),
> even for re-initializing the mbuf.

So you think the "next" pointer should stay in the second half of mbuf?

I feel you would like to move the Tx offloads into the first half
to improve the performance of very simple apps.
I am thinking the opposite: we could have some dynamic fields space
in the first half to improve the performance of complex Rx.
Note: we can add a flag hint for field registration in this first half
(sketched below as well).

> And now I will proceed out on a tangent with two more
> independent thoughts, so feel free to ignore.
>
> Consider a multi CPU socket system with one mbuf pool
> per CPU socket, where the NICs attached to each CPU socket
> use an RX mbuf pool with RAM on the same CPU socket.
> I would imagine that (re-)initializing these mbufs could be faster
> if performed only on a CPU on the same socket.
> If this is the case, mbufs should be re-initialized
> as part of the RX preparation at ingress,
> not as part of the mbuf free at egress.
>
> Perhaps some microarchitectures are faster at comparing
> nb_segs==0 than nb_segs==1.
> If so, nb_segs could be redefined to mean the number of
> additional segments, rather than the number of segments.
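PS: below are a few rough sketches to make the ideas above concrete.
All names in them are made up for illustration; none of this is
existing DPDK API unless noted.

First, the pool indirection could be as simple as a small read-mostly
table, filled when each mempool is created:

#include <stdint.h>

struct rte_mempool; /* opaque here; the real one is in rte_mempool.h */

/* Hypothetical: 8-bit pool id -> pool pointer, assigned at mempool
 * creation. 256 entries fit in a few cache lines and are read-mostly,
 * so the extra load should be cheap.
 */
static struct rte_mempool *mbuf_pool_tbl[UINT8_MAX + 1];

/* Hypothetical accessor replacing a direct m->pool read. */
static inline struct rte_mempool *
mbuf_pool_from_id(uint8_t pool_id)
{
        return mbuf_pool_tbl[pool_id];
}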
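The 8-bit id could then be packed into the 32 bits currently holding
m->pkt_len, which is the "no additional cost" part: 24 bits still
cover packets up to 16 MB - 1. A sketch of the repacked slot:

#include <stdint.h>

/* Hypothetical repacking of the existing 32-bit pkt_len slot in the
 * first cache line: pkt_len shrinks from 32 to 24 bits and the freed
 * byte holds the pool index.
 */
struct pkt_len_and_pool {
        uint32_t pkt_len:24; /* total packet length, was 32 bits */
        uint32_t pool_id:8;  /* hypothetical 8-bit mbuf pool index */
};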
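On the invariant that nb_segs == 1 implies m->next == NULL, a free
path exploiting it could look like the sketch below. Illustrative
only, not the existing rte_pktmbuf_free(); and note that today the
prefree step still resets m->next per segment, which is exactly the
kind of second-cache-line access this would avoid.

#include <rte_branch_prediction.h>
#include <rte_mbuf.h>

/* Illustrative: the caller never reads m->next (second cache line)
 * in the common single-segment case.
 */
static inline void
pktmbuf_free_sketch(struct rte_mbuf *m)
{
        if (likely(m->nb_segs == 1)) {
                rte_pktmbuf_free_seg(m);
                return;
        }
        while (m != NULL) {
                struct rte_mbuf *next = m->next; /* second cache line */
                rte_pktmbuf_free_seg(m);
                m = next;
        }
}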
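On the dynamic field hint: rte_mbuf_dynfield_register() already takes
a flags word (currently required to be 0), so the hint could be a new
flag. The flag and the field name below are made up:

#include <stdalign.h>
#include <stdint.h>
#include <rte_mbuf_dyn.h>

/* Hypothetical flag, does not exist today: ask that the field be
 * allocated in the first half (hot first cache line) of the mbuf
 * when space is available there.
 */
#define RTE_MBUF_DYNFIELD_FLAG_HOT (1u << 0)

static int
register_rx_meta(void)
{
        static const struct rte_mbuf_dynfield desc = {
                .name = "example_rx_metadata", /* made-up field name */
                .size = sizeof(uint32_t),
                .align = alignof(uint32_t),
                .flags = RTE_MBUF_DYNFIELD_FLAG_HOT,
        };
        /* Existing API: returns the field's byte offset in the mbuf,
         * or -1 on error.
         */
        return rte_mbuf_dynfield_register(&desc);
}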
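And on the last tangent: with nb_segs redefined as the number of
*additional* segments, the single-segment test becomes a compare with
zero, and a segment walk becomes a decrement-to-zero loop. Minimal
stand-in types, just to show the shape:

#include <stdint.h>
#include <stdio.h>

/* Minimal stand-in for an mbuf chain. */
struct seg {
        struct seg *next;
        uint16_t nb_extra_segs; /* proposed meaning: additional segments */
};

/* The compare with zero can fold into the flags set by the
 * preceding decrement on some microarchitectures.
 */
static void
walk(struct seg *s)
{
        uint16_t extra = s->nb_extra_segs;

        do {
                printf("segment %p\n", (void *)s);
                s = s->next;
        } while (extra-- != 0 && s != NULL);
}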