Just doing some network TX performance measurement on a Dell 1850 dual-Xeon box, and I see that DMA mapping my buffers seems to be incredibly costly: each mapping takes >8us, some >16us. I dug into the ddi_dma_addr_bind_handle() operation and it seems to be hat_getpfnum() that's slow. I used the following bit of D...
[efsys_txq_map_packet takes a packet passed down by the stack and tries to DMA map all its dblks]

fbt::efsys_txq_map_packet:entry
{
        self->in_tx = 1;
}

fbt::rootnex_dma_bindhdl:entry
/self->in_tx == 1/
{
        self->in_bind = 1;
}

fbt::rootnex_dma_bindhdl:return
/self->in_bind == 1/
{
        self->in_bind = 0;
}

fbt::efsys_txq_map_packet:return
{
        self->in_tx = 0;
}

fbt::hat_getpfnum:entry
/self->in_bind == 1/
{
        self->ts = timestamp;
}

fbt::hat_getpfnum:return
/self->ts != 0/
{
        @time["getpfnum"] = quantize(timestamp - self->ts);
        self->ts = 0;
}

... and got the following on a 60 second run of a tight loop (in kernel) which allocates a single 1500 byte dblk, makes it look like an ethernet packet and then passes it to my TX code (values are in nanoseconds):

  getpfnum
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@                    592964
            8192 |@@@@@@@@@@@@@@@@@@@                      538248
           16384 |                                         826
           32768 |                                         27
           65536 |                                         0

That seems incredibly slow. Based on these sorts of numbers, bcopy()ing network packets will probably be faster, even with a jumbo MTU.

  Paul

-- 
Paul Durrant
http://www.linkedin.com/in/pdurrant
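P.S. To see whether that 4-8us is being paid once or several times per mapping, an obvious follow-up is to quantize the whole rootnex_dma_bindhdl() call and count how many hat_getpfnum() calls happen inside each bind. This is only a rough, untested sketch along the same lines as the script above (same probe points, same thread-local-flag approach):

fbt::efsys_txq_map_packet:entry
{
        self->in_tx = 1;
}

fbt::efsys_txq_map_packet:return
{
        self->in_tx = 0;
}

fbt::rootnex_dma_bindhdl:entry
/self->in_tx/
{
        /* timestamp the start of the bind and reset the per-bind call count */
        self->bind_ts = timestamp;
        self->pfn_calls = 0;
}

fbt::hat_getpfnum:entry
/self->bind_ts/
{
        /* count page-frame lookups made on behalf of this bind */
        self->pfn_calls++;
}

fbt::rootnex_dma_bindhdl:return
/self->bind_ts/
{
        /* total wall time for the bind (ns), plus lookups per bind */
        @bind["bindhdl (ns)"] = quantize(timestamp - self->bind_ts);
        @pfn["hat_getpfnum calls per bind"] = lquantize(self->pfn_calls, 0, 16, 1);
        self->bind_ts = 0;
        self->pfn_calls = 0;
}

For comparison, assuming copy bandwidth in the low GB/s range on this class of box, bcopy()ing even a 9000-byte jumbo frame should come in around 2-3us, i.e. well under the >8us per mapping seen above, which is what makes the copy path look attractive.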