Just doing some network TX performance measurement on a Dell 1850
dual-Xeon box, and I see that DMA mapping my buffers is incredibly
costly: each mapping takes >8us, some >16us.
I dug into the ddi_dma_addr_bind_handle() operation and it seems to be
hat_getpfnum() that's slow. I used the following bit of D to measure it:

[efsys_txq_map_packet takes a packet passed down by the stack and
tries to DMA map all its dblks]
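
For reference, the shape of the code being measured is roughly the
following. This is just a sketch, not the actual efsys code: everything
other than the DDI calls and the STREAMS macros is made up, and handle
allocation, cookie bookkeeping and error recovery are omitted.

#include <sys/types.h>
#include <sys/stream.h>
#include <sys/strsun.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/*
 * Sketch only: one fresh DMA bind per dblk in the chain.  Each of
 * these binds is what goes through rootnex_dma_bindhdl() and, from
 * there, hat_getpfnum() in the trace below.
 */
static int
txq_map_packet_sketch(ddi_dma_handle_t *handles, mblk_t *mp)
{
        ddi_dma_cookie_t cookie;
        uint_t ncookies;
        int i;

        for (i = 0; mp != NULL; mp = mp->b_cont, i++) {
                int rc = ddi_dma_addr_bind_handle(handles[i], NULL,
                    (caddr_t)mp->b_rptr, MBLKL(mp),
                    DDI_DMA_WRITE | DDI_DMA_STREAMING,
                    DDI_DMA_DONTWAIT, NULL, &cookie, &ncookies);

                if (rc != DDI_DMA_MAPPED)
                        return (rc);
                /* ... post the cookie(s) to the TX ring ... */
        }
        return (DDI_DMA_MAPPED);
}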

fbt::efsys_txq_map_packet:entry
{
        self->in_tx = 1;
}

fbt::rootnex_dma_bindhdl:entry
/self->in_tx == 1/
{
        self->in_bind = 1;
}

fbt::rootnex_dma_bindhdl:return
/self->in_bind == 1/
{
        self->in_bind = 0;
}

fbt::efsys_txq_map_packet:return
{
        self->in_tx = 0;
}

fbt::hat_getpfnum:entry
/self->in_bind == 1/
{
        self->ts = timestamp;
}

fbt::hat_getpfnum:return
/self->ts != 0/
{
        @time["getpfnum"] = quantize(timestamp - self->ts);
        self->ts = 0;
}
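
(Nothing special is needed to run the script: dtrace -s <file>, leave it
running for the duration of the test, then Ctrl-C to dump the
aggregation.)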

... and got the following on a 60-second run of a tight loop (in the
kernel) which allocates a single 1500-byte dblk, makes it look like an
Ethernet packet and then passes it to my TX code (the quantize values
are nanoseconds):

 getpfnum
          value  ------------- Distribution ------------- count
           2048 |                                         0
           4096 |@@@@@@@@@@@@@@@@@@@@@                    592964
           8192 |@@@@@@@@@@@@@@@@@@@                      538248
          16384 |                                         826
          32768 |                                         27
          65536 |                                         0

That seems incredibly slow. Based on these sorts of numbers, bcopy()ing
network packets into pre-bound DMA buffers will probably be faster, even
with a jumbo MTU.
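
To illustrate what I mean, the copy-based path would look something
like this (again only a sketch with made-up names, using the same
headers as the sketch above; the buffer would come from
ddi_dma_mem_alloc() and be bound once at attach time, and ring
management and oversized packets are left out):

/*
 * Sketch of the copy-based alternative: the buffer's DMA handle is
 * bound once up front, so the per-packet work is a bcopy() and a
 * ddi_dma_sync(), with no hat_getpfnum() in the hot path.
 */
typedef struct txbuf {
        caddr_t                 tb_kaddr;   /* from ddi_dma_mem_alloc() */
        ddi_dma_handle_t        tb_dmah;    /* bound once, at attach */
        ddi_dma_cookie_t        tb_cookie;  /* device address of buffer */
        size_t                  tb_size;    /* buffer size (>= MTU) */
} txbuf_t;

static int
txq_copy_packet_sketch(txbuf_t *tb, mblk_t *mp)
{
        size_t off = 0;

        /* Flatten the mblk chain into the pre-bound buffer. */
        for (; mp != NULL; mp = mp->b_cont) {
                if (off + MBLKL(mp) > tb->tb_size)
                        return (DDI_FAILURE);   /* too big for the buffer */
                bcopy(mp->b_rptr, tb->tb_kaddr + off, MBLKL(mp));
                off += MBLKL(mp);
        }

        /* Make the copy visible to the device before posting it. */
        (void) ddi_dma_sync(tb->tb_dmah, 0, off, DDI_DMA_SYNC_FORDEV);

        /* ... post tb->tb_cookie.dmac_laddress and off to the TX ring ... */
        return (DDI_SUCCESS);
}

The copy obviously costs memory bandwidth, but even a 9000-byte jumbo
frame is only a few microseconds of bcopy() on this class of machine,
which is comparable to a single hat_getpfnum() call in the histogram
above.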

 Paul

--
Paul Durrant
http://www.linkedin.com/in/pdurrant