Paul Durrant wrote:
I've been doing some network TX performance measurement on a Dell 1850 dual-Xeon
box, and DMA mapping my buffers turns out to be incredibly costly: each mapping
takes >8us, some >16us.
I dug into the ddi_dma_addr_bind_handle() operation, and it seems to be
hat_getpfnum() that's slow. I used the following bit of D...
[efsys_txq_map_packet takes a packet passed down by the stack and
tries to DMA map all its dblks]
fbt::efsys_txq_map_packet:entry
{
        /* flag that this thread is in the TX mapping path */
        self->in_tx = 1;
}

fbt::rootnex_dma_bindhdl:entry
/self->in_tx == 1/
{
        /* only look at DMA binds issued from the TX mapping path */
        self->in_bind = 1;
}

fbt::rootnex_dma_bindhdl:return
/self->in_bind == 1/
{
        self->in_bind = 0;
}

fbt::efsys_txq_map_packet:return
{
        self->in_tx = 0;
}

fbt::hat_getpfnum:entry
/self->in_bind == 1/
{
        /* time each hat_getpfnum() call made during such a bind */
        self->ts = timestamp;
}

fbt::hat_getpfnum:return
/self->ts != 0/
{
        @time["getpfnum"] = quantize(timestamp - self->ts);
        self->ts = 0;
}
... and got the following on a 60-second run of a tight loop (in
kernel) which allocates a single 1500-byte dblk, makes it look like an
Ethernet packet and then passes it to my TX code:
  getpfnum
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@                    592964
            8192 |@@@@@@@@@@@@@@@@@@@                      538248
           16384 |                                         826
           32768 |                                         27
           65536 |                                         0
That seems incredibly slow. Based on these sorts of numbers, bcopy()ing
network packets will probably be faster than mapping them, even with a jumbo
MTU (even a ~9000-byte copy is likely only a few microseconds' worth of
memory bandwidth).
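For completeness, the same approach could time the whole rootnex_dma_bindhdl()
call, to confirm how much of each bind hat_getpfnum() accounts for. Something
like this extra pair of clauses added to the script should do it (untested
sketch; the bind_ts variable name is just illustrative):

fbt::rootnex_dma_bindhdl:entry
/self->in_tx == 1/
{
        /* start timing the whole DMA bind */
        self->bind_ts = timestamp;
}

fbt::rootnex_dma_bindhdl:return
/self->bind_ts != 0/
{
        @time["bindhdl"] = quantize(timestamp - self->bind_ts);
        self->bind_ts = 0;
}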
Paul

For small-interval timings like this, I'd trust the profile provider a lot more,
as it gives a less distorted view of relative time spent (the fbt entry/return
probes themselves add overhead that inflates very short intervals). How prevalent
is hat_getpfnum in your kernel profile?
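Something along these lines (untested; the 997Hz rate and 60-second window are
arbitrary) would give a rough picture of where kernel CPU time is actually going:

profile-997
/arg0 != 0/
{
        /* arg0 is the interrupted kernel PC; count samples per function */
        @hot[func(arg0)] = count();
}

tick-60s
{
        /* keep only the 20 hottest functions, then print and exit */
        trunc(@hot, 20);
        exit(0);
}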
- Bart
--
Bart Smaalders Solaris Kernel Performance
[EMAIL PROTECTED] http://blogs.sun.com/barts