On Mon, Apr 28, 2008 at 5:01 PM,  <[EMAIL PROTECTED]> wrote:
> Hey Dude,
>  I pulled down a copy of your test program and ran a few experiments.
>
>  $ time ./xml
>  100000 iter in 22.715982 sec
>
>  real    0m22.721s
>  user    0m22.694s
>  sys     0m0.007s
>
>  This seems to indicate that all of our time is being spent in usermode, so
>  whatever it is in Solaris that is slower than Linux, it's not the operating
>  system.
>
>  Just to double check, I used the following dtrace to look at the time this 
> application is spending in syscalls:
>
>  # dtrace -n 'syscall:::entry /execname == "xml"/ { self->traced = timestamp; 
> self->func = probefunc;}  syscall:::return /self->traced/ { @a[self->func] = 
> sum(timestamp - self->traced); @b[self->func] = count(); self->func = (char 
> *)0; self->timestamp = 0;}'
>
>  (This returns the time xml spends in syscalls as well as the number of 
> syscalls
>  made)
>
>  Time (nanoseconds):
>
>   getpid                                                         1978
>   sysconfig                                                      2229
>   sigpending                                                     3071
>   sysi86                                                         3529
>   getrlimit                                                      4234
>   setcontext                                                     5763
>   fstat64                                                        7042
>   close                                                         16606
>   write                                                         19898
>   getcwd                                                        21302
>   ioctl                                                         23668
>   munmap                                                        25757
>   brk                                                           28445
>   open                                                          40712
>   resolvepath                                                   51777
>   xstat                                                         57616
>   mmap                                                         159275
>   memcntl                                                      267070
>
>  Number of invocations:
>
>   fstat64                                                           1
>   getcwd                                                            1
>   getpid                                                            1
>   getrlimit                                                         1
>   ioctl                                                             1
>   sigpending                                                        1
>   sysconfig                                                         1
>   sysi86                                                            1
>   write                                                             1
>   setcontext                                                        2
>   memcntl                                                           6
>   close                                                             7
>   munmap                                                            8
>   open                                                              8
>   resolvepath                                                       9
>   xstat                                                             9
>   brk                                                              10
>   mmap                                                             32
>
>  This agrees with the output from time.  You're spending a miniscule amount
>  of system time performing memory operations.
>
>  The next place to look would be at libxml2.  What library versions do
>  you have installed on the different machines.  If the library versions
>  don't match, or you've compiled with a different set of options, or used
>  a different compiler to build the library, you'll get different results
>  for the benchmark.
>
>  I ran my test on snv_84.  It has version 2.6.23 installed.
>
>  Another approach would be to use DTrace's profile provider.  I obtained
>  usable results by performing the following:
>
>  # dtrace -n 'profile-1001 /execname == "xml"/ [EMAIL PROTECTED]()] = 
> count();} END { trunc(@a, 20);}' -c /tmp/xml
>
>  This returns the 20 most frequent stack frames (and the number of times
>  the occurred) when DTrace runs a profile probe at 1001hz.
>
>  xmlDictLookup shows up as a popular function, but that may be a red
>  herring.  I'll include the top 5 here, but you can run this command, or 
> adjust
>  the truncation to view any amount you'd like:
>
>               libc.so.1`_free_unlocked+0x4c
>               libc.so.1`free+0x2b
>               libxml2.so.2`xmlFreeNodeList+0x19e
>               libxml2.so.2`xmlFreeNodeList+0xb6
>               libxml2.so.2`xmlFreeNodeList+0xb6
>               libxml2.so.2`xmlFreeDoc+0xac
>               xml`main+0x55
>               xml`_start+0x80
>                61
>
>               libxml2.so.2`xmlDictLookup+0x60
>               libxml2.so.2`xmlParseNCName+0xab
>               libxml2.so.2`xmlParseQName+0x44
>               libxml2.so.2`xmlParseAttribute2+0x54
>               libxml2.so.2`xmlParseStartTag2+0x246
>               libxml2.so.2`xmlParseElement+0x99
>               libxml2.so.2`xmlParseContent+0x12c
>               libxml2.so.2`xmlParseElement+0x134
>               libxml2.so.2`xmlParseContent+0x12c
>               libxml2.so.2`xmlParseElement+0x134
>               libxml2.so.2`xmlParseDocument+0x33b
>               libxml2.so.2`xmlDoRead+0x6b
>               libxml2.so.2`xmlReadMemory+0x36
>               xml`main+0x4c
>               xml`_start+0x80
>                66
>
>               libxml2.so.2`xmlParseStartTag2+0xb7e
>               libxml2.so.2`xmlParseElement+0x99
>               libxml2.so.2`xmlParseContent+0x12c
>               libxml2.so.2`xmlParseElement+0x134
>               libxml2.so.2`xmlParseContent+0x12c
>               libxml2.so.2`xmlParseElement+0x134
>               libxml2.so.2`xmlParseDocument+0x33b
>               libxml2.so.2`xmlDoRead+0x6b
>               libxml2.so.2`xmlReadMemory+0x36
>               xml`main+0x4c
>               xml`_start+0x80
>                66
>
>               libc.so.1`_free_unlocked+0x53
>               libc.so.1`free+0x2b
>               libxml2.so.2`xmlFreeNodeList+0x19e
>               libxml2.so.2`xmlFreeNodeList+0xb6
>               libxml2.so.2`xmlFreeNodeList+0xb6
>               libxml2.so.2`xmlFreeDoc+0xac
>               xml`main+0x55
>               xml`_start+0x80
>                72
>
>               libxml2.so.2`xmlDictLookup+0x60
>               libxml2.so.2`xmlParseNCName+0xab
>               libxml2.so.2`xmlParseQName+0x44
>               libxml2.so.2`xmlParseStartTag2+0x14c
>               libxml2.so.2`xmlParseElement+0x99
>               libxml2.so.2`xmlParseContent+0x12c
>               libxml2.so.2`xmlParseElement+0x134
>               libxml2.so.2`xmlParseContent+0x12c
>               libxml2.so.2`xmlParseElement+0x134
>               libxml2.so.2`xmlParseDocument+0x33b
>               libxml2.so.2`xmlDoRead+0x6b
>               libxml2.so.2`xmlReadMemory+0x36
>               xml`main+0x4c
>               xml`_start+0x80
>                80
>
>  Hope this helps.  It may be that you've got a newer / more optimized
>  version of libxml2 on your Linux box.

Thanks for the reply. When I mentioned memory operations, I probably
should have been more specific. The libxml test case I sent makes heavy
use of malloc, and according to the Sun Studio collect utility, the calls to
malloc are taking up a significant percentage of the overall execution time.
Here is a snippet from the collect output:

Excl.     Incl.      Name
User CPU  User CPU
  sec.      sec.
26.789    26.789     <Total>
 2.292    10.497     umem_cache_alloc
 1.391     3.062     xmlDictLookup
 1.381     1.381     mutex_unlock
 1.301     5.084     umem_slab_alloc
 1.241     2.742     umem_slab_create
 1.091    15.671     xmlParseStartTag2
 0.931     0.931     mutex_lock_impl
 0.801     1.771     umem_depot_alloc
 0.801     3.052     xmlParseNCName
 0.771     1.701     mutex_lock
 0.751     9.407     xmlSAX2StartElementNs
 0.680    11.768     malloc
 0.680    25.488     xmlParseElement
 0.640     8.506     xmlSAX2TextNode
 0.630     6.935     xmlParseCharData
 0.590    11.088     umem_alloc
 0.580     0.580     memcpy
 0.490     0.490     xmlDictComputeKey
 0.480     0.480     mutex_trylock
 0.420     6.344     xmlSAX2Characters
 0.420     0.420     xmlSkipBlankChars
 0.380    24.517     xmlParseContent
 0.380     0.380     xmlStrEqual
 0.360     1.771     xmlNewPropInternal
 0.330     0.330     xmlNextChar
 0.320     0.330     xmlParseAttValueInternal

Has anyone seen this behavior before? Has anyone found any compiler
flags or light weight memory allocators that might help?

Thanks,
- Ryan
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org

Reply via email to