If your application is single threaded, you could try the bsdmalloc
library.  It is a fast malloc, but it is not multi-thread safe, and it
also tends to use more memory than the default malloc.  For a comparison
of the different malloc libraries, see the NOTES section at the end of
umem_alloc(3MALLOC).

I got the following result with your example code:

$ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c
$ ./xml
100000 iter in 21.445672 sec
$
$ gcc -O3 -o xml `/usr/bin/xml2-config --libs --cflags` xml.c -lbsdmalloc
$ ./xml
100000 iter in 12.761969 sec
$

I got similar results using Sun Studio 12.

Again, bsdmalloc is not multi-thread safe, so use it with caution.
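
If relinking is inconvenient, you should also be able to interpose the
library at run time with LD_PRELOAD (the path below assumes the stock
/usr/lib location):

$ LD_PRELOAD=/usr/lib/libbsdmalloc.so.1 ./xml

Running ldd against the binary will confirm which allocator it actually
picks up.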

HTH,
David

----- Original Message -----
From: Matty <[EMAIL PROTECTED]>
Date: Wednesday, April 30, 2008 2:22 pm
Subject: Re: [perf-discuss] Application runs almost 2x slower on Nevada than Linux
To: perf-discuss@opensolaris.org, [EMAIL PROTECTED]


> On Mon, Apr 28, 2008 at 5:01 PM,  <[EMAIL PROTECTED]> wrote:
> > Hey Dude,
> >  I pulled down a copy of your test program and ran a few experiments.
> >
> >  $ time ./xml
> >  100000 iter in 22.715982 sec
> >
> >  real    0m22.721s
> >  user    0m22.694s
> >  sys     0m0.007s
> >
> >  This seems to indicate that all of our time is being spent in
> >  usermode, so whatever is slower on Solaris than on Linux, it's not
> >  the operating system.
> >
> >  Just to double check, I used the following DTrace one-liner to look
> >  at the time this application spends in syscalls:
> >
> >  # dtrace -n 'syscall:::entry /execname == "xml"/ { self->traced =
> >      timestamp; self->func = probefunc; }
> >    syscall:::return /self->traced/ { @a[self->func] = sum(timestamp -
> >      self->traced); @b[self->func] = count();
> >      self->func = (char *)0; self->traced = 0; }'
> >
> >  (This returns the time xml spends in syscalls, as well as the number
> >  of syscalls made.)
> >
> >  Time (nanoseconds):
> >
> >   getpid                                                         1978
> >   sysconfig                                                      2229
> >   sigpending                                                     3071
> >   sysi86                                                         3529
> >   getrlimit                                                      4234
> >   setcontext                                                     5763
> >   fstat64                                                        7042
> >   close                                                         16606
> >   write                                                         19898
> >   getcwd                                                        21302
> >   ioctl                                                         23668
> >   munmap                                                        25757
> >   brk                                                           28445
> >   open                                                          40712
> >   resolvepath                                                   51777
> >   xstat                                                         57616
> >   mmap                                                         159275
> >   memcntl                                                      267070
> >
> >  Number of invocations:
> >
> >   fstat64                                                           1
> >   getcwd                                                            1
> >   getpid                                                            1
> >   getrlimit                                                         1
> >   ioctl                                                             1
> >   sigpending                                                        1
> >   sysconfig                                                         1
> >   sysi86                                                            1
> >   write                                                             1
> >   setcontext                                                        2
> >   memcntl                                                           6
> >   close                                                             7
> >   munmap                                                            8
> >   open                                                              8
> >   resolvepath                                                       9
> >   xstat                                                             9
> >   brk                                                              10
> >   mmap                                                             32
> >
> >  This agrees with the output from time.  You're spending a minuscule
> >  amount of system time performing memory operations.
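> >
> >  If you want to confirm where the user time is going, the pid provider
> >  can time libc's malloc directly.  A rough sketch (not a tuned script;
> >  note that vtimestamp counts on-CPU time only):
> >
> >  # dtrace -n 'pid$target:libc.so.1:malloc:entry { self->ts = vtimestamp; }
> >      pid$target:libc.so.1:malloc:return /self->ts/ { @["malloc ns"] =
> >      sum(vtimestamp - self->ts); self->ts = 0; }' -c ./xml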
> >
> >  The next place to look would be libxml2.  Which library versions do
> >  you have installed on the different machines?  If the versions don't
> >  match, or the libraries were compiled with different options or
> >  different compilers, you'll get different results for the benchmark.
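> >
> >  A quick sanity check on both machines (assuming xml2-config is in the
> >  same place on each) is:
> >
> >  $ /usr/bin/xml2-config --version
> >
> >  which at least tells you whether you're comparing the same libxml2.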
> >
> >  I ran my test on snv_84, which has libxml2 2.6.23 installed.
> >
> >  Another approach would be to use DTrace's profile provider.  I obtained
> >  usable results by performing the following:
> >
> >  # dtrace -n 'profile-1001 /execname == "xml"/ { @a[ustack()] =
> >      count(); } END { trunc(@a, 20); }' -c /tmp/xml
> >
> >  This returns the 20 most frequent stack traces (and the number of
> >  times they occurred) when DTrace fires a profile probe at 1001 Hz.
> >
> >  xmlDictLookup shows up as a popular function, but that may be a red
> >  herring.  I'll include the top 5 here; you can rerun the command, or
> >  adjust the truncation, to view as many as you'd like:
> >
> >               libc.so.1`_free_unlocked+0x4c
> >               libc.so.1`free+0x2b
> >               libxml2.so.2`xmlFreeNodeList+0x19e
> >               libxml2.so.2`xmlFreeNodeList+0xb6
> >               libxml2.so.2`xmlFreeNodeList+0xb6
> >               libxml2.so.2`xmlFreeDoc+0xac
> >               xml`main+0x55
> >               xml`_start+0x80
> >                61
> >
> >               libxml2.so.2`xmlDictLookup+0x60
> >               libxml2.so.2`xmlParseNCName+0xab
> >               libxml2.so.2`xmlParseQName+0x44
> >               libxml2.so.2`xmlParseAttribute2+0x54
> >               libxml2.so.2`xmlParseStartTag2+0x246
> >               libxml2.so.2`xmlParseElement+0x99
> >               libxml2.so.2`xmlParseContent+0x12c
> >               libxml2.so.2`xmlParseElement+0x134
> >               libxml2.so.2`xmlParseContent+0x12c
> >               libxml2.so.2`xmlParseElement+0x134
> >               libxml2.so.2`xmlParseDocument+0x33b
> >               libxml2.so.2`xmlDoRead+0x6b
> >               libxml2.so.2`xmlReadMemory+0x36
> >               xml`main+0x4c
> >               xml`_start+0x80
> >                66
> >
> >               libxml2.so.2`xmlParseStartTag2+0xb7e
> >               libxml2.so.2`xmlParseElement+0x99
> >               libxml2.so.2`xmlParseContent+0x12c
> >               libxml2.so.2`xmlParseElement+0x134
> >               libxml2.so.2`xmlParseContent+0x12c
> >               libxml2.so.2`xmlParseElement+0x134
> >               libxml2.so.2`xmlParseDocument+0x33b
> >               libxml2.so.2`xmlDoRead+0x6b
> >               libxml2.so.2`xmlReadMemory+0x36
> >               xml`main+0x4c
> >               xml`_start+0x80
> >                66
> >
> >               libc.so.1`_free_unlocked+0x53
> >               libc.so.1`free+0x2b
> >               libxml2.so.2`xmlFreeNodeList+0x19e
> >               libxml2.so.2`xmlFreeNodeList+0xb6
> >               libxml2.so.2`xmlFreeNodeList+0xb6
> >               libxml2.so.2`xmlFreeDoc+0xac
> >               xml`main+0x55
> >               xml`_start+0x80
> >                72
> >
> >               libxml2.so.2`xmlDictLookup+0x60
> >               libxml2.so.2`xmlParseNCName+0xab
> >               libxml2.so.2`xmlParseQName+0x44
> >               libxml2.so.2`xmlParseStartTag2+0x14c
> >               libxml2.so.2`xmlParseElement+0x99
> >               libxml2.so.2`xmlParseContent+0x12c
> >               libxml2.so.2`xmlParseElement+0x134
> >               libxml2.so.2`xmlParseContent+0x12c
> >               libxml2.so.2`xmlParseElement+0x134
> >               libxml2.so.2`xmlParseDocument+0x33b
> >               libxml2.so.2`xmlDoRead+0x6b
> >               libxml2.so.2`xmlReadMemory+0x36
> >               xml`main+0x4c
> >               xml`_start+0x80
> >                80
> >
> >  Hope this helps.  It may be that you've got a newer / more optimized
> >  version of libxml2 on your Linux box.
> 
> Thanks for the reply. When I mentioned memory operations, I should have
> been more specific. The libxml2 test case I sent makes heavy use of
> malloc, and according to the Sun Studio collect utility, the calls to
> malloc take up a significant percentage of the overall execution time.
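>
> (For reference, the usual collect flow is something like:
>
>   $ collect ./xml
>   $ er_print -functions test.1.er
>
> though exact flags may differ.)
>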
> Here is a snippet from the collect output:
> 
> Excl.     Incl.      Name
> User CPU  User CPU
>   sec.      sec.
> 26.789    26.789     <Total>
>  2.292    10.497     umem_cache_alloc
>  1.391     3.062     xmlDictLookup
>  1.381     1.381     mutex_unlock
>  1.301     5.084     umem_slab_alloc
>  1.241     2.742     umem_slab_create
>  1.091    15.671     xmlParseStartTag2
>  0.931     0.931     mutex_lock_impl
>  0.801     1.771     umem_depot_alloc
>  0.801     3.052     xmlParseNCName
>  0.771     1.701     mutex_lock
>  0.751     9.407     xmlSAX2StartElementNs
>  0.680    11.768     malloc
>  0.680    25.488     xmlParseElement
>  0.640     8.506     xmlSAX2TextNode
>  0.630     6.935     xmlParseCharData
>  0.590    11.088     umem_alloc
>  0.580     0.580     memcpy
>  0.490     0.490     xmlDictComputeKey
>  0.480     0.480     mutex_trylock
>  0.420     6.344     xmlSAX2Characters
>  0.420     0.420     xmlSkipBlankChars
>  0.380    24.517     xmlParseContent
>  0.380     0.380     xmlStrEqual
>  0.360     1.771     xmlNewPropInternal
>  0.330     0.330     xmlNextChar
>  0.320     0.330     xmlParseAttValueInternal
> 
> Has anyone seen this behavior before? Has anyone found compiler flags
> or lightweight memory allocators that might help?
> 
> Thanks,
> - Ryan