On Wed, Sep 04, 2013 at 05:33:05PM +0900, Joonsoo Kim wrote:
> On Tue, Sep 03, 2013 at 02:15:42PM +0000, Christoph Lameter wrote:
> > On Mon, 2 Sep 2013, Joonsoo Kim wrote:
> > 
> > > This patchset implements byte sized indexes for the freelist of a slab.
> > >
> > > Currently, the freelist of a slab consists of unsigned int sized indexes.
> > > Most slabs have fewer than 256 objects, so much space is wasted.
> > > To reduce this overhead, this patchset implements byte sized indexes for
> > > the freelist of a slab. With it, we can save 3 bytes for each object.
> > >
> > > This introduces one likely branch to the functions used for setting/getting
> > > objects to/from the freelist, but we may get more benefits from
> > > this change.
> > >
> > > Below are some numbers from 'cat /proc/slabinfo' related to my previous
> > > posting and this patchset.
> > 
> > You may also want to run some performance tests. The cache footprint
> > should also be reduced with this patchset and therefore performance should
> > be better.
> 
> Yes, I did a hackbench test today, but I'm not ready to post it.
> The performance is improved by my previous posting and further improvement is
> found with this patchset. Perhaps I will post it tomorrow.
> 

Here are the results from both patchsets on my 4-CPU machine.

* Before *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       238,309,671 cache-misses                                                  ( +-  0.40% )

      12.010172090 seconds time elapsed                                          ( +-  0.21% )

* After my previous posting *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       229,945,138 cache-misses                                                  ( +-  0.23% )

      11.627897174 seconds time elapsed                                          ( +-  0.14% )


* After my previous posting + this patchset *

 Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs):

       218,640,472 cache-misses                                                  ( +-  0.42% )

      11.504999837 seconds time elapsed                                          ( +-  0.21% )



Cache misses are reduced by each patchset: by about 3.5% with my previous
posting, and by a further 4.9% (8.3% against the baseline) with this patchset
applied on top.
Elapsed time also improves, by 3.2% and 4.2% against the baseline, respectively.
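
For anyone who wants to recheck the percentages, they can be recomputed from the
raw perf numbers above. The following is only a small sketch: the helper names
(reduction_percent, report) are made up for this mail, and the inputs are the
mean values copied from the three perf stat runs.

```python
# Mean values copied from the three 'perf stat' runs quoted in this mail.
cache_misses = {
    "before": 238_309_671,
    "previous posting": 229_945_138,
    "previous posting + this patchset": 218_640_472,
}
elapsed_seconds = {
    "before": 12.010172090,
    "previous posting": 11.627897174,
    "previous posting + this patchset": 11.504999837,
}


def reduction_percent(old, new):
    """Relative reduction going from old to new, in percent."""
    return (old - new) / old * 100.0


def report(name, samples):
    """Print and return (label, % vs baseline, % vs previous step) rows."""
    labels = list(samples)
    baseline = samples[labels[0]]
    rows = []
    for prev_label, label in zip(labels, labels[1:]):
        vs_baseline = reduction_percent(baseline, samples[label])
        vs_previous = reduction_percent(samples[prev_label], samples[label])
        rows.append((label, vs_baseline, vs_previous))
        print(f"{name}: {label}: {vs_baseline:.2f}% vs baseline, "
              f"{vs_previous:.2f}% vs previous step")
    return rows


cache_rows = report("cache-misses", cache_misses)
time_rows = report("elapsed time", elapsed_seconds)
```

The "vs baseline" and "vs previous step" columns are kept separate on purpose,
since the two ways of counting give noticeably different figures for the second
patchset.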

I think that all patchsets deserve to be merged, since they reduce memory usage
and also improve performance. :)
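
To put the memory saving in concrete terms, here is a rough model of the freelist
size per slab. It is only a sketch and not the kernel code: it assumes that
unsigned int is 4 bytes, and that byte sized indexes are used only when a slab
holds at most 256 objects (the exact limit used by the patchset may differ). The
function name freelist_bytes is hypothetical.

```python
UINT_INDEX_BYTES = 4    # assumption: sizeof(unsigned int) == 4
BYTE_INDEX_BYTES = 1
BYTE_INDEX_LIMIT = 256  # assumption: one byte can index objects 0..255


def freelist_bytes(num_objects, byte_indexes):
    """Model of the freelist size for one slab, in bytes.

    With byte_indexes enabled, slabs small enough for a one-byte index
    use 1 byte per object; larger slabs fall back to unsigned int.
    """
    if byte_indexes and num_objects <= BYTE_INDEX_LIMIT:
        index_size = BYTE_INDEX_BYTES
    else:
        index_size = UINT_INDEX_BYTES
    return num_objects * index_size


for n in (16, 100, 256, 300):
    old = freelist_bytes(n, byte_indexes=False)
    new = freelist_bytes(n, byte_indexes=True)
    print(f"{n:4d} objects: {old:5d} -> {new:5d} bytes (saves {old - new})")
```

Under these assumptions the saving is 3 bytes per object for small slabs and
nothing for slabs above the limit, which is where the extra branch comes in.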

Thanks.