On Sun, Sep 01, 2013 at 03:48:01PM -0700, Linus Torvalds wrote:
> I made DEFINE_LGLOCK use DEFINE_PER_CPU_SHARED_ALIGNED for the
> spinlock, so that each local lock gets its own cacheline, and the
> total loops jumped to 62M (from 52-54M before). So when I looked at
> the numbers, I thought "oh, that helped".
> 
> But then I looked closer, and realized that I just see a fair amount
> of boot-to-boot variation anyway (probably a lot to do with cache
> placement and how dentries got allocated etc). And it didn't actually
> help at all, the problem is still there, and lg_local_lock is still
> really really high on the profile, at 8% cpu time:
> 
> -   8.00%  lg_local_lock
>    - lg_local_lock
>       + 64.83% mntput_no_expire
>       + 33.81% path_init
>       + 0.78% mntput
>       + 0.58% path_lookupat
> 
> which just looks insane. And no, no lg_global_lock visible anywhere..
> 
> So it's not false sharing. But something is bouncing *that* particular
> lock around.
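
For reference, the change being tested presumably amounts to something
like this against the 3.11-era lglock macro (a sketch -- the actual
diff wasn't posted):

--- a/include/linux/lglock.h
+++ b/include/linux/lglock.h
@@
 #define DEFINE_LGLOCK(name)						\
-	static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock)		\
+	static DEFINE_PER_CPU_SHARED_ALIGNED(arch_spinlock_t, name ## _lock) \
 	= __ARCH_SPIN_LOCK_UNLOCKED;					\
 	struct lglock name = { .lock = &name ## _lock }

DEFINE_PER_CPU_SHARED_ALIGNED moves each per-cpu spinlock into
.data..percpu..shared_aligned and cacheline-aligns it, so the locks
stop sharing lines with their neighbours in the plain per-cpu section.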

Hrm...  That excludes sharing between the locks, all right.  AFAICS, it
won't exclude sharing with plain per-cpu vars, though, will it?  Could
you check what vfsmount_lock is sharing a cacheline with on that build?
The stuff between it and files_lock doesn't have any cross-CPU writers,
but with that change it's the stuff after it that becomes interesting...