On (04/26/17 15:08), Joonsoo Kim wrote:
> > > +struct zram_hash {
> > > +	spinlock_t lock;
> > > +	struct rb_root rb_root;
> > > };
> >
> > just a note.
> >
> > we can easily have N CPUs spinning on ->lock for __zram_dedup_get() lookup,
> > which can involve a potentially slow zcomp_decompress() [zlib, for example,
> > with 64k pages] and memcmp(). the larger PAGE_SHIFT is, the more serialized
> > IOs become. in theory, at least.
> >
> > CPU0				CPU1			...	CPUN
> >
> > __zram_bvec_write()		__zram_bvec_write()		__zram_bvec_write()
> >  zram_dedup_find()		 zram_dedup_find()		 zram_dedup_find()
> >   spin_lock(&hash->lock);
> > 				  spin_lock(&hash->lock);
> > 								  spin_lock(&hash->lock);
> >    __zram_dedup_get()
> >     zcomp_decompress()
> >      ...
> >
> >
> > so maybe there is a way to use a read-write lock instead of a spinlock for the hash
> > and reduce write/read IO serialization.
>
> In fact, dedup releases hash->lock before doing zcomp_decompress(). So,
> the above contention cannot happen.
oh, my bad. you are right. somehow I didn't spot the unlock.

	-ss
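[editor's note: for readers not following the patch itself, below is a minimal
sketch of the lookup pattern the thread converges on: the hash spinlock covers
only the rb-tree walk and a refcount bump, and the expensive
zcomp_decompress()/memcmp() work runs after the lock is dropped. The entry
field names and the function name are illustrative assumptions, not the actual
patch code.]

	/*
	 * Sketch only, not the real zram_dedup code. Assumed layout:
	 * struct zram_entry { struct rb_node rb_node; u32 checksum;
	 *                     u32 refcount; unsigned long handle; };
	 */
	static struct zram_entry *zram_dedup_get_sketch(struct zram_hash *hash,
							u32 checksum)
	{
		struct zram_entry *entry;
		struct rb_node *node;

		spin_lock(&hash->lock);
		node = hash->rb_root.rb_node;
		while (node) {
			entry = rb_entry(node, struct zram_entry, rb_node);
			if (checksum == entry->checksum) {
				/* pin the entry while still under the lock */
				entry->refcount++;
				spin_unlock(&hash->lock);
				return entry;
			}
			node = checksum < entry->checksum ?
					node->rb_left : node->rb_right;
		}
		spin_unlock(&hash->lock);

		return NULL;
	}

[the caller would then decompress the pinned entry's data and memcmp() it
against the new page with no hash->lock held, which is why the N-CPU
contention scenario quoted above does not materialize.]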