On Tue, Aug 11, 2020 at 12:37:53PM +0900, Daeho Jeong wrote:
> From: Daeho Jeong <daehoje...@google.com>
>
> By profiling f2fs compression works, I've found vmap() callings are
> bottlenecks of f2fs decompression path. Changing these with
> vm_map_ram(), we can enhance f2fs decompression speed pretty much.
>
> [Verification]
> dd if=/dev/zero of=dummy bs=1m count=1000
> echo 3 > /proc/sys/vm/drop_caches
> dd if=dummy of=/dev/zero bs=512k
>
> - w/o compression -
> 1048576000 bytes (0.9 G) copied, 1.999384 s, 500 M/s
> 1048576000 bytes (0.9 G) copied, 2.035988 s, 491 M/s
> 1048576000 bytes (0.9 G) copied, 2.039457 s, 490 M/s
>
> - before patch -
> 1048576000 bytes (0.9 G) copied, 9.146217 s, 109 M/s
> 1048576000 bytes (0.9 G) copied, 9.997542 s, 100 M/s
> 1048576000 bytes (0.9 G) copied, 10.109727 s, 99 M/s
>
> - after patch -
> 1048576000 bytes (0.9 G) copied, 2.253441 s, 444 M/s
> 1048576000 bytes (0.9 G) copied, 2.739764 s, 365 M/s
> 1048576000 bytes (0.9 G) copied, 2.185649 s, 458 M/s
Indeed, the vmap() approach has some impact on the overall workflow, but I don't think the gap is that significant. Maybe it is related to an unlocked cpufreq (and the big.LITTLE core difference, if this was measured on an arm64 board).