On Tue, Apr 16 2019, Jacek Tomaka wrote:
> I did :
> for i in {1..100}; do cat /proc/36960/stack >$i; sleep 1; done
> in one bash and on the other one(36960):
> time -p echo 3 >/proc/sys/vm/drop_caches
>
> It took about two minutes, unfortunately most of the time it claims that it
> was not doing anything kernel side:
> [<ffffffffffffffff>] 0xffffffffffffffff

This suggests that the time taken is largely CPU time - walking lists
and moving things around and freeing memory.  With a slow CPU and lots
of memory, 2 minutes might be a reasonable number, though it seems a
little long.

> with the exception of two files, at 32 sec and 73 sec:
> [root@xxx xxx]# cat 32
> [<ffffffffc11231db>] cl_sync_file_range+0x2db/0x380 [lustre]
> [<ffffffffffffffff>] 0xffffffffffffffff

In this case the process was probably scheduled to another CPU while
the stack was being examined, so the stack suddenly became undefined
and the examination was aborted.

> [root@xxx xxx]# cat 73
> [<ffffffffc11231db>] cl_sync_file_range+0x2db/0x380 [lustre]
> [<ffffffffc11330f6>] ll_delete_inode+0xa6/0x1c0 [lustre]
> [<ffffffff8121d729>] evict+0xa9/0x180
> [<ffffffff8121d83e>] dispose_list+0x3e/0x50
> [<ffffffff8121e834>] prune_icache_sb+0x174/0x340
> [<ffffffff81203863>] prune_super+0x143/0x170
> [<ffffffff81195443>] shrink_slab+0x163/0x330
> [<ffffffff812655f3>] drop_caches_sysctl_handler+0xc3/0x120
> [<ffffffff8127c203>] proc_sys_call_handler+0xd3/0xf0
> [<ffffffff8127c234>] proc_sys_write+0x14/0x20
> [<ffffffff81200cad>] vfs_write+0xbd/0x1e0
> [<ffffffff81201abf>] SyS_write+0x7f/0xe0
> [<ffffffff816b5292>] tracesys+0xdd/0xe2
> [<ffffffffffffffff>] 0xffffffffffffffff

So 2 out of 100 samples were in cl_sync_file_range.  Maybe it is
spending about 2% of the time there - though with so few samples that
estimate is very rough.

Calling ll_delete_inode is very near the end of the process of removing
an inode from the cache.  By the time it gets to this point there
shouldn't be any dirty data in the file, so calling cl_sync_file_range()
seems odd.  The fact that it (apparently) needs to wait for something to
happen seems even more odd.  Maybe it is releasing some locks or
something like that - I'd need to dig through the code more to be sure.
I doubt this contributes a lot to the time though.

Unfortunately I don't think we can gather more data without something
intrusive like kprobes...
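As a rough sanity check on that "maybe 2%" figure, the 2-of-100 samples
can be given error bars with a standard Wilson score interval.  This is
just a back-of-the-envelope sketch, not anything measured in the thread:

```python
import math

# Wilson score interval for a binomial proportion: here, 2 of 100
# one-second stack samples landed in cl_sync_file_range.
def wilson_interval(hits, n, z=1.96):  # z = 1.96 ~ 95% confidence
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

lo, hi = wilson_interval(2, 100)
print(f"~{lo:.1%} to ~{hi:.1%}")
```

With 100 samples the 95% interval runs from well under 1% up to around
7%, so the sampling really only says "small but non-zero", consistent
with the hedged "2%???" above.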
> also after unmounting lustre fs and removing all modules i could
> relate to lustre i could still see some vvp_object_kmem, is it
> expected?
> [root@xxx xxx]# rmmod obdclass ptlrpc ksocklnd libcfs lnet lustre fid
> mdc osc cnetmgc fld lmv lov;
> rmmod: ERROR: Module obdclass is not currently loaded
> rmmod: ERROR: Module ptlrpc is not currently loaded
> rmmod: ERROR: Module ksocklnd is not currently loaded
> rmmod: ERROR: Module libcfs is not currently loaded
> rmmod: ERROR: Module lnet is not currently loaded
> rmmod: ERROR: Module lustre is not currently loaded
> rmmod: ERROR: Module fid is not currently loaded
> rmmod: ERROR: Module mdc is not currently loaded
> rmmod: ERROR: Module osc is not currently loaded
> rmmod: ERROR: Module cnetmgc is not currently loaded
> rmmod: ERROR: Module fld is not currently loaded
> rmmod: ERROR: Module lmv is not currently loaded
> rmmod: ERROR: Module lov is not currently loaded
> [root@xxx xxx]# cat /proc/slabinfo |grep vvp
> vvp_object_kmem  32982  33212  176  46  2 : tunables 0 0 0 : slabdata 722 722 0

vvp_object_kmem is created by the 'lustre' module, and should be
discarded when that module is unloaded.  The only explanation I can find
for it remaining around is that it was shared.  If two different modules
both request a slab cache that would place 46 objects across 2 pages -
i.e. 175 to 178 bytes per object - then they will both use the same
underlying slab cache, and it keeps whichever name was registered first.
So maybe some other module wants a slab cache of about that size.

The other possible explanation is that the cache wasn't empty when the
lustre module was unloaded.  In that case you would have seen a message:

  kmem_cache_destroy %s: Slab cache still has objects

together with a Call Trace.  That would indicate a problem in lustre.

NeilBrown
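The merge window described above follows from simple packing
arithmetic.  A minimal sketch (it ignores SLUB alignment and debug
padding, which can change the real merge decision, but it reproduces the
175-178 byte window for a 2-page slab holding 46 objects):

```python
PAGE_SIZE = 4096
slab_bytes = 2 * PAGE_SIZE   # the slabinfo line shows 2 pages per slab

def objects_per_slab(obj_size):
    # crude packing model: how many objects fit in one slab
    return slab_bytes // obj_size

# every object size that packs exactly 46 objects into a 2-page slab
mergeable = [s for s in range(150, 210) if objects_per_slab(s) == 46]
print(mergeable[0], "to", mergeable[-1])   # the 175..178 window
```

Any two caches whose (aligned) object sizes land in that window are
candidates for merging, which is why the surviving cache can carry a
name from an already-unloaded module.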
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
