Charles-François Natali <neolo...@free.fr> added the comment:

> Big dirs are really slow to read at once. If user wants to read items
> one by one like here
The problem is that readdir doesn't read directory entries one at a
time. When you call readdir on an open DIR * for the first time, the
libc issues a getdents syscall, requesting a whole batch of dentries at
a time (32768 on my box). The subsequent readdir calls are then
virtually free and don't involve any syscall or I/O at all (that is,
until you hit the last cached dentry, at which point another getdents
is performed, and so on until the end of the directory; see the sketch
at the end of this message).

> Also, dir_cache in kernel used more effectively.

You mean the dcache? Could you elaborate?

> also, forgot... memory usage on big directories using list is a pain.

This would indeed be a good reason. Do you have numbers?

> A generator listdir() geared towards performance should probably be
> able to work in batches, e.g. read 100 entries at once and buffer them
> in some internal storage (that might mean use readdir_r()).

That's exactly what readdir is doing :-)

> Bonus points if it doesn't release the GIL around each individual
> entry, but also batches that.

Yes, since only one readdir call in 2**15 actually blocks, that could
be a nice optimization (I have no idea of the potential gain, though).

> Big dirs are really slow to read at once.

Are you using ext3? There are reports of performance issues with
getdents on ext2/ext3 filesystems, see:
http://lwn.net/Articles/216948/
and this nice post by Linus:
https://lkml.org/lkml/2007/1/7/149

Could you provide the output of "strace -ttT python <test script>" (and
also the time spent in os.listdir)?
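For illustration (not part of the original message), here is a minimal
C sketch of the readdir loop discussed above; the file and program
names are arbitrary. Running it under strace makes the batching
visible: you should see a handful of large getdents calls rather than
one syscall per entry.

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        const char *path = (argc > 1) ? argv[1] : ".";
        DIR *dir = opendir(path);
        if (dir == NULL) {
            perror("opendir");
            return EXIT_FAILURE;
        }

        /* Each readdir() call returns a single entry, but libc only
         * issues the underlying getdents syscall when its internal
         * buffer is exhausted, so most calls hit the buffer and are
         * essentially free. */
        struct dirent *entry;
        size_t count = 0;
        while ((entry = readdir(dir)) != NULL)
            count++;

        closedir(dir);
        printf("%zu entries\n", count);
        return EXIT_SUCCESS;
    }

Compile with "gcc readdir_count.c -o readdir_count" and run
"strace -ttT ./readdir_count /some/big/dir": the getdents (or
getdents64) calls in the trace correspond to the buffered batches
described above.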