Charles-François Natali <neolo...@free.fr> added the comment:

> Big dirs are really slow to read at once. If user wants to read items one by 
> one like here

The problem is that readdir doesn't read directory entries one at a time.
When you call readdir on an open DIR * for the first time, the libc issues the 
getdents syscall, requesting a whole batch of dentries at once (32768 on my 
box).
The subsequent readdir calls are then virtually free, and don't involve any 
syscall/IO at all, until you hit the last cached dentry, at which point another 
getdents is performed, and so on until the end of the directory.
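
To make this concrete, here's a minimal sketch (assuming Linux with glibc; the 
script name batch_demo.py is just a placeholder) that calls readdir one entry 
at a time through ctypes. Running it under strace shows only a handful of 
getdents/getdents64 calls, however many entries the directory holds:

import ctypes
import ctypes.util
import sys

# Assumes Linux/glibc; find_library("c") may return None elsewhere.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.opendir.argtypes = [ctypes.c_char_p]
libc.opendir.restype = ctypes.c_void_p
libc.readdir.argtypes = [ctypes.c_void_p]
libc.readdir.restype = ctypes.c_void_p   # opaque struct dirent *; we only test for NULL
libc.closedir.argtypes = [ctypes.c_void_p]

path = sys.argv[1] if len(sys.argv) > 1 else "."
dirp = libc.opendir(path.encode())
if not dirp:
    sys.exit("opendir failed")
count = 0
# Each iteration is one readdir call, but almost none of them hit the kernel:
# the libc serves entries from the buffer filled by the previous getdents.
while libc.readdir(dirp):
    count += 1
libc.closedir(dirp)
print("%d entries" % count)

Try "strace -e trace=getdents,getdents64 python batch_demo.py /some/big/dir" 
and compare the number of getdents calls to the number of entries.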

> Also, dir_cache in kernel used more effectively.

You mean the dcache? Could you elaborate?

> also, forgot... memory usage on big directories using list is a pain.

This would indeed be a good reason. Do you have numbers?
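
For what it's worth, here's a rough way to get such numbers (a sketch assuming 
a POSIX system; note that ru_maxrss is reported in kilobytes on Linux but in 
bytes on OS X):

import os
import resource
import sys

path = sys.argv[1] if len(sys.argv) > 1 else "."
# Peak resident set size before and after building the full list.
before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
names = os.listdir(path)   # materializes every entry name at once
after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("%d entries, peak RSS grew by about %d kB" % (len(names), after - before))

That would show directly what a multi-million-entry listing costs in memory.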

> A generator listdir() geared towards performance should probably be able to 
> work in batches, e.g. read 100 entries at once and buffer them in some 
> internal storage (that might mean use readdir_r()).

That's exactly what readdir is doing :-)

> Bonus points if it doesn't release the GIL around each individual entry, but 
> also batches that.

Yes: since only about one readdir call in 2**15 actually blocks, that could be 
a nice optimization (though I have no idea of the potential gain).

> Big dirs are really slow to read at once.

Are you using EXT3?
There are known reports of performance issues with getdents on EXT2/3 
filesystems, see:
http://lwn.net/Articles/216948/
and this nice post by Linus:
https://lkml.org/lkml/2007/1/7/149

Could you provide the output of an "strace -ttT python <test script>" (and 
also the time spent in os.listdir)?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11406>
_______________________________________