I have a directory with a large number of files that I need to perform operations on, but I only need to access a subset of them, e.g. the first 100 files.
Using glob is very slow, so I ran across iglob, which returns an iterator. That seemed like just what I wanted: I could iterate over only the files I needed, without reading the entire directory. iglob itself returned instantly, but fetching the first file took about as long as glob.glob.

Here's some code to compare glob vs. iglob performance. It prints the time before and after a glob.iglob('*.*') / files.next() sequence, and then before and after a glob.glob('*.*') call:

    #!/usr/bin/env python
    import glob, time

    print '\nTest of glob.iglob'
    print 'before iglob:', time.asctime()
    files = glob.iglob('*.*')
    print 'after iglob:', time.asctime()
    print files.next()
    print 'after files.next():', time.asctime()

    print '\nTest of glob.glob'
    print 'before glob:', time.asctime()
    files = glob.glob('*.*')
    print 'after glob:', time.asctime()

Here are the results:

    Test of glob.iglob
    before iglob: Sun Jan 31 11:09:08 2010
    after iglob: Sun Jan 31 11:09:08 2010
    foo.bar
    after files.next(): Sun Jan 31 11:09:59 2010

    Test of glob.glob
    before glob: Sun Jan 31 11:09:59 2010
    after glob: Sun Jan 31 11:10:51 2010

The results are about the same for the two approaches: both took about 51 seconds. Am I doing something wrong with iglob? Is there a way to get the first X files from a directory with lots of files that does not take a long time to run?

thanx, mark
--
http://mail.python.org/mailman/listinfo/python-list
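[Editor's note: for reference, a minimal sketch of the "first 100 matches" idea from the question, using itertools.islice to cap how many entries are consumed from the iterator. The demo directory, file count, and filenames are made up for illustration; note that islice only limits consumption of results, so if iglob has to read the whole directory internally before yielding anything, this alone may not speed up the first match.]

```python
import glob
import itertools
import os
import tempfile

# Build a throwaway directory with 500 dummy files (hypothetical names).
demo = tempfile.mkdtemp()
for i in range(500):
    open(os.path.join(demo, 'file%03d.txt' % i), 'w').close()

# islice stops pulling from the iterator after 100 entries, instead of
# materialising every match the way glob.glob() does.
pattern = os.path.join(demo, '*.txt')
first_100 = list(itertools.islice(glob.iglob(pattern), 100))
print(len(first_100))
```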