On May 21, 11:41 am, [EMAIL PROTECTED] wrote:
> On May 21, 11:13 am, "A.T.Hofkamp" <[EMAIL PROTECTED]> wrote:
>
> > On 2008-05-21, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > I'd appreciate any help. I've got a list of files in a directory, and
> > > I'd like to iterate through that list and process each one. Rather
> > > than do that serially, I was thinking I should start five threads and
> > > process five files at a time.
> >
> > > Is this a good idea? I picked the number five at random... I was
> >
> > Depends what you are doing.
> > If you are mainly reading/writing files, there is not much to gain, since 1
> > process will already push the disk IO system to its limit. If you do a lot of
> > processing, then more threads than the number of processors is not much use.
> > If you have more 'bursty' behavior (first do lot of reading, then lot of
> > processing, then again reading, etc), then the system may be able to do some
> > scheduling and keep both the processors and the file system busy.
> >
> > I cannot really give you advice on threading, I have never done that. You may
> > want to consider an alternative, namely multi-tasking at OS level. If you can
> > easily split the files over a number of OS processes (written in Python), you
> > can make the Python program really simple, and let the OS handle the
> > task-switching between the programs.
> >
> > Sincerely,
> > Albert
>
> Albert,
>
> Thanks for your response - I appreciate your time!
>
> I am mainly reading and writing files, so it seems like it might not
> be a good idea. What if I read the whole file into memory first, and
> operate on it there? They are not large files...
>
> Either way, I'd hope that someone might respond with an example, as
> then I could test and see which is faster!
>
> Thanks again.

Ah, well, I didn't get any other responses, but here's what I've done:

    loopCount = 0
    for l in range(len(self.filesToProcess)):
        threads = []
        try:
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount +l])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount +2])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount +3])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount +4])))
            threads.append(threading.Thread(
                target=self.processFiles(self.filesToProcess[loopCount +5])))
            msg = "Processing file...\n"
            for thread in threads:
                wx.CallAfter(self.textctrl03.write(msg), thread.start())
            for thread in threads:
                thread.join()
            loopCount += 5
        except IndexError:
            pass

It works, and it works well. It starts five threads and processes five
files at a time. (In "self.processFiles" I read the whole file into memory
using readlines(), which works well.)

Of course, now the wx.CallAfter call doesn't work... I get "TypeError:
'NoneType' object is not callable" every time it runs...
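
A note on the loop above, offered as a sketch rather than a definitive fix. Writing target=self.processFiles(...) calls processFiles right there, in the calling thread, and hands its return value to Thread, so the files are most likely still being processed one after another. The wx.CallAfter line has the same shape of problem: self.textctrl03.write(msg) runs immediately and its return value (None) is what gets passed as the callable, which would explain the TypeError. Passing the function and its arguments separately avoids both. The names process_in_batches, process_file, notify and batch_size below are made up for illustration; process_file stands in for self.processFiles and notify for whatever reports progress.

```python
import threading


def process_in_batches(paths, process_file, notify, batch_size=5):
    """Process paths batch_size at a time, one thread per file.

    process_file and notify are hypothetical stand-ins for
    self.processFiles and the progress message in the post.
    """
    for start in range(0, len(paths), batch_size):
        # Slicing never raises IndexError, so the last (short) batch
        # needs no try/except.
        batch = paths[start:start + batch_size]

        # Pass the function itself as target and its argument via args.
        # target=process_file(path) would call it here, in this thread.
        threads = [threading.Thread(target=process_file, args=(path,))
                   for path in batch]

        for thread in threads:
            notify("Processing file...\n")
            thread.start()

        # Wait for the whole batch before starting the next one.
        for thread in threads:
            thread.join()
```

In the wx program, notify could be something like lambda msg: wx.CallAfter(self.textctrl03.write, msg), since wx.CallAfter takes the callable first and its arguments after it. One caveat: if this loop runs in the GUI thread, join() blocks the event loop, so the messages may only show up once each batch has finished.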