On Aug 9, 9:45 pm, "Mark T" <[EMAIL PROTECTED]> wrote:
> <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
> > Hi all! I'm implementing one of my first multithreaded apps, and have
> > gotten to a point where I think I'm going off track from a standard
> > idiom. Wondering if anyone can point me in the right direction.
> >
> > The script will run as a daemon and watch a given directory for new
> > files. Once it determines that a file has finished moving into the
> > watch folder, it will kick off processing on that file. Several of
> > these could be running at any given time, up to a maximum number of
> > threads.
> >
> > Here's how I have it designed so far. The main thread starts a
> > Watch(threading.Thread) class that loops and searches a directory for
> > files. It has been passed a Queue.Queue() object (watch_queue), and
> > as it finds new files in the watch folder, it adds the file name to
> > the queue.
> >
> > The main thread then grabs an item off the watch_queue and kicks off
> > processing on that file using another class, Worker(threading.Thread).
> >
> > My problem is with communicating between the threads as to which
> > files are currently being processed or are already on the
> > watch_queue, so that the Watch thread does not keep adding files to
> > the watch_queue that don't need processing. For example: Watch()
> > finds a file to be processed and adds it to the queue. The main
> > thread sees the file on the queue, pops it off, and begins
> > processing. Now the file has been removed from the watch_queue, and
> > the Watch() thread has no way of knowing that a Worker() thread is
> > processing it and shouldn't pick it up again. So it will see the file
> > as new and add it to the queue again. P.S. The file is deleted from
> > the watch folder after it has finished processing, so that's how I'll
> > know which files to process in the long term.
> >
> > I made definite progress by creating two queues, watch_queue and
> > processing_queue, and then using lists within the classes to store
> > the state of which files are being watched or processed.
> >
> > I think I could pull it off, but it got very confusing quickly,
> > trying to keep each thread's list and the queue in sync with one
> > another. The easiest solution I can see is if my threads could read
> > an item from the queue without removing it, and only remove it when I
> > tell them to. The Watch() thread could then just follow what items
> > are on the watch_queue to know which files to add, and the Worker()
> > thread could explicitly remove an item from the watch_queue once it
> > has finished processing it.
> >
> > Now that I'm writing this out, I see a solution in overriding or
> > wrapping Queue.Queue().get() to give me the behavior described above.
> >
> > I've noticed .join() and .task_done(), but I'm not sure how to use
> > them properly. Any suggestions would be greatly appreciated.
> >
> > ~Sean
>
> Just rename the file. We've used that technique in a similar
> application at my work for years, where a service looks for files with
> a particular extension to appear in a directory. When the service sees
> a file, it renames it to a different extension and spins off a thread
> to process the contents.
>
> -Mark T.
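Mark's rename trick, roughly sketched (not his actual code -- this uses
Python 3 module names; WATCH_DIR, the .processing suffix, and the
claim_file()/process() helpers are just illustrative, and thread-count
limiting is omitted):

    import os
    import threading

    WATCH_DIR = "/tmp/watch"        # illustrative path
    CLAIM_SUFFIX = ".processing"    # illustrative "claimed" extension

    def claim_file(path):
        # Claim a file by renaming it; return the new name, or None if
        # another thread/process got to it (or it vanished) first.
        claimed = path + CLAIM_SUFFIX
        try:
            os.rename(path, claimed)
            return claimed
        except OSError:
            return None

    def process(path):
        # ... real work on the claimed file would go here ...
        os.remove(path)             # finished: remove it from the watch folder

    def scan_once():
        for name in os.listdir(WATCH_DIR):
            if name.endswith(CLAIM_SUFFIX):
                continue            # already claimed by a worker
            claimed = claim_file(os.path.join(WATCH_DIR, name))
            if claimed is not None:
                threading.Thread(target=process, args=(claimed,)).start()

Once a file has been renamed, later scans simply skip it, so no thread
has to keep a separate list of which files are in flight.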
I ended up taking this route for the most part. The worker thread first
moves the file to be processed into a temp directory, and the watch
thread never knows about it again. I still had to write my
StateQueue(Queue.Queue) subclass so I could add a method that returns
all the items on the queue without popping them off.

Thanks all for your great ideas. My current response to
multithreading... PITA!

~Sean
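P.S. For the peek-without-popping part, a StateQueue along these lines
would do it -- a rough sketch rather than my exact code, written
against Python 3's queue module (the thread above used Python 2's
Queue.Queue). The snapshot() name is just illustrative, and it leans on
the stdlib Queue's internal self.mutex and self.queue attributes:

    import queue

    class StateQueue(queue.Queue):
        # A Queue that can report everything currently queued without
        # removing anything.
        def snapshot(self):
            with self.mutex:             # Queue's own lock, for a consistent view
                return list(self.queue)  # self.queue is the underlying deque

    q = StateQueue()
    q.put("a.txt")
    q.put("b.txt")
    print(q.snapshot())   # ['a.txt', 'b.txt'] -- both items are still queued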