[Twisted-Python] twisted thumbnail server
I hope this will be an easy question for some of you guys :)

I'm trying to set up a simple server which will accept requests over GET to create a thumbnail for an image, and serve it back as the response.

The images are stored in two S3 buckets: the originals are in one bucket (store), and the generated thumbnails are stored in another (thumb) as a cache so that the work doesn't need to be repeated.

Currently I'm checking whether the thumbnail already exists in the thumb bucket. If it does, I redirect the request; if not, I download the image from store, generate the thumb using PIL, upload the thumbnail to the thumb bucket and then redirect the request.

I'm very new to twisted and was wondering if anyone who is more experienced would be able to take a look at what I have so far and let me know if anything is wrong/not ideal/will cause problems etc. or just general style pointers? The more critical the better; as I said, I'm very new to this.

I've just chucked it up on github:
https://github.com/GP89/thumbs/blob/master/thumb.py

There's a definite memory leak right now which I believe is PIL, or possibly StringIO objects not being disposed, hence all the random del statements trying to cure it (unsuccessfully). Maybe there's something I'm doing wrong in twisted that is causing things not to be cleaned up that I'm not aware of as well.

I did try to use deferToThread rather than my own thread pool, but the server seemed to block up. I probably should have left it in, in case it was because I was doing something obviously wrong. I think I'll make a branch quickly with my deferToThread version.

Thanks very much for any time you can lend!

Paul

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
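For readers following along, the check-then-generate flow described in this message can be sketched without S3, PIL or Twisted. Everything here is a stand-in: `thumbnail_url`, the two dicts playing the role of the store and thumb buckets, and `fake_resize` are hypothetical names chosen for illustration, not code from the linked repository.

```python
def thumbnail_url(key, store, thumb_cache, resize):
    """Return the URL to redirect to, generating the thumbnail on a cache miss."""
    if key not in thumb_cache:
        # Cache miss: "download" the original, resize it, "upload" the result.
        thumb_cache[key] = resize(store[key])
    # Hit or miss, the request ends in a redirect to the cached thumbnail.
    return "/thumb/" + key


resize_calls = []


def fake_resize(data):
    # Stand-in for the PIL step; records each call so repeats can be detected.
    resize_calls.append(data)
    return data[:4]


store = {"cat.jpg": b"original-bytes"}  # plays the role of the store bucket
thumb_cache = {}                         # plays the role of the thumb bucket

first = thumbnail_url("cat.jpg", store, thumb_cache, fake_resize)
second = thumbnail_url("cat.jpg", store, thumb_cache, fake_resize)
```

The second request finds the thumbnail already cached, so the resize step runs only once.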
Re: [Twisted-Python] twisted thumbnail server
Yeah, very big +1 to showing the deferToThread version. I feel bad even trying to spot potential threading issues here... It could be because the default thread pool isn't very large, but you're making many requests.

What functionality does boto have that txaws doesn't that you really need here? Perhaps you can avoid blocking (and hence threads) at all.

On Fri, Nov 2, 2012 at 4:42 PM, Paul Wiseman wrote:
> I hope this will be an easy question for some of you guys :)
> [...]

--
cheers
lvh
Re: [Twisted-Python] twisted thumbnail server
On 02/11/12 15:42, Paul Wiseman wrote:
> [...]
> Currently I'm checking if the thumbnail already exists in the thumb
> bucket. I'm redirecting the request if it is or if not I'm downloading
> the image from store, generating the thumb using PIL, uploading the
> thumbnail to the thumb bucket and then redirecting the request.

This isn't a criticism, but I trust you are aware of the implications and problems of doing work in threads?

FWIW we usually use a child process pool for intensive tasks; this has the advantage you can sensibly kill a long-lived child (just kill the process) and you side-step the lack of concurrency in the python interpreter.

[In this case, I'd just start up a bunch of python interpreters using a ProcessProtocol and use a simple request/response command protocol on stdin/stdout - the child interpreters can be non-Twisted processes able to block on PIL operations]

If you really do want threads, is there any reason to not use the Twisted threadpool stuff?

It's often a personal/style choice, but I don't use StringIO for large volumes of data personally (not Twisted-specific).

I'm sure someone will mention tests ;o)
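The request/response protocol on stdin/stdout suggested above might look roughly like the sketch below. It deliberately uses the blocking stdlib `subprocess` module so it runs on its own; in a Twisted server the parent side would instead be a ProcessProtocol started with `reactor.spawnProcess`. The one-line-per-request format and the names `CHILD_SOURCE`, `start_child` and `request` are assumptions made for illustration.

```python
import subprocess
import sys

# Child program: one request per line on stdin, one reply per line on stdout.
# It is an ordinary blocking script, so it could call PIL directly.
CHILD_SOURCE = r"""
import sys
for line in sys.stdin:
    name = line.strip()
    # Placeholder for the blocking image work on `name`.
    sys.stdout.write("done " + name + "\n")
    sys.stdout.flush()
"""


def start_child():
    """Start one worker interpreter with pipes for stdin and stdout."""
    return subprocess.Popen(
        [sys.executable, "-u", "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        bufsize=1,
    )


def request(child, name):
    """Send one request line and block until the matching reply line arrives."""
    child.stdin.write(name + "\n")
    child.stdin.flush()
    return child.stdout.readline().strip()


child = start_child()
replies = [request(child, name) for name in ("a.jpg", "b.jpg")]

child.stdin.close()      # EOF on stdin tells the child to exit
exit_code = child.wait()
child.stdout.close()
```

Because the child stays alive between requests, the interpreter start-up cost is paid once per worker rather than once per image, and a stuck worker can simply be killed and replaced.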
Re: [Twisted-Python] deferToThread and reactor loop
Hi Jean-Paul,

> > Am I correct that "deferToThread" does not immediately forward the call
> > to a background thread, but only the next time the reactor loop runs?
>
> However, I can direct you to the implementation of deferToThread:
>
> http://twistedmatrix.com/trac/browser/trunk/twisted/python/threadpool.py#L119
>
> Notice the `self.q.put(o)`. This matches up with the call to `self.q.get`
> in the same module:
>
> http://twistedmatrix.com/trac/browser/trunk/twisted/python/threadpool.py#L158
>
> Together, these bits of source should demonstrate that there's no waiting
> for a reactor iteration before the work is enqueued. The work goes into the
> Queue instance, and instantly any worker thread is free to grab it.

Ok, I see. There might be a mutex or something in the Queue implementation (if it's not a lockless queue implementation), or the GIL might be involved. I have no convincing explanation for the behavior I see (which also seems to be platform agnostic).

Another wild guess (besides the above): maybe the OS does not immediately schedule the process's other threads for execution if the current thread (the one pushing to the Queue) is very busy.

Thanks,
Tobias
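The put/get handoff discussed in this message can be illustrated with a plain-stdlib analogue. This is not Twisted's threadpool code, only a minimal sketch of the same pattern: the names `worker` and `run_in_worker` are hypothetical, and no reactor or event loop exists anywhere in it.

```python
import queue
import threading


def worker(q, results):
    # Block on the queue; run each job as soon as it can be taken.
    while True:
        job = q.get()
        if job is None:  # sentinel: shut the worker down
            break
        func, args, done = job
        results.append(func(*args))
        done.set()


def run_in_worker(q, func, *args):
    """Enqueue func(*args) and return an Event that is set once it has run."""
    done = threading.Event()
    q.put((func, args, done))  # the work is available to the worker from here on
    return done


q = queue.Queue()
results = []
thread = threading.Thread(target=worker, args=(q, results))
thread.start()

done = run_in_worker(q, pow, 2, 10)
finished = done.wait(timeout=5)  # completes with no loop iteration in between

q.put(None)
thread.join()
```

The job runs as soon as the worker thread is scheduled and can take it from the queue. How soon that is depends on the interpreter and the OS scheduler, which is the open question raised above.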
Re: [Twisted-Python] twisted thumbnail server
On 2 November 2012 15:57, Laurens Van Houtven <_...@lvh.cc> wrote:
> Yeah, very big +1 to showing the deferToThread version. I feel bad even
> trying to spot potential threading issues here... It could be because the
> default thread pool isn't very large, but you're making many requests.
>
> What functionality does boto have that txaws doesn't that you really need
> here? Perhaps you can avoid blocking (and hence threads) at all.

This is the deferToThread version:
https://github.com/GP89/thumbs/blob/defertothread/thumb.py

The thread pool size was 60, which is pretty arbitrary, but (ignoring the memory leaks) it should only ever have at most n images in memory at once, where n is the number of threads. If there are more requests than it can process, they'll just build up in the queue and not cause memory to fill up with downloaded images (this was an initial problem I had).

My understanding is that txaws would only work in a twisted way, but as the work in the thread has blocking code (the PIL bits) I just used boto, as I'm more familiar with it. I'm not sure how I'd use txaws in a thread or what benefit that would have?

I'd love to avoid blocking and keep it all in 1 thread, but I don't know of any way to do the image resizing/rotation etc. without blocking.
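The claim that a pool of n threads keeps at most n images in memory, with extra requests waiting in the queue, can be checked with a small stdlib sketch. It uses `concurrent.futures.ThreadPoolExecutor` rather than Twisted's pool, and `POOL_SIZE`, `make_thumbnail` and the `stats` counter are hypothetical names used only for this illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # the message above used 60; kept small so the demo runs quickly

lock = threading.Lock()
stats = {"in_flight": 0, "peak": 0}


def make_thumbnail(key):
    # Count how many jobs are running at once (a proxy for images in memory).
    with lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.01)  # placeholder for download + resize + upload
    with lock:
        stats["in_flight"] -= 1
    return key + ".thumb"


keys = ["img%d" % i for i in range(20)]
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # 20 jobs are submitted at once; only POOL_SIZE run, the rest wait queued.
    thumbs = list(pool.map(make_thumbnail, keys))
```

All twenty jobs complete, but the recorded peak never exceeds the pool size. Only the queued job descriptions accumulate, not the downloaded image data.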
Re: [Twisted-Python] twisted thumbnail server
On 2 November 2012 16:13, Phil Mayers wrote:
> On 02/11/12 15:42, Paul Wiseman wrote:
> > [...]
> > Currently I'm checking if the thumbnail already exists in the thumb
> > bucket. I'm redirecting the request if it is or if not I'm downloading
> > the image from store, generating the thumb using PIL, uploading the
> > thumbnail to the thumb bucket and then redirecting the request.
>
> This isn't a criticism, but I trust you are aware of the implications
> and problems of doing work in threads?

I think so. I understand the whole idea of twisted is to schedule tasks in an async way in a single main thread. I usually use quite a lot of threads in my code and I'm just learning about this new async style of coding. I've tried to avoid using threads as much as I can here, but didn't think I could get away from them given that PIL is blocking.

> FWIW we usually use a child process pool for intensive tasks; this has
> the advantage you can sensibly kill a long-lived child (just kill the
> process) and you side-step the lack of concurrency in the python
> interpreter.
>
> [In this case, I'd just start up a bunch of python interpreters using a
> ProcessProtocol and use a simple request/response command protocol on
> stdin/stdout - the child interpreters can be non-Twisted processes able
> to block on PIL operations]

I'd love to get this working using a process pool rather than a thread pool (I spent quite a lot of time trying to figure out how to do it in twisted but haven't yet worked out how). As this server will be CPU bound, this will hopefully get more throughput and also side-step the memory leak in PIL that I believe I'm seeing (although on second thought maybe not; the overhead of starting a new python interpreter each time wouldn't be viable). Are there any examples of how to use a process pool?

> If you really do want threads, is there any reason to not use the
> Twisted threadpool stuff?

I wasn't aware of a twisted thread pool. I think I've only come across deferToThread, which I imagined was using a threadpool (if so, is there a way to control the size?).

> It's often a personal/style choice, but I don't use StringIO for large
> volumes of data personally (not Twisted-specific).

I rarely use it either, but I needed a way to get the data into a file-like object for PIL without writing the data to disk. How would you recommend I get the data between download / PIL / upload?

> I'm sure someone will mention tests ;o)

Tests are something I've often, guiltily, neglected. Maybe I should start :) I wouldn't really know where to start writing tests for anything more than trivial functions. I guess I'd just keep images and thumbnails that I know are correct for a given request, send the images to the server and see if it returns a thumbnail that matches? Something along those lines?
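On the question of passing data between download, PIL and upload without touching disk, an in-memory file object is the usual tool. The sketch below uses `io.BytesIO` from current Python (the thread-era code would have used StringIO for the same job). `fake_resize` is a hypothetical stand-in for the PIL step, and the byte string is made-up sample data, not a real image.

```python
import io


def fake_resize(src, dst):
    """Stand-in for the PIL step: read one file-like object, write another."""
    data = src.read()
    dst.write(data[:8])  # pretend the first 8 bytes are the thumbnail


downloaded = b"0123456789abcdef"  # bytes as they might arrive from the store bucket

# The with-block closes both buffers on exit, releasing their memory
# deterministically rather than relying on del or garbage collection.
with io.BytesIO(downloaded) as src, io.BytesIO() as dst:
    fake_resize(src, dst)
    thumbnail_bytes = dst.getvalue()  # copy the result out before closing

buffers_closed = src.closed and dst.closed
```

After the block, only `thumbnail_bytes` remains to be uploaded. Whether this helps with the leak reported earlier in the thread is not something this sketch can show.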