[Twisted-Python] twisted thumbnail server

2012-11-02 Thread Paul Wiseman
I hope this will be an easy question for some of you guys :)

I'm trying to set up a simple server which will accept requests over GET to
create a thumbnail for an image, and serve it back as the response.

The images are stored in two S3 buckets, the originals are in one bucket
(store), and the generated thumbnails are stored in another (thumb) as a
cache so that the work doesn't need to be repeated.

Currently I check whether the thumbnail already exists in the thumb bucket.
If it does, I redirect the request; if not, I download the image from store,
generate the thumb using PIL, upload the thumbnail to the thumb bucket and
then redirect the request.

I'm very new to twisted and was wondering if anyone who is more experienced
would be able to take a look at what I have so far and let me know if
anything is wrong/not ideal/will cause problems etc. or just general style
pointers? The more critical the better, as I said I'm very new to this.

I've just chucked it up on github:
https://github.com/GP89/thumbs/blob/master/thumb.py

There's a definite memory leak right now which I believe is PIL, or
possibly StringIO objects not being disposed, hence all the random del
statements trying to cure it (unsuccessfully). Maybe there's something I'm
doing wrong in twisted that is causing things not to be cleaned up that I'm
not aware of as well.

I did try to use deferToThread rather than my own thread pool, but the
server seemed to block up. I probably should have left it in, in case it was
because I was doing something obviously wrong. I think I'll make a branch
quickly with my deferToThread version.

Thanks very much for any time you can lend!

Paul
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] twisted thumbnail server

2012-11-02 Thread Laurens Van Houtven
Yeah, very big +1 to showing the deferToThread version. I feel bad even
trying to spot potential threading issues here... It could be because the
default thread pool isn't very large, but you're making many requests.

What functionality does boto have that txaws doesn't that you really need
here? Perhaps you can avoid blocking (and hence threads) at all.


-- 
cheers
lvh


Re: [Twisted-Python] twisted thumbnail server

2012-11-02 Thread Phil Mayers
On 02/11/12 15:42, Paul Wiseman wrote:
> I hope this will be an easy question for some of you guys :)
>
> I'm trying to set up a simple server which will accept requests over GET
> to create a thumbnail for an image, and serve it back as the response.
>
> The images are stored in two S3 buckets, the originals are in one bucket
> (store), and the generated thumbnails are stored in another (thumb) as a
> cache so that the work doesn't need to be repeated.
>
> Currently I'm checking if the thumbnail already exists in the thumb
> bucket. I'm redirecting the request if it is or if not I'm downloading
> the image from store, generating the thumb using PIL, uploading the
> thumbnail to the thumb bucket and then redirecting the request.

This isn't a criticism, but I trust you are aware of the implications 
and problems of doing work in threads?

FWIW we usually use a child process pool for intensive tasks; this has 
the advantage you can sensibly kill a long-lived child (just kill the 
process) and you side-step the lack of concurrency in the python 
interpreter.

[In this case, I'd just start up a bunch of python interpreters using a 
ProcessProtocol and use a simple request/response command protocol on 
stdin/stdout - the child interpreters can be non-Twisted processes able 
to block on PIL operations]

If you really do want threads, is there any reason to not use the 
Twisted threadpool stuff?

It's often a personal/style choice, but I don't use StringIO for large 
volumes of data personally (not Twisted-specific).

I'm sure someone will mention tests ;o)



Re: [Twisted-Python] deferToThread and reactor loop

2012-11-02 Thread Tobias Oberstein
Hi Jean-Paul,

> >**
> >Am I correct that "deferToThread" does not immediately forward the call
> >to a background thread, but only the next time the reactor loop runs?
> >**

> However, I can direct you to the implementation of
> deferToThread:
>
> http://twistedmatrix.com/trac/browser/trunk/twisted/python/threadpool.py#L119
>
> Notice the `self.q.put(o)`.  This matches up with the call to `self.q.get`
> in the same module:
>
> http://twistedmatrix.com/trac/browser/trunk/twisted/python/threadpool.py#L158
>
> Together, these bits of source should demonstrate that there's no waiting
> for a reactor iteration before the work is enqueued.  The work goes into the
> Queue instance, and instantly any worker thread is free to grab it.

Ok, I see.

There might be a mutex or something in the Queue implementation (if it's not
a lockless queue implementation) or the GIL might be involved.

I have no convincing explanation for the behavior I see (which also seems to be
platform agnostic).

Another wild guess (besides the above) I have: maybe the OS does not immediately
schedule other process threads for execution if the current thread (the one
pushing to the Queue) is very busy ...

Thanks,
Tobias
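
[Editorial sketch, not part of the original message: a minimal illustration of the put/get hand-off described in the quoted reply, written for Python 3 with only the standard library (in Python 2 the module is named `Queue`). It is not Twisted's thread pool code, and the names `worker` and `run_demo` are made up for illustration. It only shows that an item passed to `put` becomes available to a thread blocked in `get` without any event loop in between.]

```python
import queue
import threading


def worker(q, results):
    """Loop like a pool worker: block in q.get() until work arrives."""
    while True:
        item = q.get()
        if item is None:  # sentinel: shut the worker down
            break
        func, arg = item
        results.append(func(arg))
        q.task_done()  # tell q.join() that this item is finished


def run_demo():
    q = queue.Queue()
    results = []
    t = threading.Thread(target=worker, args=(q, results))
    t.start()
    # No reactor or event loop is involved: put() hands the item straight
    # to the queue, where the blocked get() in the worker can take it.
    q.put((lambda x: x * 2, 21))
    q.join()      # returns once the worker has called task_done()
    q.put(None)   # ask the worker to exit
    t.join()
    return results
```

Note that in CPython a worker with an item waiting still has to acquire the GIL before it can run any Python code, so a very busy producer thread may delay it. That would be consistent with the guesses above, but it is not verified here.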




Re: [Twisted-Python] twisted thumbnail server

2012-11-02 Thread Paul Wiseman
On 2 November 2012 15:57, Laurens Van Houtven <_...@lvh.cc> wrote:

> Yeah, very big +1 to showing the deferToThread version. I feel bad even
> trying to spot potential threading issues here... It could be because the
> default thread pool isn't very large, but you're making many requests.
>
> What functionality does boto have that txaws doesn't that you really need
> here? Perhaps you can avoid blocking (and hence threads) at all.
>
>
This is the deferToThread version:
https://github.com/GP89/thumbs/blob/defertothread/thumb.py

The thread pool size was 60, pretty arbitrary, but (ignoring the memory
leaks) it should only have at most n images in memory at once, where n is
the number of threads. If there are more requests than it can process they'll
just build up in the queue and not cause memory to fill up with downloaded
images (this was an initial problem I had).
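
[Editorial sketch, not part of the original message: a small Python 3 illustration of the bound described above, using the standard library's `ThreadPoolExecutor` as a stand-in for the thread pool in thumb.py (an assumption, the real code is not reproduced here). The function `run_bounded` and the sleep that stands in for download, resize and upload are hypothetical. Jobs beyond the pool size wait in the executor's queue, so at most `pool_size` jobs hold data at once.]

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(num_jobs, pool_size):
    """Run num_jobs fake jobs and report the peak number running at once."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def job(i):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stands in for download + resize + upload
        with lock:
            state["active"] -= 1
        return i

    # Jobs that cannot start yet wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(job, range(num_jobs)))
    return results, state["peak"]
```

Calling `run_bounded(20, 4)` completes all 20 jobs while the reported peak never exceeds 4.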

My understanding is that txaws would only work in a twisted way, but as the
work in the thread has blocking code (the PIL bits) I just used boto as I'm
more familiar with it. I'm not sure how I'd use txaws in a thread or what
benefit that would have?

I'd love to avoid blocking and keep it all in 1 thread, but I don't know of
any way to do the image resizing/rotation etc. without blocking.




Re: [Twisted-Python] twisted thumbnail server

2012-11-02 Thread Paul Wiseman
On 2 November 2012 16:13, Phil Mayers wrote:

> This isn't a criticism, but I trust you are aware of the implications
> and problems of doing work in threads?




I think so. I understand the whole idea of twisted is to schedule tasks in
an async way in a single main thread. I usually use quite a lot of threads
in my code and I'm just learning about this new async style of coding. I've
tried to avoid using threads as much as I can here but didn't think I could
get away from them based on the fact that PIL is blocking.


> FWIW we usually use a child process pool for intensive tasks; this has
> the advantage you can sensibly kill a long-lived child (just kill the
> process) and you side-step the lack of concurrency in the python
> interpreter.
>

> [In this case, I'd just start up a bunch of python interpreters using a
> ProcessProtocol and use a simple request/response command protocol on
> stdin/stdout - the child interpreters can be non-Twisted processes able
> to block on PIL operations]
>
>
I'd love to get this working using a process pool rather than a thread
pool (I spent quite a lot of time trying to figure out how to do it in
twisted but haven't yet worked out how). As this server will be CPU bound
this will hopefully get more throughput and also side-step the memory leak
in PIL that I believe I'm seeing (although on second thought maybe not, the
overhead of starting a new python interpreter each time wouldn't be
viable).

Are there any examples of how to use a process pool?
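
[Editorial sketch, not part of the original message: one way the request/response idea quoted above could look, in Python 3 with only the standard library. It uses the blocking `subprocess` module rather than Twisted's ProcessProtocol, so it shows the line-based protocol and the long-lived child, not the Twisted integration. `CHILD_SOURCE`, `Worker` and the "done ..." reply format are all hypothetical. Because each child stays alive between requests, the interpreter start-up cost is paid once per worker rather than once per image.]

```python
import subprocess
import sys

# Hypothetical child program: one request per line in, one reply per line out.
CHILD_SOURCE = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:          # parent closed our stdin: exit cleanly
        break
    request = line.strip()
    # A real worker would do the blocking PIL work here.
    sys.stdout.write("done " + request + "\n")
    sys.stdout.flush()
"""


class Worker:
    """A long-lived child interpreter driven over stdin/stdout."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,  # text mode, so we exchange str lines
            bufsize=1,                # line-buffered on the parent side
        )

    def request(self, text):
        """Send one request line and block until the reply line arrives."""
        self.proc.stdin.write(text + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        """Close stdin so the child sees EOF, then wait for it to exit."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

A pool would then be a list of such workers with requests handed to whichever is idle. In a Twisted server the blocking `request` call would have to be replaced by a ProcessProtocol that fires a Deferred when the reply line arrives; that part is not shown here.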


> If you really do want threads, is there any reason to not use the
> Twisted threadpool stuff?
>
>
I wasn't aware of a twisted thread pool, I think I've only come across
deferToThread, which I imagined was using a threadpool. (If so, is there a
way to control the size?)


> It's often a personal/style choice, but I don't use StringIO for large
> volumes of data personally (not Twisted-specific).
>
>
I rarely use it either, but I needed a way to get the data into a file type
object for PIL without putting the data to disk. How would you recommend I
get the data between download / PIL / upload?
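
[Editorial sketch, not part of the original message and not something suggested in the thread: one possible option is the standard library's `tempfile.SpooledTemporaryFile`, which behaves like an in-memory buffer until it grows past `max_size` and then spills to a real temporary file. The helper name `buffer_download` and the 1 MiB threshold are made up for illustration. The result supports `read`, `seek` and `tell`, so it should be usable wherever a binary file object is expected, but that is not tested against PIL here.]

```python
import tempfile


def buffer_download(chunks, max_in_memory=1024 * 1024):
    """Collect downloaded byte chunks into a file-like object.

    Data stays in memory up to max_in_memory bytes; beyond that the
    buffer is transparently moved to a temporary file on disk.
    """
    buf = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf
```

The caller is responsible for calling `close()` on the returned object, which also removes any temporary file that was created.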


> I'm sure someone will mention tests ;o)


>

Tests are something I've often, guiltily, neglected; maybe I should start
:) I wouldn't really know where to start writing tests for anything more
than trivial functions. I guess I'd just have images and thumbnails that I
know are correct for a given request, send the images to the server and see
if it returns a thumbnail that matches? Something along those lines?
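
[Editorial sketch, not part of the original message: one common starting point is to pull the pure logic out of the request handler and test that first, before testing the server end to end. The example below is Python 3 with the standard `unittest` module. The helper `fit_within` is hypothetical and is not taken from thumb.py; it only computes the target size for a thumbnail.]

```python
import unittest


def fit_within(width, height, max_width, max_height):
    """Hypothetical helper: shrink (width, height) to fit inside the box.

    Keeps the aspect ratio and never enlarges a smaller image.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


class FitWithinTests(unittest.TestCase):
    def test_landscape_is_limited_by_width(self):
        self.assertEqual(fit_within(4000, 2000, 100, 100), (100, 50))

    def test_portrait_is_limited_by_height(self):
        self.assertEqual(fit_within(2000, 4000, 100, 100), (50, 100))

    def test_small_image_is_not_enlarged(self):
        self.assertEqual(fit_within(40, 30, 100, 100), (40, 30))
```

Saved in a file, these run with `python -m unittest`. Twisted also ships its own runner, trial, which runs unittest-style test cases and is the usual choice once the tests need to wait on Deferreds.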

