[web2py:30105] Comments on gluon/streamer.py.

Graham Dumpleton Sat, 05 Sep 2009 05:20:20 -0700

In gluon/streamer.py there are a number of places were open() call is
made on files. These are within a try block and exceptions are caught
and the specific reason for the error is determined and different HTTP
error returned. For example on such case from that file is:


    try:
        open(static_file)
    except IOError, e:
        if e[0] == 21:
            raise HTTP(403, error_message, web2py_error='file is a
directory')
        elif e[0] == 13:
            raise HTTP(403, error_message, web2py_error='inaccessible
file')
        else:
            raise HTTP(404, error_message, web2py_error='invalid
file')

Any book on good programming style will tell you that you shouldn't
use magic constants like above as it reduces maintainability and
understandability. I am also not entirely sure that the POSIX error
codes are guaranteed to have the same values on different platforms
which don't share a common ABI. This is why one should always use the
symbolic names for the constants instead. For Python such symbolic
names are in the 'errno' module.

I fully realise that you may save an ever so small amount of time by
using hard wired constants rather than the symbolic names due to the
latter needing a lookup, but this is generally a poor trade off when
it comes to maintainability. It is not even on the main code path
anyway, but an error condition.

Suggest therefore you use instead:

    try:
        open(static_file)
    except IOError, e:
        if e[0] == errno.EISDIR:
            raise HTTP(403, error_message, web2py_error='file is a
directory')
        elif e[0] == errno.EACCES:
            raise HTTP(403, error_message, web2py_error='inaccessible
file')
        else:
            raise HTTP(404, error_message, web2py_error='invalid
file')

Do similar thing for other places in that file where that sort of
check is done.

For the case above, that code is somewhat questionable anyway.

What the section of code does that starts with that bit of code is:

1. Try and open() the file just to see if it is possible. The
resulting file object is then promptly ignored, not even closed. Yes
it will close when scope left, but always better to be explicit and
close as early as possible.

2. Perform a stat() on the file based on the path name to the file.
This is done to determine file size and modification time.

3. Open the file yet again, this time so contents can be streamed
back.

This is a pretty wasteful way of going about things.

First off, the EACCES could have been obtained from failure of the stat
() call.

Second, the EISDIR could have been determined from data contained in
the structure returned by the stat() call.

Thus the whole first open() could have been avoided.

Third, a stat() call is going to be more expensive than a fstat() call
after having opened the file for actual reading as fstat() can refer
direct to the already open record for the file.

Thus, the most efficient way would be to open() the file as if was
going to use it for streaming. If that fails, can return the 403/404
based on that. Having got the file object, then perform a fstat() on
the file object to get size and modification time. Then use the file
handle for streaming.

Yes there is other stuff happening in the stream_file_or_304_or_206()
function, but can't see that it will be hard to fit in with that sort
of change.

On top of that, there is always an ever so small risk that the file
could be written to between time stat is performed and file completely
stream back. Imagine it was a log file for example that was being
written to all the time. For normal case where whole file is to be
returned, streamer() isn't passed bytes argument, ie., number of bytes
to return and which was set in response Content-Length. This means
that streamer() could actually return more bytes than Content-Length
specified.

For Apache/mod_wsgi that will not matter as it will truncate any
returned data longer than Content-Length, but other WSGI servers may
not do so. Thus suggest that bytes for length always be supplied to
streamer. This will just protect against chance that send back more
data than expected. If this were to occur, could stuff up where client
was using keep alive on the connection.

If it turns out less data was sent back because file got truncated,
then Content-Length already sent and can't do anything about it. In
that case client will timeout and close connection without reusing it
for keep alive, so at least can't stuff up following request over same
connection.

One final thing, as already mentioned in prior mail, I think the 1MB
chunk size should be reviewed as possibly performs worse than 64KB for
some WSGI hosting mechanism. Some testing on what gives best results
for a range of WSGI hosting mechanisms and platforms should be
performed. The actual chunk size then perhaps should be configurable.
The comments in the web2py book which seem to highlight this 1MB chunk
size for some reason would need to be changed.

Graham
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To post to this group, send email to web2py@googlegroups.com
To unsubscribe from this group, send email to 
web2py+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---

[web2py:30105] Comments on gluon/streamer.py.

Reply via email to