Joshua Kugler wrote:
We just upgraded Python to 2.6 on some of our servers and a number of our
CGI scripts broke because the cgi module has changed the way it handles
POST requests. When the 'action' attribute was not present in the form
element on an HTML page, the module behaved as if the value of the
attribute was the URL which brought the user to the page with the form,
but without the query (?x=y...) part.

This does not make sense.  Can you give an example?

Sure.  Here's a tiny repro script:

#!/usr/bin/python
import cgi, xml.sax.saxutils
def quote(me): return me and xml.sax.saxutils.quoteattr(str(me)) or ''
print """\
Content-type: text/html

<html><body><form method='post'><input name='x' value=%s>
<input type='submit'>
</form></body></html>""" % quote(cgi.FieldStorage().getvalue('x'))

####################  end of repro script  ########################

Try it out on this pre-2.6 Python page:

http://www.rksystems.com/cgi-bin/cgi-repro.py?x=y

When the page comes up, click Submit. Click it several times. No change in the content of the text field, which is populated when the page first comes up from the GET request's URL, and then subsequently from the POST request's parameters.

For comparison, here's the equivalent Perl page, which behaves the same way:

http://www.rksystems.com/cgi-bin/cgi-repro.pl?x=y

Or PHP; again, same behavior, no matter how many times you click the Submit button:

http://www.rksystems.com/cgi-repro.php?x=y

Now try the Python script above from a server where Python has been upgraded to version 2.6:

http://mahler.nci.nih.gov/cgi-bin/cgi-repro.py?x=y

Notice that when you click on the Submit button, the field is populated with the string representation of the list which FieldStorage.getvalue() returns. Each time you click the submit button you'll see the effect recursively snowballing. This is exactly the same script as the one behind the first URL above, byte for byte.
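
The snowballing can be imitated without a web server. What follows is a minimal Python 3 sketch, not the cgi module's actual code: it assumes the 2.6 behavior amounts to parsing the POST body and QUERY_STRING together, and the helper names (merged_getvalue, simulate_clicks) are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def merged_getvalue(query_string, post_body, name):
    """Mimic getvalue() when body and query string are parsed together.

    Assumption: the merged parse is equivalent to parsing
    post_body + "&" + query_string in one pass.
    """
    values = parse_qs(post_body + "&" + query_string).get(name, [])
    if not values:
        return None
    # Like getvalue(): a lone value comes back bare, several as a list.
    return values[0] if len(values) == 1 else values


def simulate_clicks(clicks):
    """Return what the text field would show after each Submit click."""
    query_string = "x=y"  # from the URL that first brought up the form
    shown = "y"           # field content after the initial GET request
    history = []
    for _ in range(clicks):
        post_body = urlencode({"x": shown})  # what the browser posts back
        # The repro script puts str() of the result into the field.
        shown = str(merged_getvalue(query_string, post_body, "x"))
        history.append(shown)
    return history


if __name__ == "__main__":
    for number, content in enumerate(simulate_clicks(3), start=1):
        print(number, content)
```

Each pass feeds the previous field content back in as the POST body while the URL's x=y is added again, so the string grows with every click.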

Now FieldStorage.getvalue() is giving the script a list of two copies of
the value for some of the parameters (folding in the parameters from the
previous request) instead of the single string it used to return for each.

There is a function call to get only one value.  I think it's get_first() or
some such.

That's true, but risky. I have no guarantee that the value entered by the user on the form will be first on the list. I might instead get the initial value carried over from the URL which brought up the form to begin with. We're working around the problem by modifying the broken scripts to explicitly set the action attributes.
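
For reference, the workaround amounts to something like the following sketch (Python 3, hypothetical names). It assumes the script's own path is available in the CGI SCRIPT_NAME variable, so that the action URL carries no query string; the real scripts may well build the URL differently.

```python
import os
from xml.sax.saxutils import quoteattr


def render_form(value, environ=os.environ):
    """Build the repro form with an explicit, query-free action URL."""
    # Assumption: SCRIPT_NAME holds the script's URL path, e.g.
    # /cgi-bin/cgi-repro.py, with no "?x=y" part attached.
    # With SCRIPT_NAME unset this yields an empty action, which would
    # not help.
    action = environ.get("SCRIPT_NAME", "")
    shown = quoteattr(str(value)) if value else '""'
    return (
        "<html><body><form method='post' action=%s>"
        "<input name='x' value=%s>\n"
        "<input type='submit'>\n"
        "</form></body></html>" % (quoteattr(action), shown)
    )


if __name__ == "__main__":
    print(render_form("y", {"SCRIPT_NAME": "/cgi-bin/cgi-repro.py"}))
```

With the action spelled out this way, the POST goes to a URL without the ?x=y part, so there is nothing in QUERY_STRING to be folded in.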


Well, the CGI module hasn't had many changes.  There was this bug fix a few
months back:

http://svn.python.org/view?rev=64447&view=rev

Looks like that was where it happened.

It is possible that by fixing a bug, they brought the behavior in line with
what it *should* be.

That's certainly possible. I'm not contending that Perl and PHP and the previous versions of Python all got it right and the new Python is wrong. It could very well be the other way around.[1] But my expectation, based on what I've seen happen over the years with other proposed changes to the language and the libraries, was that there would have been some (possibly extended) discussion of the risks of breaking existing code, and the best way to phase in the change with as little sudden breakage as possible. I haven't been able to find that discussion, and I was hoping some kind soul would point me in the right direction.

Or maybe the browser behavior changed?

Clearly not, as you will see by using the same browser to try out the URLs above. If you look at the HTML source when the page first comes up for each of the scripts, you'll see it's the same. It's the behavior on the server (that is, in the Python library module) which changes.

The server does not care about an "action" attribute.  That only tells
the browser where to send the data.

Well, that's a pretty good formulation of the conclusion you would come to based on the behavior of Perl, PHP, and (pre-2.6) Python. And intuitively, that's how one (or at least I) would expect things to work. The parameters in the original URL are appropriately used to seed initial values in the form when the form is invoked with a GET request, but after that point it's hard to see them as anything but history. But that's not how the new version of the cgi module is behaving. It takes the parameters it finds in the original URL, which it gets from the environment's QUERY_STRING variable, and folds them in with the fields it parses from the POST request's body.
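
To make the difference concrete, here is a small Python 3 sketch of the two readings of a single POST request. It is an illustration built on urllib.parse, not the cgi module's source; the function names and sample values are invented, and the order of the merged list is just what this sketch happens to produce.

```python
from urllib.parse import parse_qs


def body_only(post_body, query_string):
    """Old reading: the URL's parameters are ignored on a POST."""
    return parse_qs(post_body)


def body_plus_url(post_body, query_string):
    """New reading: QUERY_STRING is folded in with the body's fields."""
    return parse_qs(post_body + "&" + query_string)


if __name__ == "__main__":
    # The form came up via ...?x=y and the user typed "edited".
    body, url_query = "x=edited", "x=y"
    print(body_only(body, url_query))
    print(body_plus_url(body, url_query))
```

Under the first reading the script sees one value for x; under the second it sees two, one of which is left over from the URL.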

It is possible the browser did not properly format
a request when there was no "action" attribute.

When the 'action' attribute is not present in the form element, the browser implicitly assigns it the value of the original URL which first brought up the page with the form. This browser behavior has not changed. It's doing the same thing no matter which version of which language and libraries are used to implement the CGI script (it has no idea what those are). Nor, as far as I have been able to determine, is this behavior dependent on which (version of which) browser you're using.

Can you provide more details?

I think we should have enough specifics with what I've provided above to make it clear what's happening, but if you can think of anything I've left out which you think would be useful, let me know and I'll try to supply it.

Cheers,
Bob

[1] I haven't yet finished my attempts to parse the relevant RFCs; I assumed that the original authors and maintainers of this module (who include the BDFL himself) would have been more adept at that than I am, which is one of the reasons I was hoping to find some discussion of the proposed change in the module's behavior in the mailing list archives.

--
http://mail.python.org/mailman/listinfo/python-list
