[Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread Terry Jones
I've run into a problem fetching an HTTP page with t.w.client.getPage. It's
not simple to make standalone code showing what's going wrong, but the
following summarizes where I am and why I find this puzzling.

After some setup, I have a URL path and some headers I want to
send. A summary:

host = 'ec2.amazon.com'
port = 443
path = '/?some=params&are=here&etc=etc'
method = 'GET'
data = ''
headers = { 'some' : 'headers', 'Content-Length' : '0' }
url = 'https://%s:%d%s' % (host, port, path)

The actual details don't matter right now, I don't think.  When I call

  d = getPage(url, headers=headers)

d's errback fires with a twisted.web.error.Error with a 403 status. So
you'd think I had something wrong in my headers, or was trying to access a
forbidden resource, etc.

But when I drop this code in instead of the call to getPage:

import httplib
cx = httplib.HTTPSConnection(host, port)
cx.request(method, path, data, headers)
response = cx.getresponse()
print 'response status:', response.status
body = response.read()
print 'body:', body

I get a 200 status, and the body is exactly as expected.

BTW, the path above does start with a slash. I've tried using
HTTPClientFactory and reactor.connectSSL directly.  I've tried with and
without the '' postdata and Content-Length header. I've tried with Twisted
8.2.0 and 9.0.0.  And of course I've checked many times that the URL and
its query params requested by httplib and getPage are identical (apart from
the time-sensitive signature).

The reason it's not easy to provide a simple example is that the URL and
headers have signed components, based in part on a timestamp, and based in
part on Amazon secret keys, etc. It's not easy to separate all that, and if
I did I'd be posting at least 100 lines of code that would only run if you
had your Amazon AWS details provided etc.

In any case, it looks like the problem is not in the setup of the request.
Can anyone offer a reason why httplib might be able to fetch the page
whereas getPage receives an error?  I'm stumped.

Terry

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread sstein...@gmail.com

On Jan 2, 2010, at 9:34 AM, Terry Jones wrote:
> In any case, it looks like the problem is not in the setup of the request.
> Can anyone offer a reason why httplib might be able to fetch the page
> whereas getPage receives an error?  I'm stumped.

I've had to debug things like this recently and I have two suggestions:

1> Recreate the headers and make it work with curl.  Curl won't add anything to
your headers and such and you'll be sure that you're getting the result you
want with a completely stripped-down case.

2>  Get Charles http://www.charlesproxy.com/ if you're on OS X.  It rocks.  
Otherwise, get one of the Windows tools (sorry, no recos from me on that), and 
watch exactly what goes by.

I had a situation where python's HTTPlib stuff was adding an Accept Encoding 
header that didn't put there, and it exposed a bug in the API I was using.  
When I ran it with curl, worked fine since no additional headers were added.  
Charles helped me see what was going on (unfortunately, long after they had
fixed that particular bug in the API).

S
aka/Steve Steiner
aka/ssteinerX




[Twisted-Python] Fwd: Problem fetching page with getPage

2010-01-02 Thread sstein...@gmail.com

Sorry...

>> I had a situation where python's HTTPlib stuff was adding an Accept Encoding 
>> header
>> that didn't put there,

that I didn't put there,





Re: [Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread Andreas Kostyrka
Am Samstag, den 02.01.2010, 10:03 -0500 schrieb sstein...@gmail.com:
> On Jan 2, 2010, at 9:34 AM, Terry Jones wrote:
> > In any case, it looks like the problem is not in the setup of the request.
> > Can anyone offer a reason why httplib might be able to fetch the page
> > whereas getPage receives an error?  I'm stumped.
> 
> I've had to debug things like this recently and I have two suggestions:
> 
> 1> Recreate the headers and make it work with curl.  Curl won't add anything 
> to your headers and such and you'll be sure that you're getting the result 
> you want with completely stripped down case.
> 
> 2>Get Charles http://www.charlesproxy.com/ if you're on OS X.  It rocks.  
> Otherwise, get one of the Windows tools (sorry, no recos from me on that), 
> and watch exactly what goes by.

Actually, CharlesProxy is a Java tool, AFAIK. And personally I'm really
not that sure that it rocks, but personal opinions do vary :) 

As a free alternative, webscarab can handle the man-in-the-middle
interception too.

Consider also using FoxyProxy (a FF addon) to direct only the URLs you
are interested in to the logging proxy.

Andreas




Re: [Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread Glyph Lefkowitz

On Jan 2, 2010, at 9:34 AM, Terry Jones wrote:


> In any case, it looks like the problem is not in the setup of the request.
> Can anyone offer a reason why httplib might be able to fetch the page
> whereas getPage receives an error?  I'm stumped.

Well, I know this isn't terribly helpful, but "a bug in getPage" is really the 
only thing that comes to mind.  Or, some legal-but-unusual behavior in getPage 
which triggers a bug on the EC2 side of things.

The only thing I can suggest is to start wireshark, do a byte-for-byte 
comparison of the requests that getPage and httplib emit, and see if you can 
find any of the differences which might be significant.  I would look carefully 
at any place in the request or response where data is being quoted or unquoted. 
 Based on the other stuff you've said, nothing jumps out at me.
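A minimal sketch of that kind of comparison, assuming the two requests have already been captured as raw bytes (for example, saved from a packet capture). The function name diff_requests and both sample requests are invented for illustration, and the sketch is written for Python 3 rather than the Python 2 used elsewhere in this thread:

```python
import difflib


def diff_requests(raw_a, raw_b, name_a="httplib", name_b="getPage"):
    """Return unified-diff lines for the header sections of two raw HTTP requests."""
    def header_lines(raw):
        # Keep only the request line and headers; the body is ignored.
        head = raw.split(b"\r\n\r\n", 1)[0]
        return head.decode("latin-1").split("\r\n")

    return list(difflib.unified_diff(
        header_lines(raw_a), header_lines(raw_b),
        fromfile=name_a, tofile=name_b, lineterm=""))


# Invented sample captures, not real traffic from either library.
httplib_request = (b"GET /?some=params HTTP/1.1\r\n"
                   b"Host: example.com:443\r\n"
                   b"Accept-Encoding: identity\r\n\r\n")
getpage_request = (b"GET /?some=params HTTP/1.0\r\n"
                   b"Host: example.com\r\n"
                   b"User-Agent: Twisted PageGetter\r\n\r\n")

for line in diff_requests(httplib_request, getpage_request):
    print(line)
```

Lines starting with "-" appear only in the first request and lines starting with "+" only in the second, so any header that one client adds or formats differently stands out.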




Re: [Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread Terry Jones
> "Steve" == sstein...@gmail com  writes:
Steve> On Jan 2, 2010, at 9:34 AM, Terry Jones wrote:
>> In any case, it looks like the problem is not in the setup of the request.
>> Can anyone offer a reason why httplib might be able to fetch the page
>> whereas getPage receives an error?  I'm stumped.
Steve> 
Steve> I've had to debug things like this recently and I have two suggestions:

Hi Steve

Thanks for the helpful reply - I can now make the call successfully.  The
difference turned out to be that httplib puts a Host: hostname:port header
into its calls, whereas getPage uses just Host: hostname. Plus there was
something else going on in some other code I'm using that made this a
problem (it was calculating a signature based on host:port).
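To illustrate why that mismatch could produce a 403, here is a rough sketch. It assumes the server recomputes the signature from the Host header it actually receives. The string-to-sign layout, the sign helper and the key are all made up for the example and are not Amazon's real signing scheme. It is written for Python 3:

```python
import hashlib
import hmac


def sign(secret, method, host, path):
    """HMAC-SHA256 over a simplified, hypothetical string-to-sign."""
    string_to_sign = "\n".join([method, host, path])
    return hmac.new(secret, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


secret = b"not-a-real-key"   # placeholder secret
path = "/?some=params"

# The client signs with host:port ...
client_sig = sign(secret, "GET", "ec2.amazon.com:443", path)
# ... but sends "Host: ec2.amazon.com", so the server signs without the port.
server_sig = sign(secret, "GET", "ec2.amazon.com", path)

print(client_sig == server_sig)  # False
```

Signing and sending the same host value, with or without the port in both places, makes the two signatures agree.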

Steve> 1> Recreate the headers and make it work with curl.  Curl won't add
Steve>    anything to your headers and such and you'll be sure that you're
Steve>    getting the result you want with completely stripped down case.

At least on my machine (curl 7.18.0 on Linux Ubuntu/Hardy) it adds a
User-agent, an Accept: */*, and also the Host header.

Steve> 2> Get Charles http://www.charlesproxy.com/ if you're on OS X.  It
Steve>    rocks.  Otherwise, get one of the Windows tools (sorry, no recos
Steve>    from me on that), and watch exactly what goes by.

It's available for Linux & Windows too. I tried it, but didn't make it work
fully when sending requests from the command line (with SSL, spoofing DNS,
etc). So in the end I just used netcat -l -p 443 and changed to HTTP to see
what was being sent. I wouldn't have thought of doing that without your
suggestion, so thanks a lot for the tip.
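For anyone without netcat to hand, roughly the same trick can be sketched in Python: listen on a plain TCP port, accept one connection, and return whatever the client sends up to the end of the headers. The helper names make_listener and capture_one_request are invented here, and the code assumes Python 3 and a plain-HTTP client:

```python
import socket


def make_listener(port=0, host="127.0.0.1"):
    """Create a listening TCP socket; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def capture_one_request(listener, max_bytes=65536):
    """Accept one connection and return the raw bytes up to the end of the headers."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        # Read until the blank line that ends the header section,
        # the peer closes the connection, or the size limit is reached.
        while b"\r\n\r\n" not in data and len(data) < max_bytes:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return data
```

Calling capture_one_request(make_listener(8080)) and pointing the client at http://127.0.0.1:8080/ returns the request line and headers exactly as sent. The client will not get a response, so it should be expected to fail or time out afterwards.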

Terry



Re: [Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread Glyph Lefkowitz

On Jan 2, 2010, at 4:14 PM, Terry Jones wrote:

> Thanks for the helpful reply - I can now make the call successfully.  The
> difference turned out to be that httplib puts a Host: hostname:port header
> into its calls, whereas getPage uses just Host: hostname. Plus there was
> something else going on in some other code I'm using that made this a
> problem (it was calculating a signature based on host:port).

I'm glad that you tracked this down!

According to comments on , this 
problem was addressed in the new HTTP client implementation.  Have you 
considered using the new twisted.web.client.Agent instead of getPage?




Re: [Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread Terry Jones
Hi Glyph

Thanks for the reply. I just sent another mail in the thread.

> "Glyph" == Glyph Lefkowitz  writes:
Glyph> Well, I know this isn't terribly helpful, but "a bug in getPage" is
Glyph> really the only thing that comes to mind.  Or, some
Glyph> legal-but-unusual behavior in getPage which triggers a bug on the
Glyph> EC2 side of things.

The error arose from a combination of things (signing a string that
included a host:port but then only sending a host in the Host header).
Turns out you can resolve it either way - using a port in both, or omitting
the port from both.


BTW, in reading about the Host header, it seems like getPage (more
specifically HTTPPageGetter) should be sending a port number in the header,
at least when the port is not 80. I base that remark on these:

  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.2

That's a 1.1 spec as you surely know, and http.py sends an HTTP/1.0 header,
so you could argue that sending the Host is therefore just a nicety and
there's no need for a port. But the Host header isn't described in the HTTP
1.0 RFC, so it seems more like if you're going to send it you may as well
conform to HTTP 1.1.

But I guess that argument is somehow incorrect. I say that because a
comment in some other code I'm looking at that uses httplib, says that
prior to 2.6, httplib *used* to append a ":443" to SSL requests, but that
it no longer does. I guess sending the port was dropped from httplib for
good reason, and so HTTPPageGetter shouldn't add it. But I don't know.
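One reading of RFC 2616 section 14.23 that might reconcile the two behaviours: a Host value with no port implies the default port for the service, so the port only needs to appear when it is not the default. A small sketch of that rule follows. The function name host_header_value is made up, and treating 443 as the default for https is an assumption of the sketch, not something taken from either library's source:

```python
# Default ports per scheme (443 for https is an assumption of this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_value(scheme, host, port):
    """Return a Host header value, omitting the port when it is the scheme default."""
    if port is None or port == DEFAULT_PORTS.get(scheme):
        return host
    return "%s:%d" % (host, port)


print(host_header_value("https", "ec2.amazon.com", 443))   # ec2.amazon.com
print(host_header_value("https", "ec2.amazon.com", 8443))  # ec2.amazon.com:8443
print(host_header_value("http", "example.com", 8080))      # example.com:8080
```

Under this rule a request to port 443 over SSL would carry no port in the Host header, which would be consistent with httplib no longer appending ":443".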

I'm very far from being an expert on HTTP headers though. Not as far as I'd
like to be, though :-)

Thanks again for the reply.

Terry



Re: [Twisted-Python] Problem fetching page with getPage

2010-01-02 Thread Terry Jones
> "Glyph" == Glyph Lefkowitz  writes:

Glyph> I'm glad that you tracked this down!

Me too.

Glyph> According to comments on ,
Glyph> this problem was addressed in the new HTTP client implementation.
Glyph> Have you considered using the new twisted.web.client.Agent instead
Glyph> of getPage?

I hadn't looked at it, but now have. The _computeHostValue method looks
very promising :-)  We've yet to switch to 9.0.0.

So httplib (apparently) changed to drop the :port part of the Host header
in Python 2.6, and now the Twisted client has added it. I think you guys
are right, so I wonder why httplib dropped it.

Terry



[Twisted-Python] Weekly Bug Summary

2010-01-02 Thread exarkun



Bug summary
__
Summary for 2009-12-27 through 2010-01-03
Bugs opened: 5    Bugs closed: 8    Total open bugs: 1218 (-3)

|== Type Changes   |== Priority Changes   |== Component Changes
|Defect:       -4  |High:    +1           |Core:    +1
|Enhancement:  +0  |Normal:  -4           |Runner:  -1
|Task:         +1  |                      |Trial:   -1
|                  |                      |Web:     +1
|                  |                      |Words:   -3






New / Reopened Bugs
__
= High =
[#4195] Security: SO_EXCLUSIVEADDRUSE should be enabled when binding to ports on Windows (opened by davidsarah)
defect  core   http://twistedmatrix.com/trac/ticket/4195

= Normal =
[#4192] Move the content from exarkun's Twisted Web in 60 Seconds series into the Twisted Web howto (opened by jesstess)
task    web    http://twistedmatrix.com/trac/ticket/4192

[#4193] NameError: global name 'FilePath' is not defined (opened by thijs) (CLOSED, duplicate)
defect  trial  http://twistedmatrix.com/trac/ticket/4193

[#4194] Several broken API links in the core howto (opened by jesstess)
defect  core   http://twistedmatrix.com/trac/ticket/4194

[#4196] Support for the OpenSoundControl protocol (opened by arjan)
enhancement core   http://twistedmatrix.com/trac/ticket/4196



Closed Bugs
__
= Normal =
[#4186] intermittent test_signalKILL failure on BSD (opened by exarkun, closed by exarkun, fixed)
defect  core   http://twistedmatrix.com/trac/ticket/4186

[#3088] Make trial able to run tests and modules in a specific order (opened by amacleod, closed by Screwtape, duplicate)
enhancement trial  http://twistedmatrix.com/trac/ticket/3088

[#3742] twisted.words.test.test_jabbercomponent.ComponentAuthTest.testAuth fails on Python tr...@head (opened by exarkun, closed by ralphm, fixed)
defect  words  http://twistedmatrix.com/trac/ticket/3742

[#3847] twisted.words.test.test_xmlstream.XmlStreamTest.test_receiveBadXML fails with Python tr...@head (opened by ivank, closed by ralphm, fixed)
defect  words  http://twistedmatrix.com/trac/ticket/3847

[#3741] twisted.words.test.test_jabberclient.IQAuthInitializerTest.testDigest fails on Python tr...@head (opened by exarkun, closed by ralphm, fixed)
defect  words  http://twistedmatrix.com/trac/ticket/3741

[#4193] NameError: global name 'FilePath' is not defined (opened by thijs, closed by exarkun, duplicate)
defect  trial  http://twistedmatrix.com/trac/ticket/4193

[#4135] twisted.trial.runner.TrialRunner._removeSafely is missing a module qualification (opened by arkanes, closed by exarkun, duplicate)
defect  runner http://twistedmatrix.com/trac/ticket/4135

[#4142] twisted/protocols/_c_urlarg.c compilation errors on IBM AIX (opened by aprilmay, closed by spiv, fixed)
defect  core   http://twistedmatrix.com/trac/ticket/4142



Ticket Lifetime Stats
__
Oldest open ticket - [#50] conch command-line client doesn't work in win32 (since 2003-07-12 16:41:06).
Newest open ticket - [#4196] Support for the OpenSoundControl protocol (since 2010-01-01 11:42:02).

Mean open ticket age: 873 days, 10:40:00.056364.
Median: 849 days, 5:36:52.805954.
Standard deviation: 614 days, 20:54:33.488459.
Interquartile range: 940 days, 19:12:58.

Mean time between ticket creation and ticket resolution: 210 days, 18:35:06.549812.
Median: 27 days, 7:16:11.
Standard deviation is 358 days, 6:30:54.166632.
The interquartile range is 247 days, 1:16:49.

Mean time spent in review: 79 days, 7:32:12.129152.
Median: 4 days, 5:54:07.
Standard deviation: 272 days, 5:04:38.049709.
Interquartile range: 17 days, 0:13:25.

Mean number of times a ticket is reviewed: 2.05158069884.
Median: 1
Standard deviation: 1.69289451655.
Interquartile range: 1.


Contributor Stats
__
In the last 4 weeks,
23 unique ticket reporters
8 unique ticket reviewers
9 unique ticket resolvers
In the last 24 weeks,
102 unique ticket reporters
17 unique ticket reviewers
19 unique ticket resolvers
In the last 48 weeks,
181 unique ticket reporters
22 unique ticket reviewers
25 unique ticket resolvers




