[issue17569] urllib2 urlopen truncates https pages after 32768 characters

2013-03-28 Thread J Porter

New submission from J Porter:

When using urllib2 to fetch page data from an https server, I found that only 
the first 32768 characters of the download were retrieved. Web browsers 
returned the full documents, so it does not appear to be a server issue. If 
http rather than https is used on the same server, the full document is 
retrieved. Shorter documents (<32768 characters) are not affected; they were 
not truncated.
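
One way to check whether a single read() came back short or the stream really 
ended early is to read in a loop until EOF and compare the total with the 
Content-Length header, when the server sends one. The following is only a 
minimal sketch and is not taken from the report: the helper names read_fully 
and is_truncated are hypothetical, and resp is assumed to be any file-like 
object with a read(n) method, such as the object returned by urllib2.urlopen.

```python
import io


def read_fully(resp, chunk_size=8192):
    """Read from a file-like object until read() returns an empty result."""
    chunks = []
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:  # an empty result signals EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)


def is_truncated(body, content_length):
    """Return True if a Content-Length value was given and body is shorter."""
    if content_length is None:
        return False  # nothing to compare against (e.g. chunked encoding)
    return len(body) < int(content_length)


# Offline stand-in for a response object, so the sketch runs without a network.
fake_resp = io.BytesIO(b"x" * 40898)
body = read_fully(fake_resp)
```

With a real response, the header value could be taken from 
resp.info().getheader('Content-Length') in Python 2.7 and passed as the second 
argument to is_truncated.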

--
components: Library (Lib)
messages: 185476
nosy: jhp7e
priority: normal
severity: normal
status: open
title: urllib2 urlopen truncates https pages after 32768 characters
type: behavior
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue17569>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17569] urllib2 urlopen truncates https pages after 32768 characters

2013-03-29 Thread J Porter

J Porter added the comment:

Here is the code (security info removed) and the output. I noticed that the 
problem differs slightly between 2.6.5 and 2.7.3 (on one of them, truncation 
depends on whether authentication is used), so I've included the output for 
both:

import urllib2

# Placeholder for the real Authorization header value (credentials removed)
userData="Basic  KEY GOES HERE"

# Case 1: https, with an Authorization header
emlUrl="https://pasta.lternet.edu/package/metadata/eml/knb-lter-vcr/25/27"
emlReq=urllib2.Request(emlUrl)
emlReq.add_header('Authorization', userData)
emlSock=urllib2.urlopen(emlReq,timeout=60)
emlString=emlSock.read()
print "Https,authenticated: "+str(len(emlString))

# Case 2: https, no Authorization header
emlReq=urllib2.Request(emlUrl)
emlSock=urllib2.urlopen(emlReq,timeout=60)
emlString=emlSock.read()
print "Https,Not authenticated: "+str(len(emlString))

# Case 3: http, with an Authorization header
emlUrl="http://pasta.lternet.edu/package/metadata/eml/knb-lter-vcr/25/27"
emlReq=urllib2.Request(emlUrl)
emlReq.add_header('Authorization', userData)
emlSock=urllib2.urlopen(emlReq,timeout=60)
emlString=emlSock.read()
print "Http,authenticated: "+str(len(emlString))

# Case 4: http, no Authorization header. The label still reads
# "authenticated"; it is left as run so that it matches the output below.
emlReq=urllib2.Request(emlUrl)
emlSock=urllib2.urlopen(emlReq,timeout=60)
emlString=emlSock.read()
lengthHttpNotAuthenticated=len(emlString)
print "Http,authenticated: "+str(len(emlString))

OUTPUT when run on PC using Python 2.6.5:
Https,authenticated: 32768
Https,Not authenticated: 32768
Http,authenticated: 40898
Http,authenticated: 40898

OUTPUT when run on Ubuntu Linux (12.04 LTS):
Https,authenticated: 32768
Https,Not authenticated: 40898
Http,authenticated: 40898
Http,authenticated: 40898
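
The last line of each output above repeats the label "Http,authenticated" even 
though that request carries no Authorization header. A table-driven version 
of the test could avoid this by generating each label from the case 
parameters. This is only a sketch under assumptions: label_for and run_cases 
are hypothetical names, and fetch is assumed to be a caller-supplied function 
taking (url, headers) and returning the response body.

```python
def label_for(scheme, authenticated):
    """Build a label such as 'Https,Not authenticated' from the case parameters."""
    auth_part = "authenticated" if authenticated else "Not authenticated"
    return "%s,%s" % (scheme.capitalize(), auth_part)


def run_cases(fetch, path, auth_header):
    """Call fetch(url, headers) for every scheme/authentication combination.

    fetch is supplied by the caller and must return the response body.
    Returns a list of (label, body_length) tuples in the order tested.
    """
    results = []
    for scheme in ("https", "http"):
        for authenticated in (True, False):
            # Send the Authorization header only in the authenticated cases.
            headers = {"Authorization": auth_header} if authenticated else {}
            body = fetch("%s://%s" % (scheme, path), headers)
            results.append((label_for(scheme, authenticated), len(body)))
    return results
```

With urllib2, fetch could be written as a small function that builds 
urllib2.Request(url, headers=headers), opens it with 
urllib2.urlopen(req, timeout=60), and returns the result of read().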
