[Python-Dev] PyMem_Malloc() vs PyObject_Malloc()
I noticed something (in 2.5) yesterday, which may be a feature, but is more
likely a bug.
In tokenizer.c, tok->encoding is allocated using PyMem_MALLOC().
However, this then gets handed to a node->r_str in parsetok.c, and then
released in node.c using PyObject_Free().

Now, by coincidence, PyObject_Free() will default to free() for objects that
it doesn't recognize, so this works. But is this documented behavior? The
reason I ran into this was that I had redirected the PyMem_* API to a
different allocator, but left the PyObject_* one alone.

My feeling is that these two APIs shouldn't be interchangeable. Especially
since you can't hand a PyObject_Malloc'd object to PyMem_Free(), so the
inverse shouldn't be expected to work.

Any thoughts?

Kristján
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyMem_Malloc() vs PyObject_Malloc()
Kristján Valur Jónsson wrote:
>
> I noticed something (in 2.5) yesterday, which may be a feature, but is more
> likely a bug.
> In tokenizer.c, tok->encoding is allocated using PyMem_MALLOC().
> However, this then gets handed to a node->r_str in parsetok.c, and then
> released in node.c using PyObject_Free().
>
> Now, by coincidence, PyObject_Free() will default to free() for objects that
> it doesn't recognize, so this works. But is this documented behavior? The
> reason I ran into this was that I had redirect the PyMem_* API to a different
> allocator, but left the PyObject_* one alone.
>
> My feeling Is that these two APIs shouldn't be interchangeable. Especially
> since you can't hand a PyObject_Malloc'd object to PyMem_Free() so the
> inverse shouldn't be expected to work.
>
> Any thoughts?

This is a bug. Please file a bug report for this.

In general, either the PyObject_* xor the PyMem_* memory API should be
used for a given block of memory, never a mix of the two.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Sep 04 2009)
Re: [Python-Dev] PyMem_Malloc() vs PyObject_Malloc()
Kristján Valur Jónsson wrote:
> My feeling Is that these two APIs shouldn’t be interchangeable.
> Especially since you can’t hand a PyObject_Malloc’d object to
> PyMem_Free() so the inverse shouldn’t be expected to work.

I thought this had officially been deemed illegal for a while, and Google
found the reference I was looking for in the What's New for 2.5:

"""Previously these different families all reduced to the platform's
malloc() and free() functions. This meant it didn't matter if you got
things wrong and allocated memory with the PyMem function but freed it
with the PyObject function. With 2.5's changes to obmalloc, these
families now do different things and mismatches will probably result in
a segfault. You should carefully test your C extension modules with
Python 2.5."""

So either the allocation or the release needs to change here.

The behaviour of PyObject_Del when handed a pointer it doesn't recognise
is currently undocumented. It may be best to make it officially
undefined in order to further discourage people from relying on the
implicit delegation to PyMem_Free.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
---
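The failure mode discussed above can be modelled in a few lines. The following is only a toy sketch in Python, not CPython's allocator: the classes ToyAllocator and ToyObjectAllocator and the handle tuples are hypothetical stand-ins for the PyMem_* and PyObject_* families. It shows why a PyMem-allocated block freed through the object allocator works while both families end up at the same system allocator, and stops working once the PyMem-like family is redirected elsewhere.

```python
class ToyAllocator:
    """Hypothetical allocator: hands out handles and only takes back its own."""

    def __init__(self, name):
        self.name = name
        self.live = set()
        self._counter = 0

    def malloc(self):
        self._counter += 1
        handle = (self.name, self._counter)
        self.live.add(handle)
        return handle

    def free(self, handle):
        if handle not in self.live:
            raise RuntimeError("%s allocator got a foreign pointer: %r"
                               % (self.name, handle))
        self.live.remove(handle)


class ToyObjectAllocator(ToyAllocator):
    """Frees its own handles; delegates unknown ones to a fallback allocator."""

    def __init__(self, fallback):
        super().__init__("object")
        self.fallback = fallback

    def free(self, handle):
        if handle in self.live:
            self.live.remove(handle)
        else:
            # Unrecognised pointer: pass it on to the fallback's free().
            self.fallback.free(handle)


system = ToyAllocator("system")
custom = ToyAllocator("custom")
objects = ToyObjectAllocator(fallback=system)

# Default setup: the PyMem-like family is the system allocator, so the
# mismatched free happens to work.
pymem = system
encoding = pymem.malloc()
objects.free(encoding)
default_ok = encoding not in system.live

# Redirected setup: the PyMem-like family now uses a different allocator,
# and the fallback hands the block to the wrong free().
pymem = custom
encoding = pymem.malloc()
try:
    objects.free(encoding)
    redirected_error = None
except RuntimeError as exc:
    redirected_error = str(exc)

print(default_ok, redirected_error)
```

In the real C code the second case would not raise a tidy exception; as the What's New text quoted above says, a mismatch will probably result in a segfault.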
Re: [Python-Dev] how to debug httplib slowness
Guido van Rossum wrote:
> You might see a pattern. Is this on Windows?
Well, yes, but I'm not 100%. The problematic machine is a Windows box, but
there are no non-windows boxes on that network and vpn'ing from one of my
non-windows boxes slows things down enough that I'm not confident what I'd
be seeing was indicative of the same problem...
> Time to set up a more conclusive test. Do you have something like curl
> or wget available on the same box?
Time taken with IE: ~2 seconds
Time taken with wget: 2.2 seconds
Time taken with Python [1]: 20-30 minutes
I did a run of the script through cProfile and got the following:
pstats.Stats('download.profile').strip_dirs().sort_stats('time').print_stats(10)
         1604545 function calls in 1956.057 CPU seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1 1950.767 1950.767 1955.952 1955.952 httplib.py:544(_read_chunked)
    85125    1.235    0.000    1.235    0.000 {method 'recv' of '_socket.socket' objects}
    85838    1.031    0.000    2.246    0.000 socket.py:313(read)
    85838    0.787    0.000    3.386    0.000 httplib.py:601(_safe_read)
    42928    0.614    0.000    1.779    0.000 socket.py:373(readline)
   128775    0.344    0.000    0.344    0.000 {method 'write' of 'cStringIO.StringO' objects}
   200796    0.206    0.000    0.206    0.000 {method 'seek' of 'cStringIO.StringO' objects}
    85838    0.179    0.000    0.179    0.000 {min}
   128767    0.135    0.000    0.135    0.000 {cStringIO.StringIO}
    72735    0.116    0.000    0.116    0.000 {method 'read' of 'cStringIO.StringO' objects}
...which isn't what I was expecting!
Am I right in reading this as most of the time is being spent in
httplib's HTTPResponse._read_chunked and none of the methods it calls?
If so, is there a better way than a bunch of print statements to find
where in that method the time is being spent?
cheers,
Chris
[1] Python 2.6.2 on Windows Server 2003 R2 running this script:
from base64 import encodestring
from httplib import HTTPConnection
from datetime import datetime
conn = HTTPConnection('servername')
headers = {}
a = 'Basic '+encodestring('username:password').strip()
headers['Authorization']=a
t = datetime.now()
print t
conn.request('GET','/some/big/file',None,headers)
print 'request:',datetime.now()-t
response = conn.getresponse()
print 'response:',datetime.now()-t
data = response.read()
if len(data)<2000: print data
print 'read:',datetime.now()-t
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
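On the question of reading the profile: tottime counts time spent in a function's own statements, while cumtime also includes the functions it calls. A row with a large tottime and cheap callees therefore points at the function's own body. The sketch below illustrates that reading with made-up functions (busy and helper are hypothetical, not taken from httplib), written for Python 3.

```python
import cProfile
import io
import pstats


def helper(n):
    # Cheap callee: contributes to busy()'s cumtime, not to its tottime.
    return sum(range(n))


def busy():
    total = 0
    for i in range(200000):
        total += i * i  # inline work, counted in busy()'s own tottime
    return total + helper(1000)


profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Same calls as in the message above, but printing into a string buffer.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.strip_dirs().sort_stats('time').print_stats(5)
report = out.getvalue()
print(report)
```

One way to narrow things down further without print statements is to move suspect statements into small helper functions, so that each one gets its own row in the report.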
Re: [Python-Dev] how to debug httplib slowness
On Fri, Sep 4, 2009 at 1:11 PM, Chris Withers wrote:
> Am I right in reading this as most of the time is being spent in httplib's
> HTTPResponse._read_chunked and none of the methods it calls?
>
> If so, is there a better way than a bunch of print statements to find where
> in that method the time is being spent?

Well, since the source for _read_chunked includes the comment

  # XXX This accumulates chunks by repeated string concatenation,
  # which is not efficient as the number or size of chunks gets big.

you might gain some speed improvement with minimal effort by gathering
the read data chunks into a list and then returning "".join(chunks) at
the end.

Schiavo
Simon
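The list-and-join suggestion can be sketched as follows. This is not httplib's _read_chunked, only a minimal stand-alone decoder for a chunked body written for Python 3; the function name read_chunked and the sample payload are made up for illustration, and error handling for truncated input is left out.

```python
import io


def read_chunked(fp):
    """Decode a chunked HTTP body from a binary file-like object."""
    chunks = []
    while True:
        line = fp.readline()
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        chunks.append(fp.read(size))  # collect the piece, no concatenation
        fp.read(2)                    # consume the CRLF that ends the chunk
    # Skip optional trailer lines up to the terminating blank line.
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    # Build the result once at the end.
    return b"".join(chunks)


body = read_chunked(io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
print(body)
```

The design point is that each piece is copied once by join(), whereas repeated concatenation may copy the growing result again for every piece.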
Re: [Python-Dev] how to debug httplib slowness
Simon Cross writes:
>
> Well, since the source for _read_chunked includes the comment
>
>   # XXX This accumulates chunks by repeated string concatenation,
>   # which is not efficient as the number or size of chunks gets big.
>
> you might gain some speed improvement with minimal effort by gathering
> the read data chunks into a list and then returning "".join(chunks) at
> the end.

+1 for trying this. Given differences between platforms in realloc()
performance, it might be the reason why it goes unnoticed under Linux but
degenerates under Windows.

As a sidenote, it is interesting that even an stdlib module makes this
mistake and acknowledges it without trying to fix it.
Re: [Python-Dev] how to debug httplib slowness
Simon Cross wrote:
> Well, since the source for _read_chunked includes the comment
>
>   # XXX This accumulates chunks by repeated string concatenation,
>   # which is not efficient as the number or size of chunks gets big.
>
> you might gain some speed improvement with minimal effort by gathering
> the read data chunks into a list and then returning "".join(chunks) at
> the end.

True, I'll be trying that and reporting back, but, more interestingly, I
did some analysis with wireshark (only 200MB-odd of .pcap logs that was
fun ;-) to see the differences in the http conversation and noticed more
interestingness...

So, httplib does this:

GET / HTTP/1.1
Host:
Accept-Encoding: identity
Authorization: Basic

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 11:44:22 GMT
Server: Apache-Coyote/1.1
ContentLength: 116245504
Content-Type: application/vnd.excel
Transfer-Encoding: chunked

While wget does this:

GET / HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host:
Connection: Keep-Alive
Authorization: Basic

HTTP/1.1 200 OK
Date: Fri, 04 Sep 2009 11:35:19 GMT
Server: Apache-Coyote/1.1
ContentLength: 116245504
Content-Type: application/vnd.excel
Connection: close

Interesting points:

- Apache in this instance responds with HTTP 1.1, even though the wget
  request was 1.0, is that allowed?

- Apache responds with a chunked response only to httplib. Why is that?

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Re: [Python-Dev] how to debug httplib slowness
On Fri, Sep 04, 2009 at 04:02:39PM +0100, Chris Withers wrote:
> So, httplib does this:
>
> GET / HTTP/1.1
[skip]
> While wget does this:
>
> GET / HTTP/1.0
[skip]
> - Apache responds with a chunked response only to httplib. Why is that?

Probably because wget uses HTTP/1.0?

Oleg.
--
Oleg Broytmann    http://phd.pp.ru/    [email protected]
Programmers don't die, they just GOSUB without RETURN.
Re: [Python-Dev] how to debug httplib slowness
Chris Withers wrote:
> - Apache in this instance responds with HTTP 1.1, even though the wget
> request was 1.0, is that allowed?
>
> - Apache responds with a chunked response only to httplib. Why is that?
>
I find it very confusing that you say "Apache" since you really want to
say "Coyote" which is to say "Tomcat".
http11processor.j...@1547:
    if (entityBody && http11 && keepAlive) {
        outputBuffer.addActiveFilter
            (outputFilters[Constants.CHUNKED_FILTER]);
        contentDelimitation = true;
        headers.addValue(
            Constants.TRANSFERENCODING).setString(Constants.CHUNKED);
    } else {
        outputBuffer.addActiveFilter
            (outputFilters[Constants.IDENTITY_FILTER]);
    }
So, as Oleg said, it's because httplib talks HTTP/1.1 and wget talks
HTTP/1.0. All HTTP/1.1 clients must support chunked transfer-encoding,
and apparently Tomcat/Coyote defaults to that unless it is either an
empty message, not an HTTP/1.1 client, or the request is not to be kept
alive ("Connection: close" or no more keep-alive slots on the server).
As Simon said, changing this to do ''.join(chunks) is really the best
first step to take.
-Scott
--
Scott Dial
[email protected]
[email protected]
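The branch quoted from Tomcat above boils down to a three-way condition. As a rough restatement in Python (the function name choose_transfer_coding and its boolean parameters are hypothetical, and this mirrors only the quoted snippet, not the rest of the server's header handling), the two clients in this thread land on opposite sides of it:

```python
def choose_transfer_coding(entity_body, http11, keep_alive):
    """Return the body framing chosen by the quoted condition."""
    # Chunked only when there is a body, the client spoke HTTP/1.1,
    # and the connection is going to be kept alive.
    if entity_body and http11 and keep_alive:
        return "chunked"
    return "identity"


# httplib request from the thread: HTTP/1.1, connection kept alive.
httplib_case = choose_transfer_coding(True, True, True)

# wget request from the thread: HTTP/1.0, so the http11 flag is False.
wget_case = choose_transfer_coding(True, False, True)

print(httplib_case, wget_case)
```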
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (08/28/09 - 09/04/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.

 2374 open (+24) / 16285 closed (+18) / 18659 total (+42)

Open issues with patches: 939

Average duration of open issues: 661 days.
Median duration of open issues: 415 days.

Open Issues Breakdown
   open  2341 (+23)
pending    32 ( +1)

Issues Created Or Reopened (45)
___

Multiple buffer overflows in unicode processing                  09/02/09
       http://bugs.python.org/issue2620    reopened brett.cannon
       patch

os.getpwent returns unsigned 32bit value, os.setuid refuses it   09/02/09
       http://bugs.python.org/issue5705    reopened brett.cannon
       patch, 64bit

math.log, log10 inconsistency                                    08/28/09
CLOSED http://bugs.python.org/issue6765    reopened tjreedy

invalid print in tkinter\test\test_ttk\test_widgets.py           08/28/09
CLOSED http://bugs.python.org/issue6796    created  keithc

When Beginning Expression with Lookahead Assertion I get no Matc 08/28/09
CLOSED http://bugs.python.org/issue6797    created  jwindle

Argument for sys.settrace() callbacks documented incorrectly     08/28/09
       http://bugs.python.org/issue6798    created  robert.kern

mimetypes does not give cannonical extension for guess_extension 08/29/09
       http://bugs.python.org/issue6799    created  ptarjan

os.exec* raises "OSError: [Errno 45] Operation not supported" in 08/29/09
       http://bugs.python.org/issue6800    created  rnk

symmetric_difference_update documentation fix                    08/30/09
CLOSED http://bugs.python.org/issue6801    reopened r.david.murray

build fails on Snow Leopard                                      08/29/09
       http://bugs.python.org/issue6802    created  jmr
       patch

Context manager docs refer to contextlib.contextmanager as conte 08/29/09
CLOSED http://bugs.python.org/issue6803    created  dhaffey
       patch

IDLE: Detect Python files even if name doesn't end in .py        08/29/09
       http://bugs.python.org/issue6804    created  gagenellina
       patch

Should be URL for documentation of current release of Python 3 ( 08/30/09
CLOSED http://bugs.python.org/issue6805    created  MLModel

test_platform fails under Snow Leopard                           08/30/09
CLOSED http://bugs.python.org/issue6806    created  brett.cannon
       easy

No such file or directory: 'msisupport.dll' in msi.py            08/30/09
       http://bugs.python.org/issue6807    created  pds

python 3.1 documentation tutorial classes                        08/30/09
CLOSED http://bugs.python.org/issue6808    created  tom_morse

Python string.lstrip bug?                                        08/31/09
CLOSED http://bugs.python.org/issue6809    created  mushywushy

add link to the documentation of signal.signal                   08/31/09
CLOSED http://bugs.python.org/issue6810    created  Yinon

add a filename argument to marshal.load*                         08/31/09
       http://bugs.python.org/issue6811    created  brett.cannon
[Python-Dev] httplib slowness solved - which branches to merge to?
Antoine Pitrou wrote:
> Simon Cross writes:
>> Well, since the source for _read_chunked includes the comment
>>
>>   # XXX This accumulates chunks by repeated string concatenation,
>>   # which is not efficient as the number or size of chunks gets big.
>>
>> you might gain some speed improvement with minimal effort by gathering
>> the read data chunks into a list and then returning "".join(chunks) at
>> the end.
>
> +1 for trying this. Given differences between platforms in realloc()
> performance, it might be the reason why it goes unnoticed under Linux but
> degenerates under Windows.

And how! The following change dropped the download time using httplib to
2.3 seconds:

http://svn.python.org/view/python/trunk/Lib/httplib.py?r1=74523&r2=74655

> As a sidenote, it is interesting that even an stdlib module makes this
> mistake and acknowledges it without trying to fix it.

No longer in this case ;-)

The fix is applied on the trunk, but the problem still exists on the 2.6
branch, 3.1 branch and 3.2 branch.

Which of these should I merge to? I assume all of them?

Do I need to update any changelog files or similar to indicate this bug
has been fixed?

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Re: [Python-Dev] httplib slowness solved - which branches to merge to?
Chris Withers writes:
>
> The fix is applied on the trunk, but the problem still exists on the 2.6
> branch, 3.1 branch and 3.2 branch.
>
> Which of these should I merge to? I assume all of them?

The performance problem is sufficiently serious that it should be
considered a bug so, yes, you should merge to all remaining branches
(3.2, 2.6 and 3.1).

> Do I need to update any changelog files or similar to indicate this bug
> has been fixed?

Yes, add an entry to Misc/NEWS under the "Library" section using the
appropriate conventions.

Thanks

Antoine.
[Python-Dev] test failure on py3k branch in test___all__
Hi All,
Anyone know what's causing this failure?
test test___all__ failed -- Traceback (most recent call last):
  File "Lib/test/test___all__.py", line 106, in test_all
    self.check_all("profile")
  File "Lib/test/test___all__.py", line 23, in check_all
    exec("from %s import *" % modname, names)
  File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'help'
cheers,
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Re: [Python-Dev] test failure on py3k branch in test___all__
Chris Withers wrote:
> Hi All,
>
> Anyone know what's causing this failure?
>
> test test___all__ failed -- Traceback (most recent call last):
>   File "Lib/test/test___all__.py", line 106, in test_all
>     self.check_all("profile")
>   File "Lib/test/test___all__.py", line 23, in check_all
>     exec("from %s import *" % modname, names)
>   File "<string>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'help'
My fault -- fixed in r74661.
Georg
--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
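The thread does not say what the actual cause was, only that it was fixed in r74661. One common way to get this kind of AttributeError from a star-import is an __all__ list that names something the module does not define. The sketch below reproduces that mechanism in Python 3 with a throwaway module; the module name demo_profile and its contents are invented for the illustration and are an assumption, not a reconstruction of the real bug.

```python
import sys
import types


def check_all(modname):
    # Same statement as in the traceback above.
    names = {}
    exec("from %s import *" % modname, names)
    return names


# Hypothetical module whose __all__ lists a name it never defines.
mod = types.ModuleType("demo_profile")
mod.run = lambda: None
mod.__all__ = ["run", "help"]
sys.modules["demo_profile"] = mod

try:
    check_all("demo_profile")
    error = None
except AttributeError as exc:
    error = str(exc)

# After defining the missing name, the star-import succeeds.
mod.help = lambda: None
imported = check_all("demo_profile")

print(error)
```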
Re: [Python-Dev] how to debug httplib slowness
On Fri, Sep 4, 2009 at 4:28 AM, Simon Cross wrote:
> On Fri, Sep 4, 2009 at 1:11 PM, Chris Withers wrote:
>> Am I right in reading this as most of the time is being spent in httplib's
>> HTTPResponse._read_chunked and none of the methods it calls?
>>
>> If so, is there a better way than a bunch of print statements to find where
>> in that method the time is being spent?
>
> Well, since the source for _read_chunked includes the comment
>
>   # XXX This accumulates chunks by repeated string concatenation,
>   # which is not efficient as the number or size of chunks gets big.
>
> you might gain some speed improvement with minimal effort by gathering
> the read data chunks into a list and then returning "".join(chunks) at
> the end.

+1 on trying this. Constructing a 116MB string by concatenating 1KB
buffers surely must take forever. (116MB divided by 85125 recv() calls
gives 1365 bytes per chunk, which is awful.) The HTTP/1.0 business looks
like a red herring.

Also agreed that this is an embarrassment.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
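The per-chunk figure above can be checked from the numbers already posted in the thread: the ContentLength header value and the recv() call count from the profile. The second half of the sketch is a worst-case estimate only, under the assumption (mine, not measured) that all pieces are the same size and every concatenation copies the whole string built so far.

```python
content_length = 116245504  # bytes, from the ContentLength header in the thread
recv_calls = 85125          # ncalls for recv() in the posted profile

# Average payload per recv() call, as computed in the message above.
avg_bytes = content_length // recv_calls

# Worst-case assumption: n equal pieces, each concatenation copies the
# entire accumulated string, so the bytes copied add up to
# piece * (1 + 2 + ... + n) = piece * n * (n + 1) / 2.
n = recv_calls
worst_case_copied = avg_bytes * n * (n + 1) // 2

# Joining a list copies every piece once into the final string.
join_copied = content_length

ratio = worst_case_copied / join_copied
print(avg_bytes, worst_case_copied, ratio)
```

Real interpreters do not always hit this worst case, since a string can sometimes be grown in place; how often that succeeds depends on the platform's realloc(), which is the platform difference raised earlier in the thread.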
Re: [Python-Dev] how to debug httplib slowness
Guido van Rossum wrote:
> +1 on trying this. Constructing a 116MB string by concatenating 1KB
> buffers surely must take forever. (116MB divided by 85125 recv() calls
> gives 1365 bytes per chunk, which is awful.) The HTTP/1.0 business looks
> like a red herring.
>
> Also agreed that this is an embarrassment.

Embarrassment eradicated ;-)

http://bugs.python.org/issue6838

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Re: [Python-Dev] how to debug httplib slowness
Guido van Rossum writes:
>
> +1 on trying this. Constructing a 116MB string by concatenating 1KB
> buffers surely must take forever. (116MB divided by 85125 recv() calls
> gives 1365 bytes per chunk, which is awful.)

It certainly is quite small but perhaps the server tries to stay below the
detected MTU?

(not that there is necessarily any point in doing so for most HTTP
content, IMO... except perhaps when the client does some progressive
decoding)
Re: [Python-Dev] Mercurial migration: help needed
On 30/08/2009 9:37 PM, Martin Geisler wrote:
> Mark Hammond writes:
>
>> 1) I've stalled on the 'none:' patch I promised to resurrect. While
>> doing this, I re-discovered that the tests for win32text appear to
>> check win32 line endings are used by win32text on *all* platforms, not
>> just Windows.
>
> I think it is only Patrick Mezard who knows how to run (parts of) the
> test suite on Windows.
>
>> I asked for advice from Dirkjan who referred me to the mercurial-devel
>> list, but my request of slightly over a week ago remains unanswered
>> (http://selenic.com/pipermail/mercurial-devel/2009-August/014873.html)
>> - maybe I just need to be more patient...
>
> Oh no, that's usually the wrong tactic :-) I've been too busy for real
> Mercurial work the last couple of weeks, but you should not feel bad
> about poking us if you don't get a reply. Or come to the IRC channel
> (#mercurial on irc.freenode.net) where Dirkjan (djc) and myself (mg)
> hang out when it's daytime in Europe.

To be fair, I did mail Dirkjan directly, who referred me to the -devel
list. I posted there with a CC to him, plus a private mail asking for
some help should the mail fall on deaf ears, as I feared it would. There
really is only so far I'm willing to poke and prod people when I'm well
aware we are all volunteers.

>> Further, Martin's comments in this thread indicate he believes a new
>> extension will be necessary rather than 'fixing' win32text. If this is
>> the direction we take, it may mean the none: patch, which targets the
>> implementation of win32text, is no longer necessary anyway.
>
> I suggested a new extension for two reasons: ...

Thanks, and that does indeed sound excellent. However, this is going a
fair way beyond the original scope I signed up for. While I was willing
to help implement some new features into an existing extension, taking on
the design and implementation of an entire new extension is something I'm
not willing to undertake.

I don't think such an extension should even come from the Python
community or it will end up being a python-only extension - or at best,
will need to run the gauntlet of 2 bike-shedding sessions from both the
Python and hg communities which will waste much time.

What is the hope of an EOL extension which meets our requirements coming
directly out of the hg community? If that hope is small, where does that
leave us?

Cheers,

Mark
