[Python-Dev] Metaclass problem in the "with" statement semantics in PEP 343
Given the current semantics of PEP 343 and the following class:

    class null_context(object):
        def __context__(self):
            return self
        def __enter__(self):
            return self
        def __exit__(self, *exc_info):
            pass

Mistakenly writing:

    with null_context:
        # Oops, passed the class instead of an instance

Would give a less than meaningful error message:

    TypeError: unbound method __context__() must be called with null_context
    instance as first argument (got nothing instead)

It's the usual metaclass problem with invoking a slot (or slot equivalent)
via "obj.__slot__()" rather than via "type(obj).__slot__(obj)" the way the
underlying C code does.

I think we need to fix the proposed semantics so that they access the slots
via the type, rather than directly through the instance. Otherwise the slots
for the with statement will behave strangely when compared to the slots for
other magic methods.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
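The lookup difference Nick is pointing at is easy to demonstrate with any
existing special method: the interpreter's C code resolves slots on the type,
never on the instance. A small illustration (the class here is made up for
the demo):

```python
class C(object):
    def __len__(self):
        return 1

c = C()
c.__len__ = lambda: 42   # shadow the slot with an instance attribute

# Explicit attribute lookup on the instance sees the shadowing attribute:
print(c.__len__())       # 42

# The len() builtin resolves the slot via type(c), so the instance
# attribute is ignored -- this is the "type(obj).__slot__(obj)" behaviour
# Nick wants the with-statement slots to follow as well:
print(len(c))            # 1
```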
Re: [Python-Dev] urlparse brokenness
OK, you've convinced me. But for backwards compatibility (until Python
3000), a new API should be designed. We can't change the old API in an
incompatible way. Please submit complete code + docs to SF. (If you
think this requires much design work, a PEP may be in order, but I
think that given the new RFCs it's probably straightforward enough to
not require that.)

--Guido

On 11/27/05, Mike Brown <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > IIRC I did it this way because the RFC about parsing urls specifically
> > prescribed it had to be done this way.
>
> That was true as of RFC 1808 (1995-1998), although the grammar actually
> allowed for a more generic interpretation.
>
> Such an interpretation was suggested in RFC 2396 (1998-2004) via a regular
> expression for parsing URI 'references' (a formal abstraction introduced in
> 2396) into 5 components (not six, since 'params' were moved into 'path'
> and eventually became an option on every path segment, not just the end
> of the path). The 5 components are:
>
>     scheme, authority (formerly netloc), path, query, fragment.
>
> Parsing could result in some components being undefined, which is distinct
> from being empty (e.g., 'mailto:[EMAIL PROTECTED]' would have an undefined
> authority and fragment, and a defined, but empty, query).
>
> RFC 3986 / STD 66 (2005-) did not change the regular expression, but makes
> several references to these '5 major components' of a URI, and says that
> these components are scheme-independent; parsers that operate at the
> generic syntax level "can parse any URI reference into its major
> components. Once the scheme is determined, further scheme-specific parsing
> can be performed on the components."
>
> > You have to know what the scheme means before you can
> > parse the rest -- there is (by design!) no standard parsing for
> > anything that follows the scheme and the colon.
>
> Not since 1998, IMHO. It was implicit, at least since RFC 2396, that all
> URI references can be interpreted as having the 5 components; it was made
> explicit in RFC 3986 / STD 66.
>
> > I don't even think
> > that you can trust that if the colon is followed by two slashes that
> > what follows is a netloc for all schemes.
>
> You can.
>
> > But if there's an RFC that says otherwise I'll gladly concede;
> > urlparse's main goal in life is to be RFC compliant.
>
> Its intent seems to be to split a URI into its major components, which are
> now by definition scheme-independent (and have been, implicitly, for a
> long time), so the function shouldn't distinguish between schemes.
>
> Do you want to keep returning that 6-tuple, or can we make it return a
> 5-tuple? If we keep returning 'params' for backward compatibility, then
> that means the 'path' we are returning is not the 'path' that people would
> expect (they'll have to concatenate path+params to get what the generic
> syntax calls a 'path' nowadays). It's also deceptive because params are
> now allowed on all path segments, and the current function only takes them
> from the last segment.
>
> Also for backward compatibility, should an absent component continue to
> manifest in the result as an empty string? I think a compliant parser
> should make a distinction between absent and empty (it could make a
> difference, in theory).
>
> If a regular expression were used for parsing, it would produce None for
> absent components and empty-string for empty ones. I implemented it this
> way in 4Suite's Ft.Lib.Uri and it works nicely.
>
> Mike

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
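The regular expression Mike refers to is the one in Appendix B of RFC 3986.
A quick sketch of applying it from Python, showing the absent-vs-empty
distinction (None vs. '') discussed above; the sample URIs are illustrative:

```python
import re

# RFC 3986, Appendix B: splits any URI reference into its 5 major components.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(uri):
    m = URI_RE.match(uri)
    # Groups 2, 4, 5, 7, 9 are scheme, authority, path, query, fragment.
    # An absent component comes back as None; an empty one as ''.
    return m.group(2, 4, 5, 7, 9)

print(split_uri('http://example.com/a/b?k=v#frag'))
# ('http', 'example.com', '/a/b', 'k=v', 'frag')

print(split_uri('mailto:user@example?'))
# ('mailto', None, 'user@example', '', None) -- absent authority and
# fragment, but a defined (empty) query, exactly the case Mike describes.
```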
Re: [Python-Dev] Metaclass problem in the "with" statement semantics in PEP 343
On 11/28/05, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Given the current semantics of PEP 343 and the following class:
>
>     class null_context(object):
>         def __context__(self):
>             return self
>         def __enter__(self):
>             return self
>         def __exit__(self, *exc_info):
>             pass
>
> Mistakenly writing:
>
>     with null_context:
>         # Oops, passed the class instead of an instance
>
> Would give a less than meaningful error message:
>
>     TypeError: unbound method __context__() must be called with
>     null_context instance as first argument (got nothing instead)
>
> It's the usual metaclass problem with invoking a slot (or slot equivalent)
> via "obj.__slot__()" rather than via "type(obj).__slot__(obj)" the way the
> underlying C code does.
>
> I think we need to fix the proposed semantics so that they access the
> slots via the type, rather than directly through the instance. Otherwise
> the slots for the with statement will behave strangely when compared to
> the slots for other magic methods.

Maybe it's because I'm just an old fart, but I can't make myself care
about this. The code is broken. You get an error message. It even has
the correct exception (TypeError).

In this particular case the error message isn't that great -- well, the
same is true in many other cases (like whenever the invocation is a
method call from Python code).

That most built-in operations produce a different error message doesn't
mean we have to make *all* built-in operations use the same approach. I
fail to see the value of the consistency you're calling for.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] (no subject)
On 11/24/05, Duncan Grisby <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I posted this to comp.lang.python, but got no response, so I thought I
> would consult the wise people here...
>
> I have encountered a problem with the re module. I have a
> multi-threaded program that does lots of regular expression searching,
> with some relatively complex regular expressions. Occasionally, events
> can conspire to mean that the re search takes minutes. That's bad
> enough in and of itself, but the real problem is that the re engine
> does not release the interpreter lock while it is running. All the
> other threads are therefore blocked for the entire time it takes to do
> the regular expression search.

Rather than trying to fight the GIL, I suggest that you let a regex
expert look at your regex(es) and the input that causes the long
running times. As Fredrik suggested, certain patterns are just
inefficient but can be rewritten more efficiently. There are plenty of
regex experts on c.l.py.

Unless you have a multi-CPU box, the performance of your app isn't
going to improve by releasing the GIL -- it only affects the
responsiveness of other threads.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
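As a toy illustration of the kind of pattern rewrite being suggested (the
concrete patterns here are invented for the demo, not taken from Duncan's
application): a nested quantifier such as (a+)+ forces the engine to
backtrack through exponentially many ways of carving up the input when a
match fails, while an equivalent un-nested pattern rejects in linear time.

```python
import re

slow = re.compile(r'(a+)+$')   # nested quantifier: exponential backtracking
fast = re.compile(r'a+$')      # matches the same strings, in linear time

text = 'a' * 15 + 'b'          # fails to match either pattern

# Both reject the input, but `slow` explores on the order of 2**15
# ways to split the run of a's before giving up; grow the input a
# little and the search takes minutes, exactly as Duncan reports.
assert slow.match(text) is None
assert fast.match(text) is None
assert slow.match('aaa') and fast.match('aaa')
```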
Re: [Python-Dev] Patch Req. # 1351020 & 1351036: PythonD modifications
On 11/20/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > The local python community here in Sydney indicated that python.org is
> > only upset when groups port the source to 'obscure' systems and *don't*
> > submit patches... It is possible that I was misinformed.
>
> I never heard such concerns. I personally wouldn't notice if somebody
> ported Python, and did not feed back the patches.

I guess that I'm the source of that sentiment. My reason for wanting
people to contribute ports back is that if they don't, the port is more
likely to stick on some ancient version of Python (e.g. I believe Nokia
is still at 2.2.2). Then, assuming the port remains popular, its users
are going to pressure developers of general Python packages to provide
support for old versions of Python.

While I agree that maintaining port-specific code is a pain whenever
Python is upgraded, I still think that accepting patches for
odd-platform ports is the better alternative. Even if the patches
deteriorate as Python evolves, they should still (in principle) make a
re-port easier.

Perhaps the following compromise can be made: the PSF accepts patches
from reputable platform maintainers. (Of course, like all contributions,
they must be of high quality and not break anything, etc., before they
are accepted.) If such patches cause problems with later Python
versions, the PSF won't maintain them, but instead invite the original
contributors (or other developers who are interested in that particular
port) to fix them. If there is insufficient response, or if it comes too
late given the PSF release schedule, the PSF developers may decide to
break or remove support for the affected platform.

There's a subtle balance between keeping too much old cruft and being
too aggressive in removing cruft that still serves a purpose for
someone. I bet that we've erred in both directions at times.

> Sometimes, people ask "there is this and that port, why isn't it
> integrated", to which the answer is in most cases "because authors
> didn't contribute". This is not being upset - it is merely a fact.
> This port (djgcc) is the first one in a long time (IIRC) where
> anybody proposed rejecting it.
>
> > I am not sure about the future myself. DJGPP 2.04 has been parked at
> > beta for two years now. It might be fair to say that the *general*
> > DJGPP developer base has shrunk a little bit. But the PythonD userbase
> > has actually grown since the first release three years ago. For the
> > time being, people get very angry when the servers go down here :-)
>
> It's not that much availability of the platform I worry about, but the
> commitment of the Python porter. We need somebody to forward bug
> reports to, and somebody to intervene if incompatible changes are made.
> This person would also indicate that the platform is no longer
> available, and hence the port can be removed.

It sounds like Ben Decker is for the time being volunteering to provide
patches and to maintain them. (I hope I'm reading you right, Ben.) I'm
+1 on accepting his patches, *provided* as always they pass muster in
terms of general Python development standards. (Jeff Epler's comments
should be taken to heart.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] SRE should release the GIL (was: no subject)
On Monday 28 November, Guido van Rossum wrote:
> On 11/24/05, Duncan Grisby <[EMAIL PROTECTED]> wrote:
> > I have encountered a problem with the re module. I have a
> > multi-threaded program that does lots of regular expression searching,
> > with some relatively complex regular expressions. Occasionally, events
> > can conspire to mean that the re search takes minutes. That's bad
> > enough in and of itself, but the real problem is that the re engine
> > does not release the interpreter lock while it is running. All the
> > other threads are therefore blocked for the entire time it takes to do
> > the regular expression search.
>
> Rather than trying to fight the GIL, I suggest that you let a regex
> expert look at your regex(es) and the input that causes the long
> running times. As Fredrik suggested, certain patterns are just
> inefficient but can be rewritten more efficiently. There are plenty of
> regex experts on c.l.py.

Part of the problem is certainly inefficient regexes, and we have
improved things to some extent by changing some of them. Unfortunately,
the regexes come from user input, so we can't be certain that our users
aren't going to do stupid things. It's not too bad if a stupid regex
slows things down for a bit, but it is bad if it causes the whole
application to freeze for minutes at a time.

> Unless you have a multi-CPU box, the performance of your app isn't
> going to improve by releasing the GIL -- it only affects the
> responsiveness of other threads.

We do have a multi-CPU box. Even with good regexes, regex matching
takes up a significant proportion of the time spent processing in our
application, so being able to release the GIL will hopefully increase
performance overall as well as increasing responsiveness.

We are currently testing our application with the patch to sre that
Eric posted. Once we get on to some performance tests, we'll post the
results of whether releasing the GIL does make a measurable difference
for us.

Cheers,

Duncan.

--
-- Duncan Grisby --
-- [EMAIL PROTECTED] --
-- http://www.grisby.org --
Re: [Python-Dev] Patch Req. # 1351020 & 1351036: PythonD modifications
Guido van Rossum wrote:
> Perhaps the following compromise can be made: the PSF accepts patches
> from reputable platform maintainers. (Of course, like all
> contributions, they must be of high quality and not break anything,
> etc., before they are accepted.) If such patches cause problems with
> later Python versions, the PSF won't maintain them, but instead invite
> the original contributors (or other developers who are interested in
> that particular port) to fix them. If there is insufficient response,
> or if it comes too late given the PSF release schedule, the PSF
> developers may decide to break or remove support for the affected
> platform.

This is indeed the compromise I was after. If the contributors indicate
that they will maintain it for some time (which happened in this case),
then I can happily accept any port (and did indeed in the past).

In the specific case, there is an additional twist that we deliberately
removed DOS support some time ago, and listed that as officially removed
in a PEP. I understand that djgpp somehow isn't quite the same as DOS,
although I don't understand the differences (anymore). But if it's fine
with you, it is fine with me.

Regards,
Martin
[Python-Dev] Bug day this Sunday?
Is anyone interested in joining a Python bug day this Sunday?

A useful task might be to prepare for the python-core sprint at PyCon by
going through the bug and patch managers, and listing bugs/patches that
would be good candidates for working on at PyCon.

We'd meet in the usual location: #python-dev on irc.freenode.net, from
roughly 9AM to 3PM Eastern (2PM to 8PM UTC) on Sunday Dec. 4.

--amk
Re: [Python-Dev] Proposed additional keyword argument in logging calls
On 11/22/05, Vinay Sajip <[EMAIL PROTECTED]> wrote:
> On numerous occasions, requests have been made for the ability to easily
> add user-defined data to logging events. For example, a multi-threaded
> server application may want to output specific information to a
> particular server thread (e.g. the identity of the client, specific
> protocol options for the client connection, etc.)
>
> This is currently possible, but you have to subclass the Logger class and
> override its makeRecord method to put custom attributes in the LogRecord.
> These can then be output using a customised format string containing e.g.
> "%(foo)s %(bar)d". The approach is usable but requires more work than
> necessary.
>
> I'd like to propose a simpler way of achieving the same result, which
> requires use of an additional optional keyword argument in logging calls.
> The signature of the (internal) Logger._log method would change from
>
>     def _log(self, level, msg, args, exc_info=None)
>
> to
>
>     def _log(self, level, msg, args, exc_info=None, extra_info=None)
>
> The extra_info argument will be passed to Logger.makeRecord, whose
> signature will change from
>
>     def makeRecord(self, name, level, fn, lno, msg, args, exc_info):
>
> to
>
>     def makeRecord(self, name, level, fn, lno, msg, args, exc_info,
>                    extra_info)
>
> makeRecord will, after doing what it does now, use the extra_info
> argument as follows:
>
> If type(extra_info) != types.DictType, it will be ignored.
>
> Otherwise, any entries in extra_info whose keys are not already in the
> LogRecord's __dict__ will be added to the LogRecord's __dict__.
>
> Can anyone see any problems with this approach? If not, I propose to post
> the approach on python-list and then if there are no strong objections,
> check it in to the trunk. (Since it could break existing code, I'm
> assuming (please correct me if I'm wrong) that it shouldn't go into the
> release24-maint branch.)

This looks like a good clean solution to me.

I agree with Paul Moore's suggestion that if extra_info is not None you
should just go ahead and use it as a dict and let the errors propagate.

What's the rationale for not letting it override existing fields?
(There may be a good one, I just don't see it without turning on my
thinking cap, which would cost extra. :-)

Perhaps it makes sense to call it 'extra' instead of 'extra_info'?

As a new feature it should definitely not go into 2.4; but I don't see
how it could break existing code.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
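For reference, this proposal (under the shorter name 'extra' that Guido
suggests) is essentially what later shipped in the logging package:
user-supplied fields are merged into the LogRecord's __dict__ and can then
be referenced from a format string like built-in fields. A usage sketch,
with the logger name and fields invented for the example:

```python
import logging

logger = logging.getLogger('demo')
handler = logging.StreamHandler()
# Custom fields supplied via `extra` become LogRecord attributes,
# so the format string can refer to them directly.
handler.setFormatter(logging.Formatter('%(clientip)s %(user)s: %(message)s'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info('login succeeded', extra={'clientip': '192.0.2.1', 'user': 'mike'})
# emits: 192.0.2.1 mike: login succeeded
```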
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/18/05, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
> Perhaps we should use the memory management technique that the rest
> of Python uses: reference counting. I don't see why the AST
> structures couldn't be PyObjects.

Me neither. Adding yet another memory allocation scheme to Python's
already staggering number of memory allocation strategies sounds like a
bad idea.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] something is wrong with test___all__
Has this been handled yet? If not, perhaps showing the good and bad
bytecode here would help trigger someone's brain into understanding the
problem.

On 11/22/05, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote:
> Hi,
>
> on my machine, "make test" hangs at test_colorsys.
>
> Careful investigation shows that when the bytecode is freshly generated
> by "make all" (precisely in test___all__), the .pyc file is different
> from what a direct call to "regrtest.py test_colorsys" produces.
>
> Curiously, a call to "regrtest.py test___all__" instead of "make test"
> produces the correct bytecode.
>
> I can only suspect some AST bug here.
>
> Reinhold
>
> --
> Mail address is perfectly valid!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/28/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 11/18/05, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
> > Perhaps we should use the memory management technique that the rest
> > of Python uses: reference counting. I don't see why the AST
> > structures couldn't be PyObjects.
>
> Me neither. Adding yet another memory allocation scheme to Python's
> already staggering number of memory allocation strategies sounds like
> a bad idea.

The reason this thread started was the complaint that reference
counting in the compiler is really difficult. Almost every line of code
can lead to an error exit. The code becomes quite cluttered when it
uses reference counting.

Right now, the AST is created with malloc/free, but that makes it hard
to free the ast at the right time. It would be fairly complex to
convert the ast nodes to pyobjects; they're just simple discriminated
unions right now. If they were allocated from an arena, the entire
arena could be freed when the compilation pass ends.

Jeremy
Re: [Python-Dev] Patch Req. # 1351020 & 1351036: PythonD modifications
Guido van Rossum wrote:
> I don't recall why DOS support was removed (PEP 11 doesn't say)

The PEP was actually created after the removal, so you added (or asked
me to add) this entry:

    Name:             MS-DOS, MS-Windows 3.x
    Unsupported in:   Python 2.0
    Code removed in:  Python 2.1

Regards,
Martin
Re: [Python-Dev] Patch Req. # 1351020 & 1351036: PythonD modifications
On 11/28/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Perhaps the following compromise can be made: the PSF accepts patches
> > from reputable platform maintainers. (Of course, like all
> > contributions, they must be of high quality and not break anything,
> > etc., before they are accepted.) If such patches cause problems with
> > later Python versions, the PSF won't maintain them, but instead invite
> > the original contributors (or other developers who are interested in
> > that particular port) to fix them. If there is insufficient response,
> > or if it comes too late given the PSF release schedule, the PSF
> > developers may decide to break or remove support for the affected
> > platform.
>
> This is indeed the compromise I was after. If the contributors indicate
> that they will maintain it for some time (which happened in this case),
> then I can happily accept any port (and did indeed in the past).
>
> In the specific case, there is an additional twist that we deliberately
> removed DOS support some time ago, and listed that as officially removed
> in a PEP. I understand that djgpp somehow isn't quite the same as DOS,
> although I don't understand the differences (anymore).
>
> But if it's fine with you, it is fine with me.

Thanks. :-) I say, the more platforms the merrier.

I don't recall why DOS support was removed (PEP 11 doesn't say) but I
presume it was just because nobody volunteered to maintain it, not
because we have a particular dislike for DOS. So now that we have a
volunteer, let's deal with his patches without prejudice.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/28/05, Jeremy Hylton <[EMAIL PROTECTED]> wrote:
> On 11/28/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > On 11/18/05, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
> > > Perhaps we should use the memory management technique that the rest
> > > of Python uses: reference counting. I don't see why the AST
> > > structures couldn't be PyObjects.
> >
> > Me neither. Adding yet another memory allocation scheme to Python's
> > already staggering number of memory allocation strategies sounds like
> > a bad idea.
>
> The reason this thread started was the complaint that reference
> counting in the compiler is really difficult. Almost every line of
> code can lead to an error exit.

Sorry, I forgot that (I've been off-line for a week of quality time with
Orlijn, and am now digging myself out from under several hundred emails
:-).

> The code becomes quite cluttered when
> it uses reference counting. Right now, the AST is created with
> malloc/free, but that makes it hard to free the ast at the right time.

Would fixing the code to add free() calls in all the error exits make it
more or less cluttered than using reference counting?

> It would be fairly complex to convert the ast nodes to pyobjects.
> They're just simple discriminated unions right now.

Are they all the same size?

> If they were
> allocated from an arena, the entire arena could be freed when the
> compilation pass ends.

Then I don't understand why there was discussion of alloca() earlier on
-- surely the lifetime of a node should not be limited by the stack
frame that allocated it?

I'm not in principle against having an arena for this purpose, but I
worry that this will make it really hard to provide a Python API for the
AST, which has already been requested and whose feasibility (unless I'm
mistaken) also was touted as an argument for switching to the AST
compiler in the first place. I hope we'll never have to deal with an API
like the parser module provides...

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/28/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > The code becomes quite cluttered when
> > it uses reference counting. Right now, the AST is created with
> > malloc/free, but that makes it hard to free the ast at the right time.
>
> Would fixing the code to add free() calls in all the error exits make
> it more or less cluttered than using reference counting?

If we had an arena API, we'd only need to call free on the arena at
top-level entry points. If an error occurs deep inside the compiler, the
arena will still get cleaned up by calling free at the top.

> > It would be fairly complex to convert the ast nodes to pyobjects.
> > They're just simple discriminated unions right now.
>
> Are they all the same size?

No. Each type is a different size, and there are actually a lot of types
-- statements, expressions, arguments, slices, &c. All the objects of
one type are the same size.

> > If they were
> > allocated from an arena, the entire arena could be freed when the
> > compilation pass ends.
>
> Then I don't understand why there was discussion of alloca() earlier
> on -- surely the lifetime of a node should not be limited by the stack
> frame that allocated it?

Actually this is a pretty good limit, because all these data structures
are temporaries used by the compiler. Once compilation has finished,
there's no need for the AST or the compiler state.

> I'm not in principle against having an arena for this purpose, but I
> worry that this will make it really hard to provide a Python API for
> the AST, which has already been requested and whose feasibility
> (unless I'm mistaken) also was touted as an argument for switching to
> the AST compiler in the first place. I hope we'll never have to deal
> with an API like the parser module provides...

My preference would be to have the ast shared by value. We generate code
to serialize it to and from a byte stream and share that between Python
and C. It is less efficient, but it is also very simple.

Jeremy
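As a point of reference on feasibility: by-value sharing of the tree is
roughly what CPython eventually settled on. In modern Python the ast module
materializes the compiler's tree as plain Python objects, and compile()
accepts such a tree back, so transformations never alias the C-side
structures. A minimal round trip:

```python
import ast

# The compiler's AST, converted by value into Python objects:
tree = ast.parse('x = 1 + 2')
assert isinstance(tree.body[0], ast.Assign)

# A (possibly transformed) tree can be handed back to the compiler:
code = compile(tree, '<demo>', 'exec')
ns = {}
exec(code, ns)
print(ns['x'])   # 3
```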
Re: [Python-Dev] Memory management in the AST parser & compiler
Jeremy Hylton wrote:
> The reason this thread started was the complaint that reference
> counting in the compiler is really difficult. Almost every line of
> code can lead to an error exit. The code becomes quite cluttered when
> it uses reference counting. Right now, the AST is created with
> malloc/free, but that makes it hard to free the ast at the right time.
> It would be fairly complex to convert the ast nodes to pyobjects.
> They're just simple discriminated unions right now. If they were
> allocated from an arena, the entire arena could be freed when the
> compilation pass ends.

I haven't looked at the AST code at all so far, but my experience with
gcc is that such an approach is fundamentally flawed: you would always
have memory that ought to survive the parsing, so you will have to copy
it out of the arena. This will either lead to dangling pointers, or
garbage memory. So in gcc, they eventually moved to a full garbage
collector (after several iterations).

Reference counting has the advantage that you can always DECREF at the
end of the function. So if you put all local variables at the beginning
of the function, and all DECREFs at the end, getting clean memory
management should be doable, IMO. Plus, contributors would be familiar
with the scheme in place.

I don't know if details have already been proposed, but I would update
asdl to generate a hierarchy of classes, i.e.

    class mod(object):
        pass

    class Module(mod):
        def __init__(self, body):
            self.body = body    # List of stmt
    # ...

    class Expression(mod):
        def __init__(self, body):
            self.body = body    # expr
    # ...

    class Raise(stmt):
        def __init__(self, dest, values, nl):
            self.dest = dest        # expr or None
            self.values = values    # List of expr
            self.nl = nl            # bool (True or False)

There would be convenience functions, like

    PyObject *mod_Module(PyObject* body);

    /* Module, Interactive, Expression, or mod_INVALID */
    enum mod_kind mod_kind(PyObject* mod);

    PyObject *mod_Expression_body(PyObject*);
    /* ... */
    PyObject *stmt_Raise_dest(PyObject*);

(whether the accessors return new or borrowed references could be
debated; plain C struct accesses would also be possible)

Regards,
Martin
Re: [Python-Dev] Memory management in the AST parser & compiler
On Mon, Nov 28, 2005 at 03:47:07PM -0500, Jeremy Hylton wrote:
> The reason this thread started was the complaint that reference
> counting in the compiler is really difficult.

I don't think that's exactly right. The problem is that the AST compiler
mixes its own memory management strategy with reference counting, and
the result doesn't quite work.

The AST compiler mainly keeps track of memory via containment: for
example, if B is an attribute of A then B gets freed when A gets freed.
That works fine as long as B is never shared.

My memory of the problems is a little fuzzy. Maybe Neal Norwitz can
explain it better.

Neil
Re: [Python-Dev] Memory management in the AST parser & compiler
[Guido]
> > Then I don't understand why there was discussion of alloca() earlier
> > on -- surely the lifetime of a node should not be limited by the stack
> > frame that allocated it?

[Jeremy]
> Actually this is a pretty good limit, because all these data
> structures are temporaries used by the compiler. Once compilation has
> finished, there's no need for the AST or the compiler state.

Are you really saying that there is one function which is called only
once (per compilation) which allocates *all* the AST nodes? That's the
only situation where I'd see alloca() working -- unless your alloca()
doesn't allocate memory on the stack. I was somehow assuming that the
tree would be built piecemeal by parser callbacks or some such
mechanism. There's still a stack frame whose lifetime limits the AST
lifetime, but it is not usually the current stack frame when a new node
is allocated, so alloca() can't be used.

I guess I don't understand the AST compiler code enough to participate
in this discussion. Or perhaps we are agreeing violently?

> > I'm not in principle against having an arena for this purpose, but I
> > worry that this will make it really hard to provide a Python API for
> > the AST, which has already been requested and whose feasibility
> > (unless I'm mistaken) also was touted as an argument for switching to
> > the AST compiler in the first place. I hope we'll never have to deal
> > with an API like the parser module provides...
>
> My preference would be to have the ast shared by value. We generate
> code to serialize it to and from a byte stream and share that between
> Python and C. It is less efficient, but it is also very simple.

So there would still be a Python-objects version of the AST, but the
compiler itself doesn't use it.

At least by-value makes sense to me -- if you're making tree
transformations you don't want accidental sharing to cause unexpected
side effects.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/28/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> [Guido]
> > > Then I don't understand why there was discussion of alloca() earlier
> > > on -- surely the lifetime of a node should not be limited by the stack
> > > frame that allocated it?
>
> [Jeremy]
> > Actually this is a pretty good limit, because all these data
> > structures are temporaries used by the compiler. Once compilation has
> > finished, there's no need for the AST or the compiler state.
>
> Are you really saying that there is one function which is called only
> once (per compilation) which allocates *all* the AST nodes?

Nope, there isn't for everything. It's just that some are temporary to
internal functions and thus can stand to be freed later (unless my
memory is really shot). Otherwise it is piecemeal. There is the main
data structure such as the compiler struct and the top-level node for
the AST, but otherwise everything (currently) is allocated as needed.

> That's the
> only situation where I'd see alloca() working -- unless your alloca()
> doesn't allocate memory on the stack. I was somehow assuming that the
> tree would be built piecemeal by parser callbacks or some such
> mechanism. There's still a stack frame whose lifetime limits the AST
> lifetime, but it is not usually the current stack frame when a new node
> is allocated, so alloca() can't be used.
>
> I guess I don't understand the AST compiler code enough to participate
> in this discussion. Or perhaps we are agreeing violently?

I don't think your knowledge of the codebase precludes your
participation. Actually, I think it makes it even more important, since
if some scheme is devised that is not easily explained, it is really
going to hinder who can help out with maintenance and enhancements on
the compiler.
> > > I'm not in principle against having an arena for this purpose, but I
> > > worry that this will make it really hard to provide a Python API for
> > > the AST, which has already been requested and whose feasibility
> > > (unless I'm mistaken) also was touted as an argument for switching to
> > > the AST compiler in the first place. I hope we'll never have to deal
> > > with an API like the parser module provides...
> >
> > My preference would be to have the ast shared by value. We generate
> > code to serialize it to and from a byte stream and share that between
> > Python and C. It is less efficient, but it is also very simple.
>
> So there would still be a Python-objects version of the AST but the
> compiler itself doesn't use it.

Yep. The idea would be to return a PyString formatted ala the parser
module, where it is just a bunch of nested items in a Scheme-like
format. There would then be Python or C code that would generate a
Python object representation from that. Then, when you were finished
tweaking the structure, you would write it back out as a PyString and
recreate the internal representation. That makes it pass-by-value,
since you pass the serialized PyString version across the C-Python
boundary.

> At least by-value makes sense to me -- if you're making tree
> transformations you don't want accidental sharing to cause unexpected
> side effects.

Yeah, that could be bad. =)

-Brett
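[Editorial note: a minimal Python sketch of the pass-by-value idea -- dumping a tiny tree to a Scheme-like string and parsing it back. Real code would be generated from the ASDL description; `dump`/`parse` and the tuple encoding here are purely illustrative.]

```python
# Round-trip a tiny AST through a Scheme-like string, the "serialize
# to a byte stream and share it" approach described above.

def dump(node):
    if isinstance(node, tuple):          # (kind, child, child, ...)
        return "(" + " ".join(dump(x) for x in node) + ")"
    return str(node)

def parse(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    def read(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return tuple(items), pos + 1
        return tokens[pos], pos + 1
    return read(0)[0]

tree = ("Module", ("Expr", ("Num", "42")))
assert parse(dump(tree)) == tree         # value round-trips intact
```

Because only the string crosses the boundary, the C side and the Python side can never accidentally share node objects.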
Re: [Python-Dev] reference leaks
Neal Norwitz wrote:
> On 11/25/05, Walter Dörwald <[EMAIL PROTECTED]> wrote:
>> Can you move the call to codecs.register_error() out of test_callbacks()
>> and retry?
>
> It then leaks 3 refs on each call to test_callbacks().

This should be fixed now in r41555 and r41556.

Bye,
Walter Dörwald
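[Editorial note: the leak above was inside CPython's codec internals; this example only shows the `codecs.register_error` API under discussion. The handler name `test.replace` and the handler itself are illustrative.]

```python
import codecs

# register_error() installs a global error handler keyed by name;
# registering it inside a test function on every call is what was
# being moved out of test_callbacks() above.

def replace_with_question(exc):
    # Return the replacement text and the position to resume at.
    return ("?", exc.end)

codecs.register_error("test.replace", replace_with_question)

assert "abc\xe9".encode("ascii", "test.replace") == b"abc?"
```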
Re: [Python-Dev] Memory management in the AST parser & compiler
Jeremy Hylton wrote:
> Almost every line of
> code can lead to an error exit. The code becomes quite cluttered when
> it uses reference counting.

I don't see why very many more error exits should become possible just
by introducing refcounting. Errors are possible whenever you allocate
something, however you do it, so you need error checks on all your
allocations in any case.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
[EMAIL PROTECTED]                  +--------------------------------------+
Re: [Python-Dev] Memory management in the AST parser & compiler
Neal Norwitz wrote:
> This is an entire function from Python/ast.c.
> Sequences do not know what type they hold, so there needs to be
> different dealloc functions to free them properly (asdl_*_seq_free()).
Well, that's one complication that would go away if
the nodes were PyObjects.
> The memory leak occurs when FunctionDef fails. name, args, body, and
> decorator_seq are all local and would not be freed. The simple
> variables can be freed in each "constructor" like FunctionDef(), but
> the sequences cannot unless they keep the info about which type they
> hold.
If FunctionDef's reference semantics are defined so
that it steals references to its arguments, then here
is how the same function would look with PyObject
AST nodes, as far as I can see:
static PyObject *
ast_for_funcdef(struct compiling *c, const node *n)
{
/* funcdef: [decorators] 'def' NAME parameters ':' suite */
PyObject *name = NULL;
PyObject *args = NULL;
PyObject *body = NULL;
PyObject *decorator_seq = NULL;
int name_i;
REQ(n, funcdef);
if (NCH(n) == 6) { /* decorators are present */
decorator_seq = ast_for_decorators(c, CHILD(n, 0));
if (!decorator_seq)
goto error;
name_i = 2;
}
else {
name_i = 1;
}
name = NEW_IDENTIFIER(CHILD(n, name_i));
if (!name)
goto error;
else if (!strcmp(STR(CHILD(n, name_i)), "None")) {
ast_error(CHILD(n, name_i), "assignment to None");
goto error;
}
args = ast_for_arguments(c, CHILD(n, name_i + 1));
if (!args)
goto error;
body = ast_for_suite(c, CHILD(n, name_i + 3));
if (!body)
goto error;
return FunctionDef(name, args, body, decorator_seq, LINENO(n));
error:
Py_XDECREF(body);
Py_XDECREF(decorator_seq);
Py_XDECREF(args);
Py_XDECREF(name);
return NULL;
}
The only things I've changed are turning some type
declarations into PyObject * and replacing the
deallocation functions at the end with Py_XDECREF!
Maybe there are other functions where it would not
be so straightforward, but if this really is a
typical AST function, switching to PyObjects looks
like it wouldn't be difficult at all, and would
actually make some things simpler.
Re: [Python-Dev] Memory management in the AST parser & compiler
Here's a somewhat radical idea:

Why not write the parser and bytecode compiler in Python? A .pyc could
be bootstrapped from it and frozen into the executable.

--
Greg Ewing
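[Editorial note: a sketch of the bootstrap idea using today's stdlib -- the builtin compiler is already reachable from Python, and `marshal` produces the byte form that a .pyc body contains, so a Python-hosted compiler could hand its output over the same way. Filenames are illustrative.]

```python
import marshal

# compile() -> code object; marshal.dumps() -> the serialized form
# stored in .pyc files (minus the header). A frozen bootstrap would
# embed bytes like `frozen` in the executable.

code = compile("x = 6 * 7", "<bootstrap>", "exec")
frozen = marshal.dumps(code)          # what a .pyc body contains

ns = {}
exec(marshal.loads(frozen), ns)       # later: load and run it
assert ns["x"] == 42
```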
Re: [Python-Dev] Memory management in the AST parser & compiler
Neal Norwitz wrote:
> Hope this helps explain a bit. Please speak up with how this can be
> improved. Gotta run.
I would rewrite it as
static PyObject*
ast_for_funcdef(struct compiling *c, const node *n)
{
/* funcdef: [decorators] 'def' NAME parameters ':' suite */
PyObject *name = NULL;
PyObject *args = NULL;
PyObject *body = NULL;
PyObject *decorator_seq = NULL;
PyObject *result = NULL;
int name_i;
REQ(n, funcdef);
if (NCH(n) == 6) { /* decorators are present */
decorator_seq = ast_for_decorators(c, CHILD(n, 0));
if (!decorator_seq)
goto error;
name_i = 2;
}
else {
name_i = 1;
}
name = NEW_IDENTIFIER(CHILD(n, name_i));
if (!name)
goto error;
else if (!strcmp(STR(CHILD(n, name_i)), "None")) {
ast_error(CHILD(n, name_i), "assignment to None");
goto error;
}
args = ast_for_arguments(c, CHILD(n, name_i + 1));
if (!args)
goto error;
body = ast_for_suite(c, CHILD(n, name_i + 3));
if (!body)
goto error;
result = FunctionDef(name, args, body, decorator_seq, LINENO(n));
error:
Py_XDECREF(name);
Py_XDECREF(args);
Py_XDECREF(body);
Py_XDECREF(decorator_seq);
return result;
}
The convention would be that ast_for_* returns new references, which
have to be released regardless of success or failure. FunctionDef
would duplicate all of its parameter references if it succeeds,
and leave them untouched if it fails.
One could develop a checker that verifies that:
a) all PyObject* local variables are initialized to NULL, and
b) all such variables are Py_XDECREF'ed after the error label.
c) result is initialized to NULL, and returned.
Then, "goto error" at any point in the code would be correct
(assuming an exception had been set prior to the goto).
No special release function for the body or the decorators
would be necessary - they would be plain Python lists.
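[Editorial note: a naive Python sketch of the checker Martin suggests -- flag PyObject* locals not initialized to NULL, and ones never Py_XDECREF'ed after the error label. A real tool would need an actual C parser; the regex and the `check` function are illustrative only.]

```python
import re

# Rules being checked (from the message above):
#   a) all PyObject* locals are initialized to NULL
#   b) all such variables are Py_XDECREF'ed after the error label

decl = re.compile(r"PyObject\s*\*\s*(\w+)\s*(=\s*NULL)?\s*;")

def check(c_source):
    problems = []
    for m in decl.finditer(c_source):
        name, init = m.group(1), m.group(2)
        if init is None:
            problems.append("%s not initialized to NULL" % name)
    body_after_error = c_source.partition("error:")[2]
    for m in decl.finditer(c_source):
        name = m.group(1)
        if name != "result" and ("Py_XDECREF(%s)" % name) not in body_after_error:
            problems.append("%s not released after error label" % name)
    return problems

good = "PyObject *name = NULL;\nerror:\nPy_XDECREF(name);\nreturn result;"
bad = "PyObject *name;\nerror:\nreturn NULL;"
assert check(good) == []
assert len(check(bad)) == 2
```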
Regards,
Martin
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/28/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Here's a somewhat radical idea:
>
> Why not write the parser and bytecode compiler in Python?
>
> A .pyc could be bootstrapped from it and frozen into
> the executable.

Is there a specific reason you are leaving out the AST, Greg, or do you
count that as part of the bytecode compiler (I think of that as the
AST->bytecode step handled by Python/compile.c)?

While ease of maintenance would be fantastic and would probably lead to
much more language experimentation if more of the core parts of Python
were written in Python, I would worry about performance. While
generating bytecode is not necessarily an every-time thing, I know
Guido has said he doesn't like punishing the performance of small
scripts in the name of large-scale apps (which is why interpreter
startup time has always been an issue), and small scripts tend not to
have a .pyc file.

-Brett
Re: [Python-Dev] CVS repository mostly closed now
On 11/27/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> I tried removing the CVS repository from SF; it turns
> out that this operation is not supported. Instead, it
> is only possible to remove it from the project page;
> pserver and ssh access remain indefinitely, as does
> viewcvs.

There's a hacky trick to remove them: put "rm -rf $CVSROOT/src" into
CVSROOT/loginfo, then remove the line and commit again. :)

Hye-Shik
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/28/05, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
> I guess I don't understand the AST compiler code enough to participate
> in this discussion.
I hope everyone will chime in here. This is important to improve and
learn from others.
Let me try to describe the current situation with a small amount of
code. Hopefully it will give some idea of the larger problems.
This is an entire function from Python/ast.c. It demonstrates the
issues fairly clearly. It contains at least one memory leak. It uses
asdl_seqs, which are barely more than somewhat dynamic arrays.
Sequences do not know what type they hold, so there needs to be
different dealloc functions to free them properly (asdl_*_seq_free()).
ast_for_*() allocate memory, so in case of an error, the memory will
need to be freed. Most of this memory is internal to the AST code.
However, there are some identifiers (PyString's) that must be
DECREF'ed. See below for the memory leak.
static stmt_ty
ast_for_funcdef(struct compiling *c, const node *n)
{
/* funcdef: [decorators] 'def' NAME parameters ':' suite */
identifier name = NULL;
arguments_ty args = NULL;
asdl_seq *body = NULL;
asdl_seq *decorator_seq = NULL;
int name_i;
REQ(n, funcdef);
if (NCH(n) == 6) { /* decorators are present */
decorator_seq = ast_for_decorators(c, CHILD(n, 0));
if (!decorator_seq)
goto error;
name_i = 2;
}
else {
name_i = 1;
}
name = NEW_IDENTIFIER(CHILD(n, name_i));
if (!name)
goto error;
else if (!strcmp(STR(CHILD(n, name_i)), "None")) {
ast_error(CHILD(n, name_i), "assignment to None");
goto error;
}
args = ast_for_arguments(c, CHILD(n, name_i + 1));
if (!args)
goto error;
body = ast_for_suite(c, CHILD(n, name_i + 3));
if (!body)
goto error;
return FunctionDef(name, args, body, decorator_seq, LINENO(n));
error:
asdl_stmt_seq_free(body);
asdl_expr_seq_free(decorator_seq);
free_arguments(args);
Py_XDECREF(name);
return NULL;
}
The memory leak occurs when FunctionDef fails. name, args, body, and
decorator_seq are all local and would not be freed. The simple
variables can be freed in each "constructor" like FunctionDef(), but
the sequences cannot unless they keep the info about which type they
hold. That would help quite a bit, but I'm not sure it's the
right/best solution.
Hope this helps explain a bit. Please speak up with how this can be
improved. Gotta run.
n
Re: [Python-Dev] CVS repository mostly closed now
On Monday 28 November 2005 20:14, 장혜식 wrote:
> There's a hacky trick to remove them:
> put rm -rf $CVSROOT/src into CVSROOT/loginfo
> and remove the line then and commit again. :)

Wow, that is tricky! Glad it wasn't me who thought of this one. :-)

-Fred

--
Fred L. Drake, Jr.
Re: [Python-Dev] Memory management in the AST parser & compiler
Brett Cannon wrote:
> Is there a specific reason you are leaving out the AST, Greg, or do
> you count that as part of the bytecode compiler

No, I consider it part of the parser. My mental model of parsing &
compiling in the presence of a parse tree is like this:

[source] -> scanner -> [tokens] -> parser -> [AST] -> code_generator -> [code]

The fact that there still seems to be another kind of parse tree in
between the scanner and the AST generator is an oddity which I hope
will eventually disappear.

> I know
> Guido has said he doesn't like punishing the performance of small
> scripts in the name of large-scale apps

To me, that's an argument in favour of always generating a .pyc, even
for scripts.

Greg
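[Editorial note: Greg's pipeline, illustrated with the modern stdlib, which did not exist in this form in 2005 -- `tokenize` plays the scanner, `ast.parse` the parser, and `compile` the code generator.]

```python
import ast, io, tokenize

source = "x = 1 + 2\n"

# scanner: source -> tokens
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
assert any(tok.string == "+" for tok in tokens)

# parser: source -> AST
tree = ast.parse(source)
assert isinstance(tree.body[0], ast.Assign)

# code generator: AST -> code object, then run it
code = compile(tree, "<pipeline>", "exec")
ns = {}
exec(code, ns)
assert ns["x"] == 3
```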
Re: [Python-Dev] CVS repository mostly closed now
장혜식 wrote:
> There's a hacky trick to remove them:
> put rm -rf $CVSROOT/src into CVSROOT/loginfo
> and remove the line then and commit again. :)

Sure :-) SF makes a big fuss as to how good a service this is: open
source will never go away. I tend to agree, somewhat.

For historical reasons, it is surely nice to be able to browse the CVS
repository (in particular if you need to correlate CVS revision numbers
and svn revision numbers); also, people can take any time they want to
convert CVS sandboxes. So instead of hacking them, I thought we better
comply. With the mechanics in place, anybody should notice we switched
to subversion (but I will write something on c.l.p.a, anyway).

Regards,
Martin

P.S. Sorry for not getting your name right in the To: field; that's
thunderbird.
Re: [Python-Dev] Memory management in the AST parser & compiler
On 11/28/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Neal Norwitz wrote:
> > Hope this helps explain a bit. Please speak up with how this can be
> > improved. Gotta run.
>
> I would rewrite it as
[code snipped]
For those watching, Greg's and Martin's version were almost the same.
However, Greg's version left in the memory leak, while Martin fixed it
by letting the result fall through. Martin added some helpful rules
about dealing with the memory. Martin also gets bonus points for
talking about developing a checker. :-)
In both cases, their modified code is similar to the existing AST
code, but all deallocation is done with Py_[X]DECREFs rather than a
type specific deallocator. Definitely nicer than the current
situation. It's also the same as the rest of the python code.
With arenas the code would presumably look something like this:
static stmt_ty
ast_for_funcdef(struct compiling *c, const node *n)
{
/* funcdef: [decorators] 'def' NAME parameters ':' suite */
identifier name;
arguments_ty args;
asdl_seq *body;
asdl_seq *decorator_seq = NULL;
int name_i;
REQ(n, funcdef);
if (NCH(n) == 6) { /* decorators are present */
decorator_seq = ast_for_decorators(c, CHILD(n, 0));
if (!decorator_seq)
return NULL;
name_i = 2;
}
else {
name_i = 1;
}
name = NEW_IDENTIFIER(CHILD(n, name_i));
if (!name)
return NULL;
Py_AST_Register(name);
if (!strcmp(STR(CHILD(n, name_i)), "None")) {
ast_error(CHILD(n, name_i), "assignment to None");
return NULL;
}
args = ast_for_arguments(c, CHILD(n, name_i + 1));
body = ast_for_suite(c, CHILD(n, name_i + 3));
if (!args || !body)
return NULL;
return FunctionDef(name, args, body, decorator_seq, LINENO(n));
}
All the goto's become return NULLs. After allocating a PyObject, it
would need to be registered (ie, the mythical Py_AST_Register(name)).
This is easier than using all PyObjects in that when an error occurs,
there's nothing to think about, just return. Only optional values
(like decorator_seq) need to be initialized. It's harder in that one
must remember to register any PyObject so it can be Py_DECREFed at the
end. Since the arena is allocated in big hunk(s), it would presumably
be faster than using PyObjects since there would be less memory
allocation (and fragmentation). It should be possible to get rid of
some of the conditionals too (I joined body and args above).
Using all PyObjects has another benefit that may have been mentioned
elsewhere, ie that the rest of Python uses the same techniques for
handling deallocation.
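[Editorial note: a toy Python arena in the spirit of the mythical Py_AST_Register() above -- every allocation is recorded, and one call releases everything, so error paths can simply return NULL. The `Arena` class and its method names are illustrative.]

```python
# Toy arena: register each allocation; one free_all() releases the
# whole hunk at the end of compilation, so no per-error cleanup code
# is needed anywhere in between.

class Arena:
    def __init__(self):
        self._objects = []

    def register(self, obj):
        self._objects.append(obj)
        return obj

    def free_all(self):
        released = len(self._objects)
        self._objects = []   # in C this would Py_DECREF/free each entry
        return released

arena = Arena()
name = arena.register("funcname")     # stand-in for NEW_IDENTIFIER
args = arena.register(["a", "b"])     # stand-in for ast_for_arguments
assert arena.free_all() == 2          # everything released at once
```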
I'm not really advocating any particular approach. I *think* arenas
would be easiest, but it's not a clear winner. I think Martin's note
about GCC using GC is interesting. AFAIK GCC is a lot more complex
than the Python code, so I'm not sure it's 100% relevant. OTOH, we
need to weigh that experience.
n
Re: [Python-Dev] Memory management in the AST parser & compiler
Neal Norwitz wrote:
> For those watching, Greg's and Martin's version were almost the same.
> However, Greg's version left in the memory leak, while Martin fixed it
> by letting the result fall through.

Actually, Greg said (correctly) that his version also fixes the leak:
he assumed that FunctionDef would *consume* the references being passed
(whether it is successful or not). I don't think this is a good
convention, though.

Regards,
Martin
