Re: [Python-Dev] Memory management in the AST parser & compiler

2005-11-23 Thread Thomas Lee
Neil Schemenauer wrote:

>Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>>Thomas Lee wrote:
>>>Even if it meant we had just one function call - one, safe function call
>>>that deallocated all the memory allocated within a function - that we
>>>had to put before each and every return, that's better than what we
>>>have.
>>
>>alloca?
>
>Perhaps we should use the memory management technique that the rest
>of Python uses: reference counting.  I don't see why the AST
>structures couldn't be PyObjects.
>
>  Neil
I'm +1 for reference counting. It's going to be a little error-prone 
initially (though certainly much less error-prone than the current system in 
the long run), but the pooling/arena idea is going to screw with all sorts 
of stuff within the AST and possibly in bits of Python/compile.c too. At 
least, all my attempts wound up looking that way :)

Cheers,
Tom




[Python-Dev] PEP 302, PEP 338 and imp.getloader (was Re: a Python interface for the AST (WAS: DRAFT: python-dev...)

2005-11-23 Thread Nick Coghlan
Phillip J. Eby wrote:
> At 06:32 PM 11/22/2005 -0800, Brett Cannon wrote:
>>>  Hmm, it would be nice to give a function a module
>>> name (like from an import statement) and have Python resolve it using
>>> the normal sys.path iteration.
>>>
>> Yep, import path -> filename path would be cool.
> 
> Zipped and frozen modules don't have filename paths, so I'd personally 
> rather see fewer stdlib modules making the assumption that modules are 
> files.  Instead, extensions to the PEP 302 loader protocol should be used 
> to support introspection, assuming there aren't already equivalent 
> capabilities available.  For example, PEP 302 allows a 'get_source()' 
> method on loaders, and I believe the zipimport loader supports that.  (I 
> don't know about frozen modules.)
> 
> The main barrier to this being really usable is the absence of loader 
> objects for the built-in import process.  This was proposed by PEP 302, but 
> never actually implemented, probably due to time constraints on the Python 
> 2.3 release schedule.
> 
> It's relatively easy to implement this "missing loader class" in Python, 
> though, and in fact the PEP 302 regression test in the stdlib does exactly 
> that.  Some work, however, would be required to port this to C and expose 
> it from an appropriate module (imp?).

Prompted by this, I finally got around to reading PEP 302 to see how it 
related to PEP 338 (which is intended to fix the current limitations of the 
'-m' switch by providing a Python fallback when the basic C code can't find 
the module to be run).

The key thing that is missing is the "imp.getloader" functionality discussed 
at the end of PEP 302.

Using that functionality and the exec statement, PEP 338 could easily be 
modified to support any module accessed via a loader which supports get_code() 
(and it could probably also get rid of all of the current cruft dealing with 
normal filesystem packages).

So with that in mind, I'm thinking of updating PEP 338 to propose the following:

1. A new pure Python module called "runpy"

2. A function called "runpy.execmodule" that is very similar to execfile, but 
takes a module reference instead of a filename. It will NOT support 
modification of the caller's namespace (based on recent discussions regarding 
the exec statement). argv[0] and the name __file__ in the execution dictionary 
will be set to the file name for real files (those of type PY_SOURCE or 
PY_COMPILED), and the module reference otherwise. An optional argument will 
permit argv[0] (and __file__) to be forced to a specific value.**

3. A function called "runpy.get_source" that, given a module reference, 
retrieves the source code for that module via loader.get_source()

4. A function called "runpy.get_code" that, given a module reference, 
retrieves the code object for that module via loader.get_code()

5. A function called "runpy.is_runnable" that, given a module reference, 
determines if execmodule will work on that module (e.g. by checking that the 
loader provides the get_code() method, that loader.is_package() returns 
false, etc.)

6. If invoked as a script, runpy interprets argv[1] as the module to run

7. If the '-m' switch fails to find a module, it invokes runpy as a fallback.

To keep PEP 338 independent of whether the C implementation of imp.getloader 
for PEP 302 ever gets finished, it would propose two private elements in 
runpy: runpy._getloader and runpy._StandardImportMetaHook

If imp.getloader was available, it would be assigned to runpy._getloader, 
otherwise runpy would fall back on the Python equivalents.

** I'm open to suggestions on how to deal with argv[0] and __file__. They 
should be set to whatever __file__ would be set to by the module loader, but 
the Importer Protocol in PEP 302 doesn't seem to expose that information. The 
current proposal is a compromise that matches the existing behaviour of -m 
(which supports scripts like regrtest.py) while still giving a meaningful 
value for scripts which are not part of the normal filesystem.
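
As a very rough sketch (illustrative only, not the proposed implementation) 
of how item 2 could sit on top of the proposed runpy._getloader helper, using 
the argv[0]/__file__ compromise from the footnote above:

import sys

def execmodule(modname, argv0=None):
    # Sketch only: assumes the _getloader() helper from the proposal above.
    loader = _getloader(modname)
    if loader is None:
        raise ImportError("No module named %s" % modname)
    code = loader.get_code(modname)         # optional PEP 302 extension
    if argv0 is None:
        # Crude stand-in for the PY_SOURCE/PY_COMPILED check: use the
        # compiled-in filename if there is one, else the module reference.
        argv0 = code.co_filename or modname
    run_globals = {'__name__': '__main__',  # assumption: run like -m does
                   '__file__': argv0}
    sys.argv[0] = argv0
    exec code in run_globals                # no caller-namespace support
    return run_globals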

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] urlparse brokenness

2005-11-23 Thread Paul Jimenez

It is my assertion that urlparse is currently broken.  Specifically, I 
think that urlparse breaks an abstraction boundary with ill effect.

In writing a mail client, I wished to allow my users to specify their
IMAP server as a URL, such as 'imap://user:password@host:port/', which
worked fine. I then thought that the natural extension to support
configuration of IMAP over SSL would be 'imaps://user:password@host:port/',
which failed - user:password@host:port got parsed as the *path* of
the URL instead of the network location. It turns out that urlparse
keeps a table of URL schemes that 'use netloc', that is to say,
that have a 'user:password@host:port' part in their URL. I think this
'special knowledge' about particular schemes 1) breaks an abstraction
boundary, by having a function whose charter is to pull apart a
particularly-formatted string behave differently based on the meaning of
the string rather than its structure, and 2) fails to be extensible
or forward compatible because of hardcoded 'magic' strings - if schemes were
somehow 'registerable' as 'netloc-using' or not, then this second objection
might be nullified, but the first objection would still stand.

So I propose that urlsplit, the main offender, be replaced with something
that looks like:

def urlsplit(url, scheme='', allow_fragments=1, default=('','','','','')):
    """Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    key = url, scheme, allow_fragments, default
    cached = _parse_cache.get(key, None)
    if cached:
        return cached
    if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
        clear_cache()

    if "://" in url:
        uscheme, npqf = url.split("://", 1)
    else:
        uscheme = scheme
        if not uscheme:
            uscheme = default[0]
        npqf = url
    pathidx = npqf.find('/')
    if pathidx == -1:  # not found
        netloc = npqf
        path, query, fragment = default[1:4]
    else:
        netloc = npqf[:pathidx]
        pqf = npqf[pathidx:]
        if '?' in pqf:
            path, qf = pqf.split('?', 1)
        else:
            path, qf = pqf, ''.join(default[3:5])
        if ('#' in qf) and allow_fragments:
            query, fragment = qf.split('#', 1)
        else:
            query, fragment = default[3:5]
    tuple = (uscheme, netloc, path, query, fragment)
    _parse_cache[key] = tuple
    return tuple

Note that I'm not sold on the _parse_cache, but I'm assuming it was there
for a reason so I'm leaving that functionality as-is.
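
To illustrate the intended behaviour (a sketch only, assuming the function 
above is dropped into urlparse.py, where _parse_cache, MAX_CACHE_SIZE and 
clear_cache are already defined):

# The scheme-specific 'uses_netloc' knowledge is gone, so an unknown
# scheme like 'imaps' still gets its network location split out.
print urlsplit('imaps://user:password@mail.example.com:993/')
# -> ('imaps', 'user:password@mail.example.com:993', '/', '', '')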

If this isn't the right forum for this discussion, or the right place to 
submit code, please let me know.  Also, please cc: me directly on responses
as I'm not subscribed to the firehose that is python-dev.

  --pj



Re: [Python-Dev] urlparse brokenness

2005-11-23 Thread Aahz
On Tue, Nov 22, 2005, Paul Jimenez wrote:
>
> If this isn't the right forum for this discussion, or the right place
> to submit code, please let me know.  Also, please cc: me directly on
> responses as I'm not subscribed to the firehose that is python-dev.

This is the right forum for discussion.  You should post your patch to
SourceForge *before* starting a discussion on python-dev, including a
link to the patch in your post.  It is not essential, but it is certainly
a courtesy to subscribe to python-dev for the duration of the discussion;
you can feel free to filter threads you're not interested in.
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair


Re: [Python-Dev] PEP 302, PEP 338 and imp.getloader (was Re: a Python interface for the AST (WAS: DRAFT: python-dev...)

2005-11-23 Thread Phillip J. Eby
At 11:51 PM 11/23/2005 +1000, Nick Coghlan wrote:
>The key thing that is missing is the "imp.getloader" functionality discussed
>at the end of PEP 302.

This isn't hard to implement per se; setuptools for example has a 
'get_importer' function, and going from importer to loader is simple:

def get_importer(path_item):
    """Retrieve a PEP 302 "importer" for the given path item

    If there is no importer, this returns a wrapper around the builtin import
    machinery.  The returned importer is only cached if it was created by a
    path hook.
    """
    try:
        importer = sys.path_importer_cache[path_item]
    except KeyError:
        for hook in sys.path_hooks:
            try:
                importer = hook(path_item)
            except ImportError:
                pass
            else:
                break
        else:
            importer = None

        sys.path_importer_cache.setdefault(path_item, importer)
    if importer is None:
        try:
            importer = ImpWrapper(path_item)
        except ImportError:
            pass
    return importer

So with the above function you could do something like:

def get_loader(fullname, path):
    for path_item in path:
        try:
            loader = get_importer(path_item).find_module(fullname)
            if loader is not None:
                return loader
        except ImportError:
            continue
    else:
        return None

in order to implement the rest.
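
For instance, a minimal sketch (using the _getloader name from Nick's 
proposal; dotted names and package __path__ handling are ignored here) of 
how the fallback might be wired up on top of the two functions above:

import sys

def _getloader(modname):
    # Sketch: top-level modules only; a real implementation would split
    # dotted names and follow package __path__ entries.
    return get_loader(modname, sys.path)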


>** I'm open to suggestions on how to deal with argv[0] and __file__. They
>should be set to whatever __file__ would be set to by the module loader, but
>the Importer Protocol in PEP 302 doesn't seem to expose that information. The
>current proposal is a compromise that matches the existing behaviour of -m
>(which supports scripts like regrtest.py) while still giving a meaningful
>value for scripts which are not part of the normal filesystem.

Ugh.  Those are tricky, no question.  I can think of several simple answers 
for each, all of which are wrong in some way.  :)



Re: [Python-Dev] a Python interface for the AST (WAS: DRAFT: python-dev...)

2005-11-23 Thread Greg Ewing
Brett Cannon wrote:

> There are two problems to this topic; how to
> get the AST structs into Python objects and how to allow Python code
> to modify the AST before bytecode emission

I'm astounded to hear that the AST isn't made from
Python objects in the first place. Is there a particular
reason it wasn't done that way?

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | A citizen of NewZealandCorp, a   |
Christchurch, New Zealand  | wholly-owned subsidiary of USA Inc.  |
[EMAIL PROTECTED]  +--+


Re: [Python-Dev] urlparse brokenness

2005-11-23 Thread Mike Brown
Paul Jimenez wrote:
> So I propose that urlsplit, the main offender, be replaced with something
> that looks like:
> 
> def urlsplit(url, scheme='', allow_fragments=1, default=('','','','','')):

+1 in principle.

You should probably do a

    global _parse_cache

and add 'is not None' after 'if cached'.


Re: [Python-Dev] a Python interface for the AST (WAS: DRAFT: python-dev...)

2005-11-23 Thread Brett Cannon
On 11/23/05, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Brett Cannon wrote:
>
> > There are two problems to this topic; how to
> > get the AST structs into Python objects and how to allow Python code
> > to modify the AST before bytecode emission
>
> I'm astounded to hear that the AST isn't made from
> Python objects in the first place. Is there a particular
> reason it wasn't done that way?
>

I honestly don't know, Greg.  All of the structs are generated by
Parser/asdl_c.py which reads in the AST definition from
Parser/Python.asdl .  The code that is used to allocate and initialize
the structs is in Python/Python-ast.c and is also auto-generated by
Parser/asdl_c.py .

I am guessing here, but it might have to do with type safety.  Some
nodes can hold several different kinds of subnode (like the stmt node) and
thus are created using a single struct and a bunch of unions internally.  So
there is some added assurance that stuff is being done correctly.

Otherwise memory is the only other reason I can think of.  Or Jeremy
just didn't think of doing it that way when this was all started years
ago.  =)  But since it is all auto-generated it should be doable to
make them Python objects.
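
As a very rough illustration of what "making them Python objects" might look 
like at the Python level (a hand-written sketch, not anything generated from 
Python.asdl; field names and operator handling are simplified):

class AST(object):
    """Base class; in practice there would be one subclass per ASDL
    constructor in Parser/Python.asdl."""
    _fields = ()
    def __init__(self, **kwds):
        for name in self._fields:
            setattr(self, name, kwds.get(name))

class BinOp(AST):
    # corresponds to the BinOp constructor of the 'expr' sum type
    _fields = ('left', 'op', 'right')

class Num(AST):
    _fields = ('n',)

# e.g. the expression 1 + 2 might come back as something like
# (op shown as a plain string here for simplicity):
tree = BinOp(left=Num(n=1), op='Add', right=Num(n=2))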

-Brett