Re: [Python-Dev] Importing .pyc in -O mode and vice versa

2006-11-05 Thread Steve Holden
[Off-list]
Brett Cannon wrote:
[...]
> 
> Hopefully my import rewrite is flexible enough that people will be able 
> to plug in their own importer/loader for the filesystem so that they can 
> tune how things like this are handled (e.g., caching what files are in a 
> directory, skipping bytecode files, etc.).
> 
I just wondered whether you plan to support other importers of the PEP 
302 style? I have been experimenting with import from database, and 
would like to see that work migrate to your rewrite if possible.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Path object design

2006-11-05 Thread stephen
Michael Urman writes:

 > Ah, but how do you know when that's wrong? At least under ftp:// your
 > root is often a mid-level directory until you change up out of it.
 > http:// will tend to treat the targets as roots, but I don't know that
 > there's any requirement for a /.. to be meaningless (even if it often
 > is).

ftp and http schemes both have authority ("host") components, so the
meaning of ".." path components is defined in the same way for both by
section 5 of RFC 3986.

Of course an FTP server is not bound to interpret the protocol so as
to mimic URL semantics.  But that's a different question.



Re: [Python-Dev] Path object design

2006-11-05 Thread Andrew Dalke
Steve:
> > I'm darned if I know. I simply know that it isn't right for http resources.

/F:
> the URI specification disagrees; an URI that starts with "../" is per-
> fectly legal, and the specification explicitly states how it should be
> interpreted.

I have looked at the spec, and can't figure out how its explanation
matches the observed urljoin results.  Steve's excerpt trimmed out
the strangest example.

>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
'http://blah.com/../../'
>>>

> (it's important to realize that "urijoin" produces equivalent URI:s, not
> file names)

Both, though, are "paths".  The OP, Mike Orr, wrote:

   I agree that supporting non-filesystem directories (zip files,
   CVS/Subversion sandboxes, URLs) would be nice, but we already have a
   big enough project without that.  What constraints should a Path
   object keep in mind in order to be forward-compatible with this?

Is the answer therefore that URLs and URI behaviour should not
place constraints on a Path object because they are sufficiently
dissimilar from file-system paths?  Do these other non-FS hierarchical
structures have similar differences causing a semantic mismatch?

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Status of pairing_heap.py?

2006-11-05 Thread Paul Chiusano
Hi Martin,

Yes, I'm familiar with the heapq module, but it doesn't do all that
I'd like. The main functionality I am looking for is the ability to
adjust the value of an item in the heap and delete items from the
heap. There's a lot of heap applications where this is useful. (I
might even say most heap applications!)

To support this, the insert method needs to return a reference to an
object which I can then pass to adjust_key() and delete() methods.
It's extremely difficult to have this functionality with array-based
heaps because the index of an item in the array changes as items are
inserted and removed.

I guess I don't need a pairing heap, but of the pointer-based heaps
I've looked at, pairing heaps seem to be the simplest while still
having good complexity guarantees.

> Anyway, the immediate author of this code is Dan Stutzbach (as
> Raymond Hettinger's checkin message says); you probably should
> contact him to find out whether the project is still alive.

Okay, I'll do that. What needs to be done to move the project along
and possibly get a pairing heap incorporated into a future version of
python?

Best,
Paul

On 11/4/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Paul Chiusano schrieb:
> > I was looking for a good pairing_heap implementation and came across
> > one that had apparently been checked in a couple years ago (!).
>
> Have you looked at the heapq module? What application do you have
> for a pairing heap that you can't do readily with the heapq module?
>
> Anyway, the immediate author of this code is Dan Stutzbach (as
> Raymond Hettinger's checkin message says); you probably should
> contact him to find out whether the project is still alive.
>
> Regards,
> Martin
>


Re: [Python-Dev] Importing .pyc in -O mode and vice versa

2006-11-05 Thread Aahz
On Sun, Nov 05, 2006, "Martin v. Löwis" wrote:
> Greg Ewing schrieb:
>> Fredrik Lundh wrote:
>>> 
>>> well, from a performance perspective, it would be nice if Python looked 
>>> for *fewer* things, not more things.
>> 
>> Instead of searching for things by doing a stat call for each
>> possible file name, would it perhaps be faster to read the contents
>> of all the directories along sys.path into memory and then go
>> searching through that?
>
> That should never be better: the system will cache the directory
> blocks, also, and it will do a better job than Python will.

Maybe so, but I recently dealt with a painful bottleneck in Python code
caused by excessive stat() calls on a directory with thousands of files,
while the os.listdir() function was bogging things down hardly at all.
Granted, Python bytecode was almost certainly the cause of much of the
overhead, but I still suspect that a simple listing will be faster in C
code because of fewer system calls.  It should be a matter of profiling
before this suggestion is rejected rather than making assertions about
what "should" be happening.
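The two strategies being compared can be put side by side in a few
lines.  This is only a rough sketch for profiling, not a claim about
which one wins: the function names and file names are made up for
illustration, and the timings will depend heavily on the platform,
the filesystem, and whether the directory is already cached.

```python
import os
import tempfile
import time


def find_by_stat(directory, names):
    """Probe each candidate name with its own filesystem call."""
    return {n for n in names if os.path.exists(os.path.join(directory, n))}


def find_by_listing(directory, names):
    """Read the directory once, then answer lookups from an in-memory set."""
    present = set(os.listdir(directory))
    return {n for n in names if n in present}


def compare(n_files=2000, n_missing=2000):
    """Return (results_agree, stat_seconds, listing_seconds)."""
    with tempfile.TemporaryDirectory() as d:
        # Create n_files empty files; the extra names do not exist on disk.
        for i in range(n_files):
            open(os.path.join(d, "mod%d.py" % i), "w").close()
        names = ["mod%d.py" % i for i in range(n_files + n_missing)]

        t0 = time.perf_counter()
        by_stat = find_by_stat(d, names)
        t1 = time.perf_counter()
        by_listing = find_by_listing(d, names)
        t2 = time.perf_counter()
    return by_stat == by_listing, t1 - t0, t2 - t1
```

Calling compare() on the machine in question gives the pair of timings
to argue about; the first element only confirms that both approaches
found the same files.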
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"In many ways, it's a dull language, borrowing solid old concepts from
many other languages & styles:  boring syntax, unsurprising semantics,
few automatic coercions, etc etc.  But that's one of the things I like
about it."  --Tim Peters on Python, 16 Sep 1993


Re: [Python-Dev] Path object design

2006-11-05 Thread Mike Orr
On 11/5/06, Andrew Dalke <[EMAIL PROTECTED]> wrote:
>
>I agree that supporting non-filesystem directories (zip files,
>CVS/Subversion sandboxes, URLs) would be nice, but we already have a
>big enough project without that.  What constraints should a Path
>object keep in mind in order to be forward-compatible with this?
>
> Is the answer therefore that URLs and URI behaviour should not
> place constraints on a Path object because they are sufficiently
> dissimilar from file-system paths?  Do these other non-FS hierarchical
> structures have similar differences causing a semantic mismatch?

This discussion has reinforced my belief that os.path.join's behavior
is correct with non-initial absolute args:

os.path.join('/usr/bin', '/usr/local/bin/python')

I've used that in applications and haven't found it a burden.

Its behavior with '..' seems justifiable too, and Talin's trick of
wrapping everything in os.path.normpath is a great one.

I do think join should take more care to avoid multiple slashes
together in the middle of a path, although this is really the
responsibility of the platform library, not a generic function/method.
 Join is true to its documentation of only adding separators and never
deleting them, but that seems like a bit of sloppiness.   On the
other hand, the filesystems don't care; I don't think anybody has
mentioned a case where it actually creates a path the filesystem can't
handle.
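The join behaviors mentioned above can be seen in a short sketch.  It
uses posixpath rather than os.path purely so the results are the same
on every platform; the example paths are arbitrary.

```python
import posixpath

# A later absolute argument discards everything before it.
joined = posixpath.join("/usr/bin", "/usr/local/bin/python")
print(joined)    # /usr/local/bin/python

# join only adds separators; it never removes the ones already there.
doubled = posixpath.join("/usr//", "lib")
print(doubled)   # /usr//lib

# '..' is passed through untouched until normpath collapses it.
dotted = posixpath.join("/usr/local/lib", "../bin")
print(dotted)    # /usr/local/lib/../bin

cleaned = posixpath.normpath(dotted)
print(cleaned)   # /usr/local/bin
```

Note that normpath is purely textual, so it can change what a path
refers to when a symlink sits before the '..'.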

urljoin clearly has a different job.  When we talked about extending
path to URLs, I was thinking more in terms of opening files, fetching
resources, deleting, renaming, etc. rather than split-modify-rejoin.
A hypothetical urlpath module would clearly have to follow the URL
rules.  I don't see a contradiction in supporting both URL joining
rules and having a non-initial absolute argument, just to avoid
cross-"platform" surprises.  But urlpath would also need methods to
parse the scheme and host on demand, query strings, #fragments, a
class method for building a URL from the smallest parts, etc.

As for supporting path fragments and '..' in join arguments (for
filesystem paths), it's clearly too widely used to eliminate.  Users
can voluntarily refrain from passing arguments containing separators.
For cases involving a user-supplied -- possibly hostile -- path,
either a separate method (safe_join, child) could achieve this, or a
subclass implementation that allows only safe arguments.
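One possible shape for such a method, as a minimal sketch: safe_join
is a hypothetical helper, not an existing library function, and it
assumes POSIX-style separators.  It simply refuses any argument that
is not a single plain path component.

```python
import posixpath


def safe_join(base, *parts):
    """Join parts onto base, rejecting anything but single components.

    Hypothetical helper: each part must be one plain name, so it cannot
    be absolute, contain a separator, or be '', '.' or '..'.
    """
    for part in parts:
        if part in ("", ".", "..") or "/" in part or "\0" in part:
            raise ValueError("unsafe path component: %r" % (part,))
    return posixpath.join(base, *parts)
```

With this, safe_join("/srv/data", "a", "b.txt") gives
'/srv/data/a/b.txt', while "../etc" or "/etc/passwd" raises
ValueError instead of escaping the base directory.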

Regarding pathname-manipulation methods and filesystem-access methods,
I'm not sure how workable it is to have separate objects for them.

os.mkdir(   Path("/usr/local/lib/python/Cheetah/Template.py").parent   )
Path("/usr/local/lib/python/Cheetah/Template.py").parent.mkdir()
FileAccess(
Path("/usr/local/lib/python/Cheetah/Template.py").parent   ).mkdir()

The first two are reasonable.  The third... who would want to do this
for every path?  How often would you reuse the FileAccess object?  I
typically create Path objects from configuration values and keep them
around for the entire application; e.g., data_dir.  Then I create
derived paths as necessary. I suppose if the FileAccess object has a
.path attribute, it could do double-duty so you wouldn't have to store
the path separately.  Is this what the advocates of two classes have
in mind?  With usage like this?

my_file = FileAccess(   file_access_obj.path.joinpath("my_file")   )
my_file = FileAccess(   Path(file_access_obj.path, "my_file")   )

Working on my Path implementation.  (Yes it's necessary, Glyph, at
least to me.)  It's going slow because I just got a Macintosh laptop
and am still rounding up packages to install.

-- 
Mike Orr <[EMAIL PROTECTED]>


Re: [Python-Dev] Status of pairing_heap.py?

2006-11-05 Thread Josiah Carlson

"Paul Chiusano" <[EMAIL PROTECTED]> wrote:
> 
> > It is not required.  If you are careful, you can implement a pairing
> > heap with a structure combining a dictionary and list.
> 
> That's interesting. Can you give an overview of how you can do that? I
> can't really picture it. You can support all the pairing heap
> operations with the same complexity guarantees? Do you mean a linked
> list here or an array?

I mean a Python list.  The trick is to implement a sequence API that
keeps track of the position of any 'pair'.  That is, ph[posn] will
return a 'pair' object, but when you perform ph[posn] = pair, you also
update a mapping; ph.mapping[pair.value] = posn .  With a few other bits,
one can use heapq directly and get all of the features of the pairing
heap API without keeping an explicit tree with links, etc.

In terms of running time, adjust_key, delete, and extract(0) are all
O(logn), meld is O(min(n+m, mlog(n+m))), empty and peek are O(1), values
is O(n), and extract_all is O(nlogn) but uses list.sort() rather than
repeatedly pulling from the heap (heapq's documentation suggests this is
faster in terms of comparisons, but likely very much faster in terms of
actual running time).

Attached is a sample implementation using this method with a small test
example.  It may or may not use less memory than the sandbox
pairing_heap.py, and using bare lists rather than pairs may result in
less memory overall (if there exists a list "free list"), but this
should give you something to start with.
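For readers without the attachment, the general dictionary-plus-list
idea can be sketched as follows.  This is not the attached
pair_heap.py: it is an independent, minimal binary heap that does its
own sifting instead of wrapping heapq, the class and method names are
hypothetical, and (as noted earlier in the thread) it assumes values
are unique and hashable.

```python
class IndexedHeap:
    """Binary min-heap of (key, value) pairs with a value -> index map."""

    def __init__(self):
        self._items = []   # heap-ordered list of (key, value)
        self._pos = {}     # value -> current index in self._items

    def __len__(self):
        return len(self._items)

    def _place(self, i, item):
        # Every write to the list also updates the position map.
        self._items[i] = item
        self._pos[item[1]] = i

    def _sift_up(self, i):
        item = self._items[i]
        while i > 0:
            parent = (i - 1) // 2
            if self._items[parent][0] <= item[0]:
                break
            self._place(i, self._items[parent])
            i = parent
        self._place(i, item)

    def _sift_down(self, i):
        n = len(self._items)
        item = self._items[i]
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and self._items[child + 1][0] < self._items[child][0]:
                child += 1
            if item[0] <= self._items[child][0]:
                break
            self._place(i, self._items[child])
            i = child
        self._place(i, item)

    def insert(self, key, value):
        if value in self._pos:
            raise ValueError("values must be unique")
        self._items.append((key, value))
        self._sift_up(len(self._items) - 1)

    def adjust_key(self, value, new_key):
        i = self._pos[value]
        old_key = self._items[i][0]
        self._items[i] = (new_key, value)
        if new_key < old_key:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def delete(self, value):
        i = self._pos.pop(value)
        last = self._items.pop()
        if i < len(self._items):
            # Move the former last item into the hole and restore order.
            self._place(i, last)
            self._sift_up(i)
            self._sift_down(self._pos[last[1]])

    def extract_min(self):
        key, value = self._items[0]
        self.delete(value)
        return key, value
```

Here the value itself acts as the handle that insert would otherwise
have to return, which is why uniqueness and hashability are required.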

 - Josiah

> Paul
> 
> On 11/4/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> >
> > "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > > Paul Chiusano schrieb:
> > > > To support this, the insert method needs to return a reference to an
> > > > object which I can then pass to adjust_key() and delete() methods.
> > > > It's extremely difficult to have this functionality with array-based
> > > > heaps because the index of an item in the array changes as items are
> > > > inserted and removed.
> > >
> > > I see.
> >
> > It is not required.  If you are careful, you can implement a pairing
> > heap with a structure combining a dictionary and list.  It requires that
> > all values be unique and hashable, but it is possible (I developed one
> > for a commercial project).
> >
> > If other people find the need for it, I could rewrite it (can't release
> > the closed source).  It would use far less memory than the pairing heap
> > implementation provided in the sandbox, and could be converted to C if
> > desired and/or required.  On the other hand, I've found the pure Python
> > version to be fast enough for most things I've needed it for.
> >
> >  - Josiah
> >
> >


pair_heap.py
Description: Binary data


Re: [Python-Dev] Path object design

2006-11-05 Thread Martin v. Löwis
Andrew Dalke schrieb:
> I have looked at the spec, and can't figure out how its explanation
> matches the observed urljoin results.  Steve's excerpt trimmed out
> the strangest example.

Unfortunately, you didn't say which of these you want explained.
As it is tedious to write down even a single one, I will restrict myself
to the one with the What?! remark.

> >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
> 'http://blah.com/'

Please follow me through section 5 of

http://www.ietf.org/rfc/rfc3986.txt

5.2.1: Pre-parse the Base URI
 B.scheme = "http"
 B.authority = "blah.com"
 B.path = "/a/b/c"
 B.query = undefined
 B.fragment = undefined

5.2.2: Transform References
 parse("../../../..")
 R.scheme = R.authority = R.query = R.fragment = undefined
 R.path = "../../../.."
 (strictness not relevant, R.scheme is already undefined)
 R.scheme is not defined
 R.authority is not defined
 R.path is not ""
 R.path does not start with /
 T.path = merge("/a/b/c", "../../../..")
 T.path = remove_dot_segments(T.path)
 T.authority = "blah.com"
 T.scheme = "http"
 T.fragment = undefined

5.2.3 Merge paths
 merge("/a/b/c", "../../../..") =
 (base URI does have path)
 "/a/b/../../../.."

5.2.4 Remove Dot Segments
 remove_dot_segments("/a/b/../../../..")
 1. I = "/a/b/../../../.."
    O = ""
 2. A (does not apply)
    B (does not apply)
    C (does not apply)
    D (does not apply)
    E O="/a" I="/b/../../../.."
 2. E O="/a/b" I="/../../../.."
 2. C O="/a" I="/../../.."
 2. C O="" I="/../.."
 2. C O="" I="/.."
 2. C O="" I="/"
 2. E O="/" I=""
 3. Result: "/"

5.3 Component Recomposition
 result = ""
 (scheme is defined)
 result = "http:"
 (authority is defined)
 result = "http://blah.com"
 (append path)
 result = "http://blah.com/"
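
The remove_dot_segments steps traced above can also be written out as
code.  The following is a sketch of my reading of RFC 3986 section
5.2.4, written for Python 3; it is not the code in urlparse.py, and
the comments label the rule letters the same way the trace does.

```python
def remove_dot_segments(path):
    """Apply the RFC 3986 section 5.2.4 algorithm to a path string."""
    inp = path
    output = []          # output buffer, kept as a list of segments
    while inp:
        # A: drop a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # B: replace a leading "/./" (or a lone "/.") with "/"
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        # C: replace a leading "/../" (or a lone "/..") with "/" and
        #    remove the last output segment, if there is one
        elif inp.startswith("/../"):
            inp = inp[3:]
            if output:
                output.pop()
        elif inp == "/..":
            inp = "/"
            if output:
                output.pop()
        # D: a bare "." or ".." is simply removed
        elif inp in (".", ".."):
            inp = ""
        # E: move the first segment (with its leading "/", if any)
        #    from the input to the output
        else:
            cut = inp.find("/", 1)
            if cut == -1:
                segment, inp = inp, ""
            else:
                segment, inp = inp[:cut], inp[cut:]
            output.append(segment)
    return "".join(output)


print(remove_dot_segments("/a/b/../../../.."))   # /
```

The same function also returns "/" for "/..", "/../" and "/../..".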

HTH,
Martin


Re: [Python-Dev] Importing .pyc in -O mode and vice versa

2006-11-05 Thread Brett Cannon
On 11/5/06, Steve Holden <[EMAIL PROTECTED]> wrote:
> [Off-list]
> Brett Cannon wrote:
> [...]
> >
> > Hopefully my import rewrite is flexible enough that people will be able
> > to plug in their own importer/loader for the filesystem so that they can
> > tune how things like this are handled (e.g., caching what files are in a
> > directory, skipping bytecode files, etc.).
> >
> I just wondered whether you plan to support other importers of the PEP
> 302 style? I have been experimenting with import from database, and
> would like to see that work migrate to your rewrite if possible.

Yep.  The main point of this rewrite is to refactor the built-in
importers to be PEP 302 importers so that they can easily be left out
to protect imports.  Plus I have made sure that doing something like
.ptl files off the filesystem is simple (a subclass with a single
method overloaded) or introducing a DB as a back-end store (should
only require the importer/loader part; can even use an existing class
to handle whether bytecode should be recreated or not).

Since a DB back-end is a specific use-case I even have notes in the
module docstring stating how I would go about doing it.

-Brett
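
To make the "import from a database" case concrete, here is a minimal
sketch.  It assumes a modern Python 3 with importlib (find_spec and
exec_module) rather than the find_module/load_module hooks of the
original PEP 302, and it is not taken from the rewrite discussed
here.  DictFinder, SOURCES and install are hypothetical names, and a
plain dict stands in for the database.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for a database table: module name -> source text.
SOURCES = {"dbmod": "ANSWER = 42\n"}


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object, backed by a mapping of sources."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the modules we actually have source for.
        if fullname in self._sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, "<db:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


def install(sources):
    """Append a DictFinder to sys.meta_path and return it."""
    finder = DictFinder(sources)
    sys.meta_path.append(finder)
    return finder
```

After install(SOURCES), a plain "import dbmod" is served from the
mapping.  Because the finder is appended, the normal filesystem
importers are still consulted first.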


Re: [Python-Dev] Importing .pyc in -O mode and vice versa

2006-11-05 Thread Martin v. Löwis
Aahz schrieb:
> Maybe so, but I recently dealt with a painful bottleneck in Python code
> caused by excessive stat() calls on a directory with thousands of files,
> while the os.listdir() function was bogging things down hardly at all.
> Granted, Python bytecode was almost certainly the cause of much of the
> overhead, but I still suspect that a simple listing will be faster in C
> code because of fewer system calls.  It should be a matter of profiling
> before this suggestion is rejected rather than making assertions about
> what "should" be happening.

That works both ways, of course: whoever implements such a patch should
also provide profiling information.

Last time I changed the importing code to reduce the number of stat
calls, I could hardly demonstrate a speedup.

Regards,
Martin


Re: [Python-Dev] Path object design

2006-11-05 Thread Andrew Dalke
Martin:
> Unfortunately, you didn't say which of these you want explained.
> As it is tedious to write down even a single one, I restrain to the
> one with the What?! remark.
>
> > >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
> > 'http://blah.com/'

The "What?!" is in context with the previous and next entries.  I've
reduced it to a simpler case

>>> urlparse.urljoin("http://blah.com/", "..")
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/", "../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/", "../..")
'http://blah.com/'

Does the result make sense to you?  Does it make
sense that the last of these is shorter than the middle
one?  It sure doesn't to me.  I thought it was obvious
that there was an error; obvious enough that I didn't
bother to track down why - especially as my main point
was to argue there are different ways to deal with
hierarchical/path-like schemes, each correct for its
given domain.

> Please follow me through section 5 of
>
> http://www.ietf.org/rfc/rfc3986.txt

The core algorithm causing the "what?!" comes from
"remove_dot_segments", section 5.2.4.  In parallel my
3 cases should give:

5.2.4 Remove Dot Segments
 remove_dot_segments("/..")   r_d_s("/../")          r_d_s("/../..")

 1. I = "/.."                 I = "/../"             I = "/../.."
    O = ""                    O = ""                 O = ""
 2A. (does not apply)         2A. (does not apply)   2A. (does not apply)
 2B. (does not apply)         2B. (does not apply)   2B. (does not apply)
 2C. O="" I="/"               2C. O="" I="/"         2C. O="" I="/.."
 2A. (does not apply)         2A. (does not apply)   .. reduces to r_d_s("/..")
 2B. (does not apply)         2B. (does not apply)   3. Result "/"
 2C. (does not apply)         2C. (does not apply)
 2D. (does not apply)         2D. (does not apply)
 2E. O="/", I=""              2E. O="/", I=""
 3. Result: "/"               3. Result "/"

My reading of the RFC 3986 says all three examples should
produce the same result.  The fact that my "what?!" comment happens
to be correct according to that RFC is purely coincidental.

Then again, urlparse.py does *not* claim to be RFC 3986 compliant.
The module docstring is

"""Parse (absolute and relative) URLs.

See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding,
UC Irvine, June 1995.
"""

I tried the same code with 4Suite, which does claim compliance, and get

>>> import Ft
>>> from Ft.Lib import Uri
>>> Uri.Absolutize("..", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../..", "http://blah.com/")
'http://blah.com/'
>>>

The text of its Uri.py says

This function is similar to urlparse.urljoin() and urllib.basejoin().
Those functions, however, are (as of Python 2.3) outdated, buggy, and/or
designed to produce results acceptable for use with other core Python
libraries, rather than being earnest implementations of the relevant
specs. Their problems are most noticeable in their handling of
same-document references and 'file:' URIs, both being situations that
come up far too often to consider the functions reliable enough for
general use.
"""
# Reasons to avoid using urllib.basejoin() and urlparse.urljoin():
# - Both are partial implementations of long-obsolete specs.
# - Both accept relative URLs as the base, which no spec allows.
# - urllib.basejoin() mishandles the '' and '..' references.
# - If the base URL uses a non-hierarchical or relative path,
#   or if the URL scheme is unrecognized, the result is not
#   always as expected (partly due to issues in RFC 1808).
# - If the authority component of a 'file' URI is empty,
#   the authority component is removed altogether. If it was
#   not present, an empty authority component is in the result.
# - '.' and '..' segments are not always collapsed as well as they
#   should be (partly due to issues in RFC 1808).
# - Effective Python 2.4, urllib.basejoin() *is* urlparse.urljoin(),
#   but urlparse.urljoin() is still based on RFC 1808.

In searching the archives
  http://mail.python.org/pipermail/python-dev/2005-September/056152.html

Fabien Schwob:
> I'm using the module urlparse and I think I've found a bug in the
> urlparse module. When you merge an url and a link
> like "../../../page.html" with urljoin, the new url created keeps some
> "../" in it. Here is an example :
>
>  >>> import urlparse
>  >>> begin = "http://www.example.com/folder/page.html"
>  >>> end = "../../../otherpage.html"
>  >>> urlparse.urljoin(begin, end)
> 'http://www.example.com/../../otherpage.html'

Guido:
> You shouldn't be giving more "../" sequences than are possible. I find
> the current behavior acceptable.

(Apparently for RFC 1808 that's a valid answer; it was an implementation
choice in how to handle that case.)

While not directly relevant, postings like John J Lee's
 http://mail.python.org/pipermail/python-bugs-lis

Re: [Python-Dev] Path object design

2006-11-05 Thread Martin v. Löwis
Andrew Dalke schrieb:
> >>> urlparse.urljoin("http://blah.com/", "..")
> 'http://blah.com/'
> >>> urlparse.urljoin("http://blah.com/", "../")
> 'http://blah.com/../'
> >>> urlparse.urljoin("http://blah.com/", "../..")
> 'http://blah.com/'
> 
> Does the result make sense to you?  Does it make
> sense that the last of these is shorter than the middle
> one?  It sure doesn't to me.  I thought it was obvious
> that there was an error;

That wasn't obvious at all to me. Now looking at the
examples, I agree there is an error. The middle one
is incorrect;

urlparse.urljoin("http://blah.com/", "../")

should also give 'http://blah.com/'.

>> You shouldn't be giving more "../" sequences than are possible. I find
>> the current behavior acceptable.
> 
> (Apparently for RFC 1808 that's a valid answer; it was an implementation
> choice in how to handle that case.)

There is still some text left to that respect in 5.4.2 of RFC 3986.

> While not directly relevant, postings like John J Lee's
> http://mail.python.org/pipermail/python-bugs-list/2006-February/031875.html
>> The urlparse.urlparse() code should not be changed, for
>> backwards compatibility reasons.
> 
> strongly suggest a desire to not change that code.

This is John J Lee's opinion, of course. I don't see a reason not to fix
such bugs, or to update the implementation to the current RFCs.

> As this is not a bug, I have added the feature request 1591035 to SF
> titled "update urlparse to RFC 3986".  Nothing else appeared to exist
> on that specific topic.

Thanks. It always helps to be more specific; being less specific often
hurts. I find there is a difference between "urllib behaves
non-intuitively" and "urllib gives result A for parameters B and C,
but should give result D instead". Can you please add specific examples
to your report that demonstrate the difference between implemented
and expected behavior?

Regards,
Martin


Re: [Python-Dev] [Python-3000] Mini Path object

2006-11-05 Thread Greg Ewing
Mike Orr wrote:

> .abspath()
> .normpath()
> .realpath()
> .splitpath()
> .relpath()
> .relpathto()

Seeing as the whole class is about paths, having
"path" in the method names seems redundant. I'd
prefer to see terser method names without any
noise characters in them.

--
Greg


Re: [Python-Dev] Importing .pyc in -O mode and vice versa

2006-11-05 Thread Greg Ewing
Martin v. Löwis wrote:

> That should never be better: the system will cache the directory
> blocks, also, and it will do a better job than Python will.

If that's really the case, then why do discussions
of how to improve Python startup speeds seem to focus
on the number of stat calls made?

Also, caching isn't the only thing to consider.
Last time I looked at the implementation of unix
file systems, they mostly seemed to do directory
lookups by linear search. Unless that's changed
a lot, I have a hard time seeing how that's
going to beat Python's highly-tuned dictionaries.

--
Greg


Re: [Python-Dev] Path object design

2006-11-05 Thread Andrew Dalke
Me [Andrew]:
> > As this is not a bug, I have added the feature request 1591035 to SF
> > titled "update urlparse to RFC 3986".  Nothing else appeared to exist
> > on that specific topic.

Martin:
> Thanks. It always helps to be more specific; being less specific often
> hurts.

So does being more specific.  I wasn't trying to report a bug in
urlparse.  I figured everyone knew the problems existed.  The code
comments say so and various back discussions on this list say so.

All I wanted to do was point out that two seemingly similar problems -
path traversal of hierarchical structures - had two different expected
behaviors.  Now I've spent entirely too much time on specifics I didn't
care about and didn't think were important.

I've also been known to do the full report and have people ignore what
I wrote because it was too long.

> I find there is a difference between "urllib behaves
> non-intuitively" and "urllib gives result A for parameters B and C,
> but should give result D instead". Can you please add specific examples
> to your report that demonstrate the difference between implemented
> and expected behavior?

No.

I consider the "../" cases to be unimportant edge cases and
I would rather people fixed the other problems highlighted in the
text I copied from 4Suite's Uri.py -- like improperly allowing a
relative URL as the base url, which I incorrectly assumed was
legit - and that others have reported on python-dev, easily found
with Google.

If I only add test cases for "../" then I believe that that's all that
will be fixed.

Given the back history of this problem and lack of followup I
also believe it won't be fixed unless someone develops a brand
new module, from scratch, which will be added to some future
Python version.  There's probably a compliance suite out there
to use for this sort of task.  I hadn't bothered to look as I am
no more proficient than others here at Google.

Finally, I see that my report is a dup.  SF search is poor.  As
Nick Coghlan reported, Paul Jimenez has a replacement for urlparse.
Summarized in
 http://www.python.org/dev/summary/2006-04-01_2006-04-15/
It was submitted in spring as a patch - SF# 1462525 at
 http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470
which I didn't find in my earlier searching.

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] idea for data-type (data-format) PEP

2006-11-05 Thread Greg Ewing
Travis Oliphant wrote:

> In NumPy, the data-type objects have function pointers to accomplish all 
> the things NumPy does quickly.

If the datatype object is to be extracted and made a
stand-alone feature, that might need to be refactored.

Perhaps there could be a facility for traversing a
datatype with a user-supplied dispatch table?

--
Greg


Re: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch)

2006-11-05 Thread James Y Knight

On Nov 4, 2006, at 3:49 AM, Martin v. Löwis wrote:

> Notice that at least the following objects are shared between
> interpreters, as they are singletons:
> - None, True, False, (), "", u""
> - strings of length 1, Unicode strings of length 1 with ord < 256
> - integers between -5 and 256
> How do you deal with the reference counters of these objects?
>
> Also, type objects (in particular exception types) are shared between
> interpreters. These are mutable objects, so you have actually
> dictionaries shared between interpreters. How would you deal with  
> these?

All these should be dealt with by making them per-interpreter  
singletons, not per address space. That should be simple enough,  
unfortunately the margins of this email are too small to describe  
how. ;) Also it'd be backwards incompatible with current extension  
modules.

James


Re: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch)

2006-11-05 Thread Guido van Rossum
On 11/5/06, James Y Knight <[EMAIL PROTECTED]> wrote:
>
> On Nov 4, 2006, at 3:49 AM, Martin v. Löwis wrote:
>
> > Notice that at least the following objects are shared between
> > interpreters, as they are singletons:
> > - None, True, False, (), "", u""
> > - strings of length 1, Unicode strings of length 1 with ord < 256
> > - integers between -5 and 256
> > How do you deal with the reference counters of these objects?
> >
> > Also, type objects (in particular exception types) are shared between
> > interpreters. These are mutable objects, so you actually have
> > dictionaries shared between interpreters. How would you deal with
> > these?
>
> All these should be dealt with by making them per-interpreter
> singletons, not per-address-space ones. That should be simple enough;
> unfortunately, the margins of this email are too small to describe
> how. ;) Also, it'd be backwards incompatible with current extension
> modules.

I don't know how you define simple. In order to be able to have
separate GILs, you have to remove *all* sharing of objects between
interpreters. And all other data structures, too. It would probably
kill performance too, because currently obmalloc relies on the GIL.

So I don't see much point in continuing this thread.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch)

2006-11-05 Thread Talin
Guido van Rossum wrote:
> I don't know how you define simple. In order to be able to have
> separate GILs, you have to remove *all* sharing of objects between
> interpreters. And all other data structures, too. It would probably
> kill performance too, because currently obmalloc relies on the GIL.

Nitpick: You have to remove all sharing of *mutable* objects. One day, 
when we get "pure" GC with no refcounting, that will be a meaningful 
distinction. :)
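
Why immutability alone is not enough under reference counting can be
seen with a small sketch. It assumes CPython; refcount_delta is a
hypothetical helper, and a freshly built tuple is used because the
implementation may special-case objects such as None.

```python
import sys


def refcount_delta(obj):
    """Return how much obj's refcount changes when one more reference is stored."""
    holder = []
    before = sys.getrefcount(obj)
    holder.append(obj)  # one extra reference; obj's value is untouched
    after = sys.getrefcount(obj)
    return after - before


if __name__ == "__main__":
    frozen = tuple(range(3))  # immutable, built at run time
    print(refcount_delta(frozen))
```

The tuple's value never changes, yet its reference count field is
written every time a reference is taken or dropped. Two interpreters
with separate GILs would race on that field.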

-- Talin


Re: [Python-Dev] Importing .pyc in -O mode and vice versa

2006-11-05 Thread Martin v. Löwis
Greg Ewing schrieb:
>> That should never be better: the system will cache the directory
>> blocks, also, and it will do a better job than Python will.
> 
> If that's really the case, then why do discussions
> of how to improve Python startup speeds seem to focus
> on the number of stat calls made?

A stat call will not only look at the directory entry, but also
look at the inode. This will require another disk access, as the
inode is at a different location of the disk.

> Also, caching isn't the only thing to consider.
> Last time I looked at the implementation of unix
> file systems, they mostly seemed to do directory
> lookups by linear search. Unless that's changed
> a lot, I have a hard time seeing how that's
> going to beat Python's highly-tuned dictionaries.

It depends on the file system you are using. An NTFS directory
lookup is a B-Tree search; NT has not been doing linear search
since its introduction 15 years ago. Linux only recently started
doing tree-based directories with the introduction of ext4.
However, Linux' in-memory directory cache (the dcache) doesn't
need to scan over the directory block structure; not sure whether
it uses linear search still.

For a small directory, the difference is likely negligible. For
a large directory, the cost of reading in the entire directory
might be higher than the savings gained from not having to
search it. Also, if we do our own directory caching, the question
is when to invalidate the cache.
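
To make the trade-off concrete, here is a minimal sketch of a
Python-level directory cache. DirectoryCache is a hypothetical name, and
invalidating on the directory's modification time is an assumption, not
something the import machinery is known to do. It can miss changes that
land within the file system's timestamp granularity.

```python
import os


class DirectoryCache:
    """Cache directory listings, re-reading when the directory mtime changes."""

    def __init__(self):
        # directory path -> (mtime in nanoseconds, frozenset of entry names)
        self._listings = {}

    def contains(self, directory, name):
        """Return True if `name` is an entry of `directory`."""
        # One stat of the directory itself replaces one stat per candidate file.
        mtime = os.stat(directory).st_mtime_ns
        cached = self._listings.get(directory)
        if cached is None or cached[0] != mtime:
            # Stale or missing: read the entire directory once.
            cached = (mtime, frozenset(os.listdir(directory)))
            self._listings[directory] = cached
        return name in cached[1]
```

Each lookup still costs a stat of the directory, and a miss costs a full
listing, which is the expense for large directories mentioned above.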

Regards,
Martin


Re: [Python-Dev] Path object design

2006-11-05 Thread Martin v. Löwis
Andrew Dalke schrieb:
>> I find there is a difference between "urllib behaves
>> non-intuitively" and "urllib gives result A for parameters B and C,
>> but should give result D instead". Can you please add specific examples
>> to your report that demonstrate the difference between implemented
>> and expected behavior?
> 
> No.
> 
> I consider the "../" cases to be unimportant edge cases and
> I would rather people fixed the other problems highlighted in the
> text I copied from 4Suite's Uri.py -- like improperly allowing a
> relative URL as the base url, which I incorrectly assumed was
> legit - and that others have reported on python-dev, easily found
> with Google.

It still should be possible to come up with examples for these as
well, no? For example, if you pass a relative URI as the base
URI, what would you like to see happen?
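
A report in the "result A for parameters B and C, expected D" form could
look like the sketch below. The base URI and expected results are the
normal-case examples from section 5.4.1 of RFC 3986; find_mismatches is a
hypothetical helper, and the import uses urllib.parse, where Python 3
keeps the function that lived in the urlparse module at the time of this
thread.

```python
from urllib.parse import urljoin

# Base URI and (reference, expected) pairs from RFC 3986, section 5.4.1.
BASE = "http://a/b/c/d;p?q"
CASES = [
    ("g", "http://a/b/c/g"),
    ("./g", "http://a/b/c/g"),
    ("/g", "http://a/g"),
    ("../g", "http://a/b/g"),
    ("../../g", "http://a/g"),
]


def find_mismatches(base, cases):
    """Return (reference, actual, expected) for every case urljoin gets wrong."""
    mismatches = []
    for ref, expected in cases:
        actual = urljoin(base, ref)
        if actual != expected:
            mismatches.append((ref, actual, expected))
    return mismatches


if __name__ == "__main__":
    for ref, actual, expected in find_mismatches(BASE, CASES):
        print("urljoin(%r, %r) gave %r, expected %r" % (BASE, ref, actual, expected))
    # The relative-base case has no agreed expected value yet, so it is
    # only printed, not checked.
    print(urljoin("b/c", "../g"))
```

The relative-base call is left without an expected value on purpose: what
it should return is exactly the open question.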

> If I only add test cases for "../" then I believe that that's all that
> will be fixed.

That's true. Actually, it's probably not true; it will only get fixed
if some volunteer contributes a fix.

> Finally, I see that my report is a dup.  SF search is poor.  As
> Nick Coghlan reported, Paul Jimenez has a replacement for urlparse.
> Summarized in
>  http://www.python.org/dev/summary/2006-04-01_2006-04-15/
> It was submitted in spring as a patch - SF# 1462525 at
>  http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470
> which I didn't find in my earlier searching.

So do you think this patch meets your requirements?

This topic (URL parsing) is not only inherently difficult to
implement, it is just as tedious to review. Without anybody
reviewing the contributed code, it's certain that it will never
be incorporated.

Regards,
Martin