Re: [Python-Dev] PEP 428 (pathlib) now committed
On 24 Nov 2013 01:21, "Antoine Pitrou" wrote:
>
> On Sat, 23 Nov 2013 15:32:58 +0200
> Serhiy Storchaka wrote:
> > 22.11.13 18:44, Antoine Pitrou wrote:
> > > I've pushed pathlib to the repository. I'm hopeful there won't be
> > > new buildbot failures because of it, but still, there may be some
> > > platform-specific issues I'm unaware of.
> >
> > Congratulations, Antoine!
> >
> > Does it mean that issues #11344 (Add os.path.splitpath(path) function)
> > [1] and #13968 (Support recursive globs) [2] have no chance? Both are
> > ready for commit and have been waiting for review for almost a year.
> > Are the os.path and glob modules deprecated now?
>
> They are not deprecated, no. I am not terribly interested in reviewing
> those patches, personally, but other people may be :-)
Right, pathlib is an abstraction layer on top of the lower level
implementation APIs, rather than a replacement for them.
Cheers,
Nick.
>
> Regards
>
> Antoine.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] buildbot's are needlessly compiling -O0
On 24 Nov 2013 17:15, "Gregory P. Smith" wrote:
>
> our buildbots are set up to configure --with-pydebug which also
> unfortunately causes them to compile with -O0... this results in a python
> binary that is excruciatingly slow and makes the entire test suite run take
> a long time.
>
> given that nobody is ever going to run a gdb or another debugger on the
> buildbot generated transient binaries themselves how about we speed all of
> the buildbots up by adding CFLAGS=-O2 to the configure command line?
The main problem is that doing so would disable test_gdb. Humans don't run
gdb on those binaries, but the test suite does.
I agree it would be nice to figure out a way to run most of the tests on an
optimised build, though.
Cheers,
Nick.
>
> Sure, the compile step will take a bit longer but that is dwarfed by the
> test time as it is:
>
> http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.x/builds/3224
> http://buildbot.python.org/all/builders/ARMv7%203.x/builds/7
> http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/639
>
> It should dramatically decrease the turnaround latency for buildbot results.
>
> -gps
Re: [Python-Dev] cpython: Disable annoying tests which doesn't work optimized pickles.
On Sun, 24 Nov 2013 05:58:24 +0100 (CET)
alexandre.vassalotti wrote:
> http://hg.python.org/cpython/rev/a68c303eb8dc
> changeset: 87486:a68c303eb8dc
> user: Alexandre Vassalotti
> date: Sat Nov 23 20:58:24 2013 -0800
> summary:
> Disable annoying tests which doesn't work optimized pickles.
We should probably disable them only on optimized pickles, then :-)
Regards
Antoine.
Re: [Python-Dev] buildbot's are needlessly compiling -O0
On Sun, Nov 24, 2013 at 12:43 PM, Nick Coghlan wrote:
>
> On 24 Nov 2013 17:15, "Gregory P. Smith" wrote:
>>
>> our buildbots are set up to configure --with-pydebug which also
>> unfortunately causes them to compile with -O0... this results in a python
>> binary that is excruciatingly slow and makes the entire test suite run take
>> a long time.
>>
>> given that nobody is ever going to run a gdb or another debugger on the
>> buildbot generated transient binaries themselves how about we speed all of
>> the buildbots up by adding CFLAGS=-O2 to the configure command line?
>
> The main problem is that doing so would disable test_gdb. Humans don't run
> gdb on those binaries, but the test suite does.
Is there a danger that the code tested under GDB is not tested in
"natural environment" for pythons?
--
anatoly t.
Re: [Python-Dev] buildbot's are needlessly compiling -O0
On Sun, Nov 24, 2013 at 6:12 AM, anatoly techtonik wrote:
> On Sun, Nov 24, 2013 at 12:43 PM, Nick Coghlan wrote:
> >
> > On 24 Nov 2013 17:15, "Gregory P. Smith" wrote:
> >>
> >> our buildbots are set up to configure --with-pydebug which also
> >> unfortunately causes them to compile with -O0... this results in a python
> >> binary that is excruciatingly slow and makes the entire test suite run take
> >> a long time.
> >>
> >> given that nobody is ever going to run a gdb or another debugger on the
> >> buildbot generated transient binaries themselves how about we speed all of
> >> the buildbots up by adding CFLAGS=-O2 to the configure command line?
> >
> > The main problem is that doing so would disable test_gdb. Humans don't run
> > gdb on those binaries, but the test suite does.
>
> Is there a danger that the code tested under GDB is not tested in
> "natural environment" for pythons?
> --
>
What are you talking about? Have you actually looked at test_gdb before
writing this email?
Eli
[Python-Dev] Python 3.4.0b1 is now tagged, feature-freeze is now in effect
Please refrain from checking in any new features to Python trunk until
after the 3.4 release branch is created (which will be a few months).
Instead, let's concentrate our efforts on polishing Python 3.4 until
it's the best and most-defect-free release yet!
Thanks,
//arry/
Re: [Python-Dev] buildbot's are needlessly compiling -O0
On Nov 23, 2013, at 11:13 PM, Gregory P. Smith wrote:
>our buildbots are set up to configure --with-pydebug which also
>unfortunately causes them to compile with -O0... this results in a python
>binary that is excruciatingly slow and makes the entire test suite run take
>a long time.
It would be fine(-ish) to add this for improved buildbot performance, but
please do not change this for default --with-pydebug builds. When you're
debugging Python, -O0 just makes so much more sense.
-Barry
Re: [Python-Dev] can someone create a buildbot slave name & password for me?
On Nov 23, 2013, at 11:01 PM, Gregory P. Smith wrote:
>http://buildbot.python.org/all/buildslaves/gps-ubuntu-exynos5-armv7l
Cool thanks. Antoine, do you still want or need my buildbot, or can I take it
off-line? (FWIW, because the hardware is no longer supported, it's pretty
much stuck at Ubuntu 12.10.)
-Barry
Re: [Python-Dev] can someone create a buildbot slave name & password for me?
On Sun, 24 Nov 2013 11:47:42 -0500
Barry Warsaw wrote:
> On Nov 23, 2013, at 11:01 PM, Gregory P. Smith wrote:
>
> >http://buildbot.python.org/all/buildslaves/gps-ubuntu-exynos5-armv7l
>
> Cool thanks. Antoine, do you still want or need my buildbot, or can I take it
> off-line? (FWIW, because the hardware is no longer supported, it's pretty
> much stuck at Ubuntu 12.10.)
Well, your buildbot has already been off-line for something like a
month :-)
http://buildbot.python.org/all/buildslaves/warsaw-ubuntu-arm
If the hardware is not supported anymore, and since the machine was
rather slow, I agree it's ok to let it go.
Regards
Antoine.
Re: [Python-Dev] can someone create a buildbot slave name & password for me?
On Nov 24, 2013, at 06:02 PM, Antoine Pitrou wrote:
>If the hardware is not supported anymore, and since the machine was
>rather slow, I agree it's ok to let it go.
Done! (The machine's name was 'hope', so now we're hope-less :).
-Barry
[Python-Dev] PEP 428 - pathlib API questions
PEP 428 looks nice. Thanks, Antoine!
I have a couple of questions about the module name and API. I think
I've read through most of the previous discussion, but may have missed
some, so please point me to the right place if there have already been
discussions about these things.
1) Someone on reddit.com/r/Python asked "Is the import going to be
'pathlib'? I thought the renaming going on of std lib things with the
transition to Python 3 sought to remove the spurious usage of
appending 'lib' to libs?" I wondered about this too. Has this been
discussed/answered?
2) I think the operation of "suffix" and "suffixes" is good, but not
so much the name. I saw Ben Finney's original suggestion about
multiple extensions etc
(https://mail.python.org/pipermail/python-ideas/2012-October/016437.html).
However, it seems there was no further discussion about why not
"extension" and "extensions"? I have never heard a filename extension
being called a "suffix". I know it is a suffix in the sense of the
English word, but I've never heard it called that in this context, and
I think context is important. Put another way, "extension" is obvious
and guessable, "suffix" isn't.
3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm
wondering about writing portable code when you want the string version
of the path. In Python 3.x you'll call str(path_obj), but in Python
2.x that will fail if the path has unicode chars in it, and you'll
need to use unicode(path_obj), which of course doesn't work in 3.x. Is
this just a fact of life, or would .str() or .as_string() help for
2.x/3.x portability?
4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
pattern starts with '**', but in the pep428 branch of the code there
are both glob() and rglob() functions. I've never seen the ** syntax
before (though admittedly I'm a Windows dev), and much prefer the
explicitness of having two functions, or maybe even better,
path_obj.glob('*.py', recursive=True).
Seems much more Pythonic to provide an actual argument (or different
function) for this change in behaviour, rather than stuffing the
"recursive flag" inside the pattern string.
Has this ship already sailed with http://bugs.python.org/issue13968?
I think that should also be rglob(pattern) or glob(pattern,
recursive=True). Of course, if this ship has already sailed, it's
definitely better for pathlib's glob to match glob.glob.
Thanks,
Ben
[Python-Dev] [RELEASED] Python 3.4.0b1
On behalf of the Python development team, it's my privilege to announce
the first beta release of Python 3.4.
This is a preview release, and its use is not recommended for
production settings.
Python 3.4 includes a range of improvements of the 3.x series, including
hundreds of small improvements and bug fixes. Major new features and
changes in the 3.4 release series include:
* PEP 435, a standardized "enum" module
* PEP 436, a build enhancement that will help generate introspection
  information for builtins
* PEP 442, improved semantics for object finalization
* PEP 443, adding single-dispatch generic functions to the standard library
* PEP 445, a new C API for implementing custom memory allocators
* PEP 446, changing file descriptors to not be inherited by default
  in subprocesses
* PEP 450, a new "statistics" module
* PEP 453, a bundled installer for the *pip* package manager
* PEP 456, a new hash algorithm for Python strings and binary data
* PEP 3154, a new and improved protocol for pickled objects
* PEP 3156, a new "asyncio" module, a new framework for asynchronous I/O
Python 3.4 is now in "feature freeze", meaning that no new features will be
added. The final release is projected for late February 2014.
To download Python 3.4.0b1 visit:
http://www.python.org/download/releases/3.4.0/
Please consider trying Python 3.4.0b1 with your code and reporting any
new issues you notice to:
http://bugs.python.org/
Enjoy!
--
Larry Hastings, Release Manager
larry at hastings.org
(on behalf of the entire python-dev team and 3.4's contributors)
[Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
Hi folks,
I decided to start another thread for my thoughts on the interaction
between pathlib (Antoine's new PEP 428), issue 11406 (proposal for a
directory iterator returning stat-like info), and my own scandir
library, which implements something along the lines of issue 11406.
My scandir library (https://github.com/benhoyt/scandir) is something
I've been working on for a while -- it provides a scandir() function
which uses the OS's directory iterator functions to expose as much
stat-like information as possible (readdir and FindFirstFile etc).
This way functions like os.walk() can use the info (particularly
"is_dir()") and not require tons of extra calls to os.stat(). This
provides a huge speed boost for os.walk() in many cases: I've seen
3-4x on Linux, and up to 20x on Windows.
(It depends on various things, not least of which is Windows' weird
stat caching -- if I run my scandir benchmark "fresh", I get os.walk()
running 8-9 times as fast as the built-in one. But if I run it after
an un-hibernate, suddenly it runs 18-20 times as fast as the built-in
one. Either way, huge gains, especially on Windows.)
scandir.scandir() yields DirEntry objects, each of which has .isdir(),
.isfile(), .islink(), and .lstat() methods. Look familiar? When I was
reading PEP 428 and saw .is_file(), .is_dir(), and .stat(), I thought
-- surely I can merge this with pathlib and Path objects.
The first thing I can do to scandir is rename my isdir() type
attributes to match PEP 428's, so that DirEntry quacks like a Path
object where it can.
However, I'm wondering if I can change scandir to return actual Path
objects. Or better, because Path already helpfully provides iterdir()
which yields Path objects, and Path objects have .is_dir() etc, can
scandir()-like behaviour simply work out-of-the-box?
This mainly depends on how Path is going to cache stat information. If
it caches it, then this will just work. Sounds like Guido's opinion
was that both cached and uncached use cases are important, but that it
should be very clear which one you're getting. I personally like the
.stat() and .restat() idea.
The other related thing is that DirEntry only provides .lstat(),
because it's providing stat-like info without following links.
Note in this context that it's not just "network filesystems" on which
stat() is slow
(https://mail.python.org/pipermail/python-dev/2013-May/125805.html).
It's quite slow in Windows under various conditions too.
See also Nick Coghlan's post about a DirEntry-style object on the
issue 11406 thread:
https://mail.python.org/pipermail/python-dev/2013-May/126148.html
Thoughts and suggestions for how to merge scandir with pathlib's approach?
It's important to me that pathlib's API doesn't cut itself off from a
more efficient implementation of the ideas from issue 11406 and scandir...
Thanks,
Ben.
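[Editor's sketch] To make the idea above concrete, here is a minimal directory walker that takes its file-type information from the directory listing itself. It is only an illustration: it uses os.scandir() as it later appeared in the standard library (Python 3.5+, with is_dir() spelled the PEP 428 way), not the third-party scandir module discussed in the post, and count_tree is a made-up helper name.

```python
import os
import tempfile


def count_tree(top):
    """Return (n_dirs, n_files) below top without an os.stat() call per entry."""
    n_dirs = n_files = 0
    pending = [top]
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # is_dir() can usually be answered from data the directory
                # listing already returned, so no extra system call is needed.
                if entry.is_dir(follow_symlinks=False):
                    n_dirs += 1
                    pending.append(entry.path)
                else:
                    n_files += 1
    return n_dirs, n_files


# Small throwaway tree: two directories (a, a/b) and three files.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel in ("top.txt", os.path.join("a", "one.txt"), os.path.join("a", "b", "two.txt")):
        with open(os.path.join(tmp, rel), "w"):
            pass
    counts = count_tree(tmp)
```

The speed-ups quoted in the post are the author's own measurements; this sketch makes no performance claim of its own.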
Re: [Python-Dev] PEP 428 - pathlib API questions
Hello,
On Mon, 25 Nov 2013 11:00:09 +1300
Ben Hoyt wrote:
>
> 1) Someone on reddit.com/r/Python asked "Is the import going to be
> 'pathlib'? I thought the renaming going on of std lib things with the
> transition to Python 3 sought to remove the spurious usage of
> appending 'lib' to libs?" I wondered about this too. Has this been
> discussed/answered?
Well, "path" is much too common already, and it's an obvious variable
name for a filesystem path, so "pathlib" is better to avoid name
clashes.
> 2) I think the operation of "suffix" and "suffixes" is good, but not
> so much the name. I saw Ben Finney's original suggestion about
> multiple extensions etc
> (https://mail.python.org/pipermail/python-ideas/2012-October/016437.html).
>
> However, it seems there was no further discussion about why not
> "extension" and "extensions"? I have never heard a filename extension
> being called a "suffix". I know it is a suffix in the sense of the
> English word, but I've never heard it called that in this context, and
> I think context is important. Put another way, "extension" is obvious
> and guessable, "suffix" isn't.
Well, perhaps :-), but nobody opposed suffix and suffixes at the time.
Note the API is provisional, so we can still make it change, but
obviously the barrier for changes is higher now that the PEP is
accepted and the beta has been cut.
> 3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm
> wondering about writing portable code when you want the string version
> of the path. In Python 3.x you'll call str(path_obj), but in Python
> 2.x that will fail if the path has unicode chars in it, and you'll
> need to use unicode(path_obj), which of course doesn't work in 3.x.
The behaviour of unicode paths in Python 2 is erratic
(system-dependent). pathlib can't really fix it: Python 2 doesn't know
about a well-defined filesystem encoding.
> 4) Is path_obj.glob() recursive?
This is documented:
http://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob
http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob
> Seems much more Pythonic to provide an actual argument (or different
> function) for this change in behaviour, rather than stuffing the
> "recursive flag" inside the pattern string.
It's not a flag, it's a different wildcard. This allows e.g. a library
function to call glob() and users to pass a recursive or non-recursive
pattern as they wish.
> Has this ship already sailed with http://bugs.python.org/issue13968?
This issue is still open, so no :-)
Regards
Antoine.
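[Editor's sketch] The "different wildcard" point can be illustrated with a small library-style helper that simply forwards whatever pattern its caller supplies. The helper name find and the file layout are invented for the example; Path.glob() and Path.rglob() are the documented pathlib methods linked above.

```python
import tempfile
from pathlib import Path


def find(root, pattern):
    """Hypothetical library helper: the caller's pattern decides the depth."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    for rel in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/notes.txt"):
        (root / rel).touch()

    # Same function, two behaviours, chosen purely by the pattern.
    shallow = find(root, "*.py")
    deep = find(root, "**/*.py")
    # rglob("*.py") behaves like glob("**/*.py").
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
```

Here shallow contains only a.py, while deep and via_rglob both list all three .py files. A recursive=True keyword would instead require find() to grow a matching parameter of its own.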
Re: [Python-Dev] [RELEASED] Python 3.4.0b1
On 11/24/2013 02:00 PM, Larry Hastings wrote:
> Python 3.4 includes a range of improvements of the 3.x series, including
> hundreds of small improvements and bug fixes. Major new features and
> changes in the 3.4 release series include:
Whoops, sorry, I missed a couple of PEPs there:
* PEP 428, a "pathlib" module providing object-oriented filesystem paths
* PEP 451, standardizing module metadata for Python's module import system
* PEP 454, a new "tracemalloc" module for tracing Python memory allocations
They're on the web site already, and they'll be in the next announcement.
Sorry for the oversight!
//arry/
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
On Mon, 25 Nov 2013 11:20:08 +1300
Ben Hoyt wrote:
>
> This mainly depends on how Path is going to cache stat information. If
> it caches it, then this will just work. Sounds like Guido's opinion
> was that both cached and uncached use cases are important, but that it
> should be very clear which one you're getting. I personally like the
> .stat() and .restat() idea.
Right now, pathlib doesn't cache. Guido decided it was safer to start
off like that, and perhaps later we can add some optional caching.
One reason caching didn't go in is that it's not clear which API is
best. Working on plugging scandir() into pathlib would actually help
choosing a stat-caching API.
(or, rather, lstat-caching...)
> The other related thing is that DirEntry only provides .lstat(),
> because it's providing stat-like info without following links.
Path.is_dir() and friends use stat(), i.e. they inform you about
whether a symlink's target is a directory (not the symlink itself). Of
course, if the DirEntry says the path is a symlink, Path.is_dir() could
then run stat() to find out about the target.
Do you plan to propose scandir() for inclusion in the stdlib?
Regards
Antoine.
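[Editor's sketch] The stat() versus lstat() distinction described above can be seen with a symlink that points at a directory. This is a small illustration, assuming a platform where the current user may create symlinks (on Windows that can require extra privileges); the describe helper and the file names are made up.

```python
import stat
import tempfile
from pathlib import Path


def describe(path):
    """Compare what the link's target looks like with what the link itself is."""
    link_info = path.lstat()  # does not follow the symlink
    return {
        "is_dir": path.is_dir(),          # follows the symlink to its target
        "is_symlink": path.is_symlink(),
        "lstat_is_dir": stat.S_ISDIR(link_info.st_mode),
        "lstat_is_link": stat.S_ISLNK(link_info.st_mode),
    }


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "real_dir"
    target.mkdir()
    link = Path(tmp) / "link_to_dir"
    link.symlink_to(target, target_is_directory=True)
    result = describe(link)
```

For the link, is_dir() is true because the target is a directory, while the lstat() result describes a symlink and not a directory. That gap is why lstat-only information from a directory iterator cannot answer is_dir() for symlinks without one more stat() call.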
Re: [Python-Dev] PEP 428 - pathlib API questions
> Well, "path" is much too common already, and it's an obvious variable
> name for a filesystem path, so "pathlib" is better to avoid name
> clashes.
Yep, that makes total sense, thanks.
>> However, it seems there was no further discussion about why not
>> "extension" and "extensions"? I have never heard a filename extension
>> being called a "suffix". I know it is a suffix in the sense of the
>> English word, but I've never heard it called that in this context, and
>> I think context is important. Put another way, "extension" is obvious
>> and guessable, "suffix" isn't.
>
> Well, perhaps :-), but nobody opposed suffix and suffixes at the time.
> Note the API is provisional, so we can still make it change, but
> obviously the barrier for changes is higher now that the PEP is
> accepted and the beta has been cut.
Okay. I won't push hard :-) as "suffix" isn't terrible, but has anyone
else never (or rarely) heard the term "suffix" applied to filename
extensions?
>> 3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm
>> wondering about writing portable code when you want the string version
>> of the path. In Python 3.x you'll call str(path_obj), but in Python
>> 2.x that will fail if the path has unicode chars in it, and you'll
>> need to use unicode(path_obj), which of course doesn't work in 3.x.
>
> The behaviour of unicode paths in Python 2 is erratic
> (system-dependent). pathlib can't really fix it: Python 2 doesn't know
> about a well-defined filesystem encoding.
Fair enough.
>> 4) Is path_obj.glob() recursive?
>
> This is documented:
> http://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob
> http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob
>
>> Seems much more Pythonic to provide an actual argument (or different
>> function) for this change in behaviour, rather than stuffing the
>> "recursive flag" inside the pattern string.
>
> It's not a flag, it's a different wildcard. This allows e.g. a library
> function to call glob() and users to pass a recursive or non-recursive
> pattern as they wish.
Okay, just saw those docs now -- thanks. Fair enough re "it's a
different wildcard". At the least I don't think there should be two
ways to do it -- in other words, either rglob() or glob('**'); having
both seems very un-PEP 20 to me. My preference is rglob(), but
glob(recursive=True) would be fine too.
>> Has this ship already sailed with http://bugs.python.org/issue13968?
>
> This issue is still open, so no :-)
Same goes for this issue -- there should be OOWTDI, and my preference
is rglob() or glob(recursive=True). But maybe issue 13968's behaviour
can be determined by pathlib's now that pathlib is the one getting
done first.
Thanks,
Ben.
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
> Right now, pathlib doesn't cache. Guido decided it was safer to start
> off like that, and perhaps later we can add some optional caching.
>
> One reason caching didn't go in is that it's not clear which API is
> best. Working on plugging scandir() into pathlib would actually help
> choosing a stat-caching API.
>
> (or, rather, lstat-caching...)
>
>> The other related thing is that DirEntry only provides .lstat(),
>> because it's providing stat-like info without following links.
>
> Path.is_dir() and friends use stat(), i.e. they inform you about
> whether a symlink's target is a directory (not the symlink itself). Of
> course, if the DirEntry says the path is a symlink, Path.is_dir() could
> then run stat() to find out about the target.
>
> Do you plan to propose scandir() for inclusion in the stdlib?
Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry
objects" for inclusion into the stdlib, and also speed up os.walk() as
a result.
However, pathlib's API with .is_dir() and .lstat() etc are so close to
DirEntry, I'd be much keener to roll up the scandir functionality into
pathlib's iterdir(), as that's already going in the standard library,
and iterdir() already returns Path objects.
I'm just not sure it's possible or useful without stat caching.
We could do Path.lstat(cached=True), but we'd also really want
is_dir(cached=True), so that API kinda sucks. Alternatively you could
have iterdir(cached=True) return PathWithCachedStat style objects --
probably better, but kinda messy.
For these reasons, I would much prefer stat caching on by default in
Path -- in my experience, the cached behaviour is desired much much
more often than the non-cached. I've written directory walkers more
often than I can count, whereas I've maybe only once written a
long-running process that needs to re-stat, and if it's clearly
documented as cached, then it's super easy to call restat(), or create
a new Path instance to get new stat info.
This would allow iterdir() to take advantage of the huge performance
improvements you can get when walking directories.
Guido, are you at all open to reconsidering the uncached-by-default in
light of this?
-Ben
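[Editor's sketch] The cached-by-default behaviour with an explicit restat() could look roughly like the wrapper below. Everything here is hypothetical: CachedStatPath and restat() are illustrative names taken from the discussion, not part of pathlib, and this is_dir() deliberately reads the cached lstat() result, so unlike Path.is_dir() it does not follow symlinks.

```python
import stat
import tempfile
from pathlib import Path


class CachedStatPath:
    """Hypothetical wrapper: remembers the first lstat() result until restat()."""

    def __init__(self, path):
        self.path = Path(path)
        self._lstat = None

    def lstat(self):
        # Only touch the filesystem the first time; later calls reuse the result.
        if self._lstat is None:
            self._lstat = self.path.lstat()
        return self._lstat

    def restat(self):
        # Explicitly drop the cached value and fetch fresh information.
        self._lstat = None
        return self.lstat()

    def is_dir(self):
        # Assumption of this sketch: answered from the cached lstat() data.
        return stat.S_ISDIR(self.lstat().st_mode)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.bin"
    target.write_bytes(b"abc")
    cached = CachedStatPath(target)
    first = cached.lstat().st_size    # 3
    target.write_bytes(b"abcdefgh")
    stale = cached.lstat().st_size    # still 3: served from the cache
    fresh = cached.restat().st_size   # 8 after the explicit refresh
```

The stale value in the middle is exactly the trade-off under debate: convenient and fast for a directory walker, surprising for a long-running process unless the caching is clearly documented.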
Re: [Python-Dev] PEP 428 - pathlib API questions
Ben Hoyt wrote:
> However, it seems there was no further discussion about why not
> "extension" and "extensions"? I have never heard a filename extension
> being called a "suffix".
You can't have read many unix man pages, then! I just searched for
"suffix" in the gcc man page, and found this:
  For any given input file, the file name suffix determines what kind of
  compilation is done:
> I know it is a suffix in the sense of the English word, but I've never
> heard it called that in this context, and I think context is important.
This probably depends on your background. In my experience,
the term "extension" arose in OSes where it was a formal
part of the filename syntax, often highly constrained.
E.g. RT11, CP/M, early MS-DOS.
Unix has never had a formal notion of extensions like that,
only informal conventions, and has called them suffixes at
least some of the time for as long as I can remember.
> 4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
> pattern starts with '**',
I don't think it has to *start* with **. Rather, the ** is
a pattern that can span directory separators. It's not a
flag that applies to the whole thing -- a pattern could have
a * in one place and a ** in another.
--
Greg
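[Editor's sketch] A pattern that mixes the two wildcards shows why ** is not a whole-pattern flag. The directory layout and the rel_names helper are invented for illustration; only Path.glob() itself comes from pathlib.

```python
import tempfile
from pathlib import Path


def rel_names(paths, root):
    """Sorted POSIX-style paths relative to root, for easy comparison."""
    return sorted(p.relative_to(root).as_posix() for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("test_top.py", "docs/test_z.py", "src/helper.py",
                "src/a/test_x.py", "src/a/b/test_y.py"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    # ** in the middle: any depth, but only below src/.
    under_src = rel_names(root.glob("src/**/test_*.py"), root)
    # * then **: exactly one directory level first, then any depth below it.
    one_level_then_any = rel_names(root.glob("*/**/test_*.py"), root)
```

With this layout, under_src finds the two test files below src/, and one_level_then_any additionally finds docs/test_z.py but not test_top.py, because the leading * must consume one directory name.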
Re: [Python-Dev] PEP 428 - pathlib API questions
>> However, it seems there was no further discussion about why not
>> "extension" and "extensions"? I have never heard a filename extension
>> being called a "suffix".
>
>
> You can't have read many unix man pages, then!
Huh, no I haven't! Certainly not regularly, as I'm almost exclusively
a Windows user. :-)
> This probably depends on your background. In my experience,
> the term "extension" arose in OSes where it was a formal
> part of the filename syntax, often highly constrained.
> E.g. RT11, CP/M, early MS-DOS.
>
> Unix has never had a formal notion of extensions like that,
> only informal conventions, and has called them suffixes at
> least some of the time for as long as I can remember.
Yes, seems like it definitely is background-dependent. I'm
Windows-centric. I stand corrected, and recant my position on "suffix". :-)
>> 4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
>> pattern starts with '**',
>
>
> I don't think it has to *start* with **. Rather, the ** is
> a pattern that can span directory separators. It's not a
> flag that applies to the whole thing -- a pattern could have
> a * in one place and a ** in another.
Oh okay, that makes more sense. It definitely needs more thorough
documentation in that case.
I would still prefer the simpler and more explicit rglob() /
recursive=True rather than new pattern syntax, but I don't feel as
strongly anymore.
-Ben
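[Editor's sketch] For readers who have not met the names being debated, this is how suffix and suffixes behave on a pure path. The example path is made up; PurePosixPath is used so that nothing touches the filesystem.

```python
from pathlib import PurePosixPath

archive = PurePosixPath("/srv/backups/site.tar.gz")

last_suffix = archive.suffix                 # ".gz": only the final component
all_suffixes = archive.suffixes              # [".tar", ".gz"]
stem = archive.stem                          # "site.tar": name minus last suffix
no_suffix = PurePosixPath("README").suffix   # "": no dot, no suffix
renamed = archive.with_suffix(".bz2")        # /srv/backups/site.tar.bz2
```

Note that the suffix includes the leading dot, which is one practical difference from the bare "extension" of DOS-style 8.3 names.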
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
On Mon, 25 Nov 2013 12:04:28 +1300
Ben Hoyt wrote:
> > Right now, pathlib doesn't cache. Guido decided it was safer to start
> > off like that, and perhaps later we can add some optional caching.
> >
> > One reason caching didn't go in is that it's not clear which API is
> > best. Working on plugging scandir() into pathlib would actually help
> > choosing a stat-caching API.
> >
> > (or, rather, lstat-caching...)
> >
> >> The other related thing is that DirEntry only provides .lstat(),
> >> because it's providing stat-like info without following links.
> >
> > Path.is_dir() and friends use stat(), i.e. they inform you about
> > whether a symlink's target is a directory (not the symlink itself). Of
> > course, if the DirEntry says the path is a symlink, Path.is_dir() could
> > then run stat() to find out about the target.
> >
> > Do you plan to propose scandir() for inclusion in the stdlib?
>
> Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry
> objects" for inclusion into the stdlib, and also speed up os.walk() as
> a result.
>
> However, pathlib's API with .is_dir() and .lstat() etc are so close to
> DirEntry, I'd be much keener to roll up the scandir functionality into
> pathlib's iterdir(), as that's already going in the standard library,
> and iterdir() already returns Path objects.
We could still expose scandir() as a low-level API, *and* call it in
pathlib for optimizations.
> We could do Path.lstat(cached=True), but we'd also really want
> is_dir(cached=True), so that API kinda sucks. Alternatively you could
> have iterdir(cached=True) return PathWithCachedStat style objects --
> probably better, but kinda messy.
Perhaps Path.enable_caching()? It would enable caching not only on this
path object, but on all objects constructed from it.
Regards
Antoine.
Re: [Python-Dev] [RELEASED] Python 3.4.0b1
In article <[email protected]>, Larry Hastings wrote:
> On behalf of the Python development team, it's my privilege to announce
> the first beta release of Python 3.4.
>
> This is a preview release, and its use is not recommended for
> production settings.

Note to users of the python.org Mac OS X binary installers: if you have installed earlier preview releases of Python 3.4, be aware that the batteries-included built-in Tcl/Tk library support introduced in 3.4.0a2 has been reverted in 3.4.0b1 because it was found to break some third-party packages. As is the case with earlier releases of Python, if you use the python.org 64-bit installer for OS X, you will again need to have a compatible third-party copy of Tcl/Tk 8.5 installed, such as ActiveTcl 8.5.15.1, to avoid the problematic system versions shipped in OS X 10.6+. See http://www.python.org/download/mac/tcltk/ for more information.

--
Ned Deily, [email protected]
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
On 25 Nov 2013 09:07, "Ben Hoyt" wrote:
>
> > Right now, pathlib doesn't cache. Guido decided it was safer to start
> > off like that, and perhaps later we can add some optional caching.
> >
> > One reason caching didn't go in is that it's not clear which API is
> > best. Working on plugging scandir() into pathlib would actually help
> > choosing a stat-caching API.
> >
> > (or, rather, lstat-caching...)
> >
> >> The other related thing is that DirEntry only provides .lstat(),
> >> because it's providing stat-like info without following links.
> >
> > Path.is_dir() and friends use stat(), i.e. they inform you about
> > whether a symlink's target is a directory (not the symlink itself). Of
> > course, if the DirEntry says the path is a symlink, Path.is_dir() could
> > then run stat() to find out about the target.
> >
> > Do you plan to propose scandir() for inclusion in the stdlib?
>
> Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry
> objects" for inclusion into the stdlib, and also speed up os.walk() as
> a result.
>
> However, pathlib's API with .is_dir() and .lstat() etc are so close to
> DirEntry, I'd be much keener to roll up the scandir functionality into
> pathlib's iterdir(), as that's already going in the standard library,
> and iterdir() already returns Path objects.
>
> I'm just not sure it's possible or useful without stat caching.
>
> We could do Path.lstat(cached=True), but we'd also really want
> is_dir(cached=True), so that API kinda sucks. Alternatively you could
> have iterdir(cached=True) return PathWithCachedStat style objects --
> probably better, but kinda messy.
>
> For these reasons, I would much prefer stat caching on by default in
> Path -- in my experience, the cached behaviour is desired much much
> more often than the non-cached. I've written directory walkers more
> often than I can count, whereas I've maybe only once written a
> long-running process that needs to re-stat, and if it's clearly
> documented as cached, then it's super easy to call restat(), or create
> a new Path instance to get new stat info.
>
> This would allow iterdir() to take advantage of the huge performance
> improvements you can get when walking directories.
>
> Guido, are you at all open to reconsidering the uncached-by-default in
> light of this?

No, caching on the object is dangerously unintuitive - it means two Path objects can compare equal, but give different answers for stat-dependent queries.

A global string (or Path) keyed cache (rather than a per-object cache) would actually be a safer option, since it would ensure distinct path objects always gave the same answer. That's the approach I will likely pursue at some point in walkdir.

It's also quite likely the "rich stat object" API will be pursued for 3.5, which is a much safer approach to stat result caching than trying to embed it directly in pathlib.Path objects.

That's why we decided to punt on the caching question until 3.5 - it's better to provide a predictable building block that doesn't provide caching, and then work out how to provide a sensible caching layer on top of that, rather than trying to rush a potentially flawed caching design that leads to inconsistent behaviour.

Cheers,
Nick.

>
> -Ben
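A minimal sketch of the path-keyed cache idea described above, assuming a plain module-level dict keyed by the path string. The names `cached_lstat` and `invalidate` are hypothetical and are not part of pathlib or walkdir; the point is only that two distinct Path objects for the same location share one answer until the entry is explicitly dropped.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical module-level cache: path string -> os.stat_result.
_lstat_cache = {}


def cached_lstat(path):
    """Return os.lstat(path), reusing an earlier result for the same string.

    If os.lstat() raises, nothing is stored, so failures are not cached.
    """
    key = str(path)
    if key not in _lstat_cache:
        _lstat_cache[key] = os.lstat(key)
    return _lstat_cache[key]


def invalidate(path=None):
    """Forget one cached entry, or the whole cache when path is None."""
    if path is None:
        _lstat_cache.clear()
    else:
        _lstat_cache.pop(str(path), None)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.txt")
    with open(target, "w") as f:
        f.write("abc")

    a = Path(target)
    b = Path(target)  # a distinct object that compares equal to a

    # Both objects are served by the same cache entry.
    same_answer = cached_lstat(a) is cached_lstat(b)

    with open(target, "a") as f:
        f.write("defgh")

    stale_size = cached_lstat(a).st_size  # still the size before the append
    invalidate(a)
    fresh_size = cached_lstat(b).st_size  # re-read after invalidation
```

Because the cache is keyed by string rather than stored on each object, equal paths cannot disagree with each other; they can only be stale together.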
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
On Sun, Nov 24, 2013 at 3:04 PM, Ben Hoyt wrote:
> > Right now, pathlib doesn't cache. Guido decided it was safer to start
> > off like that, and perhaps later we can add some optional caching.
> >
> > One reason caching didn't go in is that it's not clear which API is
> > best. Working on plugging scandir() into pathlib would actually help
> > choosing a stat-caching API.
> >
> > (or, rather, lstat-caching...)
> >
> >> The other related thing is that DirEntry only provides .lstat(),
> >> because it's providing stat-like info without following links.
> >
> > Path.is_dir() and friends use stat(), i.e. they inform you about
> > whether a symlink's target is a directory (not the symlink itself). Of
> > course, if the DirEntry says the path is a symlink, Path.is_dir() could
> > then run stat() to find out about the target.
> >
> > Do you plan to propose scandir() for inclusion in the stdlib?
>
> Yes, I was hoping to propose adding "os.scandir() -> yields DirEntry
> objects" for inclusion into the stdlib, and also speed up os.walk() as
> a result.
>
> However, pathlib's API with .is_dir() and .lstat() etc are so close to
> DirEntry, I'd be much keener to roll up the scandir functionality into
> pathlib's iterdir(), as that's already going in the standard library,
> and iterdir() already returns Path objects.
>
> I'm just not sure it's possible or useful without stat caching.
>
> We could do Path.lstat(cached=True), but we'd also really want
> is_dir(cached=True), so that API kinda sucks. Alternatively you could
> have iterdir(cached=True) return PathWithCachedStat style objects --
> probably better, but kinda messy.
>
> For these reasons, I would much prefer stat caching on by default in
> Path -- in my experience, the cached behaviour is desired much much
> more often than the non-cached. I've written directory walkers more
> often than I can count, whereas I've maybe only once written a
> long-running process that needs to re-stat, and if it's clearly
> documented as cached, then it's super easy to call restat(), or create
> a new Path instance to get new stat info.
>
> This would allow iterdir() to take advantage of the huge performance
> improvements you can get when walking directories.
>
> Guido, are you at all open to reconsidering the uncached-by-default in
> light of this?

I think we should think hard and deep about all the consequences. I was initially in favor of stat caching, but during offline review of PEP 428 Nick pointed out that there are too many different ways to do stat caching, and convinced me that it would be wrong to rush it. Now that beta 1 is out I really don't want to reconsider this -- we really need to stick to the plan.

The ship has likewise sailed for adding scandir() (whether to os or pathlib). By all means experiment and get it ready for consideration for 3.5, but I don't want to add it to 3.4.

In general I think there are some tough choices regarding stat caching. You already brought up stat vs. lstat -- there's also the issue of what to do if [l]stat fails -- do we cache the exception?

IMO, the current incarnation is for convenience, correctness and cross-platform semantics -- three C's. The next incarnation can add a fourth C, caching.

--
--Guido van Rossum (python.org/~guido)
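To make the stat vs. lstat distinction in this thread concrete, here is a small sketch, assuming a platform where the current user may create symlinks (on Windows this can require extra privileges). The helper name `describe` is made up for illustration. It shows that Path.is_dir() reports on a symlink's target, that lstat() reports on the link itself, and that a dangling link is a case where stat() fails while lstat() succeeds.

```python
import stat
import tempfile
from pathlib import Path


def describe(path):
    """Collect the answers pathlib gives for one path (hypothetical helper)."""
    p = Path(path)
    return {
        "is_dir": p.is_dir(),                              # follows symlinks
        "is_symlink": p.is_symlink(),                      # looks at the link itself
        "lstat_is_dir": stat.S_ISDIR(p.lstat().st_mode),   # link itself, not target
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    real_dir = root / "real"
    real_dir.mkdir()

    link = root / "link"
    link.symlink_to(real_dir, target_is_directory=True)
    link_info = describe(link)

    # A link whose target does not exist: lstat() works, stat() raises.
    dangling = root / "dangling"
    dangling.symlink_to(root / "missing")
    dangling_lstat_ok = stat.S_ISLNK(dangling.lstat().st_mode)
    try:
        dangling.stat()
        dangling_stat_failed = False
    except FileNotFoundError:
        dangling_stat_failed = True
```

Any caching design has to decide which of these two results it stores, and what it does about the failing stat() on the dangling link.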
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
Antoine's class-global flag seems like a bad idea.

> A global string (or Path) keyed cache (rather than a per-object cache) would
> actually be a safer option, since it would ensure distinct path objects
> always gave the same answer. That's the approach I will likely pursue at
> some point in walkdir.

Interesting approach. This wouldn't really solve the problem for scandir / DirEntry / performance issues, but it's a fair idea in general.

> It's also quite likely the "rich stat object" API will be pursued for 3.5,
> which is a much safer approach to stat result caching than trying to embed
> it directly in pathlib.Path objects.

As a Windows dev, I'm not sure I love the "rich stat object idea", because stat_result objects are sooo Posixy. On Windows, (some of) the file attribute info is stuffed into a stat_result struct. Which kinda works, but I like how Path exposes the higher-level, cross-platform stuff like .is_dir() so that most of the time you don't need to worry about stat. (You still need to worry about caching, though.)

> That's why we decided to punt on the caching question until 3.5 - it's
> better to provide a predictable building block that doesn't provide caching,
> and then work out how to provide a sensible caching layer on top of that,
> rather than trying to rush a potentially flawed caching design that leads to
> inconsistent behaviour.

Yep, agreed about rushing in a potentially flawed caching design. But I also don't want to "rush in" a design that prohibits scandir()-style performance optimizations -- though I guess it can still go in there one way or the other.

"Worst case", we can add os.scandir() separately, which returns DirEntry, "path-like" objects.

-Ben
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
> I think we should think hard and deep about all the consequences. I was
> initially in favor of stat caching, but during offline review of PEP 428
> Nick pointed out that there are too many different ways to do stat caching,
> and convinced me that it would be wrong to rush it. Now that beta 1 is out I
> really don't want to reconsider this -- we really need to stick to the plan.

Fair call, and thanks for the response.

> The ship has likewise sailed for adding scandir() (whether to os or
> pathlib). By all means experiment and get it ready for consideration for
> 3.5, but I don't want to add it to 3.4.

Yes, I was definitely thinking about 3.5 at this stage. :-)

What would be the next step for getting something like os.scandir() added for 3.5 -- a PEP referencing the various issues?

> In general I think there are some tough choices regarding stat caching. You
> already brought up stat vs. lstat -- there's also the issue of what to do if
> [l]stat fails -- do we cache the exception?
>
> IMO, the current incarnation is for convenience, correctness and
> cross-platform semantics -- three C's. The next incarnation can add a fourth
> C, caching.

Three/four C's, I like it!

-Ben
Re: [Python-Dev] PEP 428 - pathlib API questions
On 25 Nov 2013 09:14, "Ben Hoyt" wrote:
>
> >> 4) Is path_obj.glob() recursive? In the PEP it looks like it is if the
> >> pattern starts with '**',
> >
> >
> > I don't think it has to *start* with **. Rather, the ** is
> > a pattern that can span directory separators. It's not a
> > flag that applies to the whole thing -- a pattern could have
> > a * in one place and a ** in another.
>
> Oh okay, that makes more sense. It definitely needs more thorough
> documentation in that case. I would still prefer the simpler and more
> explicit rglob() / recursive=True rather than new pattern syntax, but
> I don't feel as strongly anymore.

Using "**" for directory spanning globs is also another case of us borrowing a reasonably common idiom from *nix systems that may not be familiar to Windows users.

Cheers,
Nick.

>
> -Ben
Re: [Python-Dev] PEP 428 - pathlib API questions
> Using "**" for directory spanning globs is also another case of us borrowing
> a reasonably common idiom from *nix systems that may not be familiar to
> Windows users.

Okay, *nix wins then. :-) Python's stdlib is already fairly *nix-oriented (even when it's being cross-platform), so I guess it's not a big deal.

My only remaining concern then is that there shouldn't be more than one way to do recursive globbing in a new API like this. Why does rglob() exist when the documentation simply says "like calling glob() but with '**' added in front of the pattern"?

http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob

-Ben
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
On 25 Nov 2013 09:31, "Ben Hoyt" wrote:
>
> > It's also quite likely the "rich stat object" API will be pursued for 3.5,
> > which is a much safer approach to stat result caching than trying to embed
> > it directly in pathlib.Path objects.
>
> As a Windows dev, I'm not sure I love the "rich stat object idea",
> because stat_result objects are sooo Posixy. On Windows, (some of) the
> file attribute info is stuffed into a stat_result struct. Which kinda
> works, but I like how Path exposes the higher-level, cross-platform
> stuff like .is_dir() so that most of the time you don't need to worry
> about stat. (You still need to worry about caching, though.)

The idea of the rich stat result object is that it has all that info prepopulated, based on an initial stat call. "Caching" it amounts to "keep a reference to it".

It is suggested that it would be a subset of the pathlib.Path API: http://bugs.python.org/issue19725

If it's also a superset of the existing stat object API, then at least Path.stat and Path.lstat (and perhaps the lower level APIs) can be updated to return it in 3.5.

> > That's why we decided to punt on the caching question until 3.5 - it's
> > better to provide a predictable building block that doesn't provide caching,
> > and then work out how to provide a sensible caching layer on top of that,
> > rather than trying to rush a potentially flawed caching design that leads to
> > inconsistent behaviour.
>
> Yep, agreed about rushing in a potentially flawed caching design. But
> I also don't want to "rush in" a design that prohibits scandir()-style
> performance optimizations -- though I guess it can still go in there
> one way or the other.

Yeah, the realisation that an initial non-caching approach didn't lock us out of external caching may not have been well communicated to the list. I was discussing the walkdir integration possibilities with Antoine and Guido and realised I would likely still need an external cache, even if pathlib had its own internal caching. At that point, it seemed highly desirable to duck the caching question entirely.

> "Worst case", we can add os.scandir() separately, which returns
> DirEntry, "path-like" objects.

Indeed, we may still want such an object API, since dirent doesn't provide full stat info.

A PEP reviewing all this for 3.5 and proposing a specific os.scandir API would be a good thing.

Cheers,
Nick.

>
> -Ben
Re: [Python-Dev] PEP 428 - pathlib API questions
On 25 Nov 2013 09:42, "Ben Hoyt" wrote:
>
> > Using "**" for directory spanning globs is also another case of us borrowing
> > a reasonably common idiom from *nix systems that may not be familiar to
> > Windows users.
>
> Okay, *nix wins then. :-) Python's stdlib is already fairly
> *nix-oriented (even when it's being cross-platform), so I guess it's
> not a big deal.
>
> My only remaining concern then is that there shouldn't be more than
> one way to do recursive globbing in a new API like this. Why does
> rglob() exist when the documentation simply says "like calling glob()
> but with '**' added in front of the pattern"?
>
> http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob

Because it's a layered API - embedding ** in the pattern is a strictly more powerful interface, but can be a little tricky to get your head around (especially if you don't use a shell that has the feature). rglob() is simpler, but not as flexible.

We offer that kind of multi-level API fairly often. For example, subprocess.call() and friends are simpler interfaces for particular ways of using the powerful-but-complex subprocess.Popen API. The metaprogramming stack (functions, classes, decorators, descriptors, metaclasses) similarly offers the ability to trade increased complexity for increases in power and flexibility.

In these cases, the "obvious way" is to use the simplest API that covers the use case, and only reach for the more complex API when you genuinely need it.

Cheers,
Nick.

>
> -Ben
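The layering described above can be sketched with a throwaway directory tree. The file names are invented for the example; the calls used are only Path.glob() and Path.rglob(). The first two results should match, since rglob("*.py") behaves like glob("**/*.py"), while the third shows a pattern that rglob() cannot express because the "**" sits in the middle.

```python
import tempfile
from pathlib import Path


def relative_names(root, paths):
    """Sorted POSIX-style paths relative to root, for stable comparison."""
    return sorted(p.relative_to(root).as_posix() for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    for rel in ["top.py", "pkg/mod.py", "pkg/sub/deep.py", "pkg/notes.txt"]:
        (root / rel).touch()

    # The simple interface: recurse from the root.
    via_rglob = relative_names(root, root.rglob("*.py"))

    # The same query written with "**" in front of the pattern.
    via_glob = relative_names(root, root.glob("**/*.py"))

    # The more flexible form: "**" after a fixed prefix.
    under_pkg = relative_names(root, root.glob("pkg/**/*.py"))
```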
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
> The idea of the rich stat result object is that it has all that info
> prepopulated, based on an initial stat call. "Caching" it amounts to "keep a
> reference to it".
>
> It is suggested that it would be a subset of the pathlib.Path API:
> http://bugs.python.org/issue19725
>
> If it's also a superset of the existing stat object API, then at least
> Path.stat and Path.lstat (and perhaps the lower level APIs) can be updated
> to return it in 3.5.

Got it.

>> "Worst case", we can add os.scandir() separately, which returns
>> DirEntry, "path-like" objects.
>
> Indeed, we may still want such an object API, since dirent doesn't provide
> full stat info.

I'm not quite sure what you're suggesting here.

In any case, I'm going to modify my scandir() so its DirEntry objects are closer to pathlib.Path, particularly:

* isdir() -> is_dir()
* isfile() -> is_file()
* islink() -> is_symlink()
* add is_socket(), is_fifo(), is_block_device(), and is_char_device()

I'm considering removing DirEntry's .dirent attribute entirely. The above is_* functions cover everything in .dirent.d_type in a much more Pythonic and cross-platform way, and the only other info in .dirent is d_ino -- can a non-Windows dev tell me how or when d_ino would be useful? If it's useful, is it useful in a higher-level, cross-platform API such as scandir()?

Hmmm, I wonder about this "rich stat object" idea in light of the above. Do the methods on pathlib.Path basically supersede the need for this? Because otherwise folks will always be wondering whether to say "path.is_dir()" or "path.stat().is_dir" ... two ways to do it, right next to each other. So I'd prefer to add the "rich" stuff on the higher-level Path instead of the lower-level stat.

> A PEP reviewing all this for 3.5 and proposing a specific os.scandir API
> would be a good thing.

Thanks, I'll definitely consider writing a PEP.

-Ben
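For orientation, a sketch of the renamed is_* methods in use. It assumes the os.scandir() that later shipped in the standard library (Python 3.5, with context-manager support from 3.6), which may differ from the draft being discussed in this message; the helper `classify` is hypothetical.

```python
import os
import tempfile


def classify(directory):
    """Map each entry name in directory to a coarse kind (hypothetical helper).

    Uses the DirEntry is_* methods, which can often answer from the
    directory listing itself without a separate stat call per entry.
    """
    kinds = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kind = "dir"
            elif entry.is_file(follow_symlinks=False):
                kind = "file"
            else:
                kind = "other"
            kinds[entry.name] = kind
    return kinds


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello")
    listing = classify(tmp)
```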
Re: [Python-Dev] PEP 428 - pathlib API questions
25.11.13 01:35, Nick Coghlan wrote:
> Using "**" for directory spanning globs is also another case of us
> borrowing a reasonably common idiom from *nix systems that may not be
> familiar to Windows users.

Rather, from the Java world.
Re: [Python-Dev] pathlib and issue 11406 (a directory iterator returning stat-like info)
On 25 November 2013 03:18, Ben Hoyt wrote:
> d_ino -- can a non-Windows dev tell me how or when d_ino would be
> useful? If it's useful, is it useful in a higher-level, cross-platform
> API such as scandir()?

OK, so I'm a Windows dev, but my understanding is that d_ino is useful to tell if two files are identical - hard links to the same physical file have the same d_ino value. I don't believe it's possible to do this on Windows at all.

I've seen it used in tools like diff, to short-circuit doing the actual diff if you know from a stat that the 2 files are the same.

Paul
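The inode comparison described above can be sketched as follows, assuming a POSIX filesystem that supports hard links. The helper `same_file` is written out for illustration; the standard library's os.path.samefile() performs the same device-and-inode comparison.

```python
import os
import tempfile


def same_file(a, b):
    """True if a and b name the same underlying file (same device and inode)."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    hard_link = os.path.join(tmp, "hardlink.txt")
    copy = os.path.join(tmp, "copy.txt")

    with open(original, "w") as f:
        f.write("data")
    os.link(original, hard_link)      # second name for the same inode
    with open(copy, "w") as f:
        f.write("data")               # same contents, different inode

    linked_is_same = same_file(original, hard_link)
    copy_is_same = same_file(original, copy)
```

A diff-like tool can skip comparing contents when this check succeeds, since both names refer to one file.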
