Re: [Python-Dev] PEP 3147: PYC Repository Directories
Brett Cannon wrote:
> If we add a new method like get_filenames(), I would suggest going
> with Antoine's suggestion of a tuple for __compiled__ (allowing
> loaders to indicate that they actually constructed the runtime
> bytecode from multiple cached files on-disk).
>
> Does code exist out there where people are constructing bytecode from
> multiple files for a single module?

I'm quite prepared to call YAGNI on that idea and just return a 2-tuple
of source filename and compiled filename.

The theoretical use case was for a module that was partially compiled to
native code in advance, so its "compiled" version was a combination of a
shared library and a bytecode file. It isn't really all that compelling
an idea - it would be easy enough for a loader to pick one or the other
and stick that in __compiled__.

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
---
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
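As a rough illustration of the "2-tuple of source filename and compiled filename" idea, here is a minimal sketch. It does not use the `__compiled__` attribute or the `get_filenames()` method proposed in this thread; it assumes a later Python 3 release where `importlib.util.cache_from_source()` is available, and the file name `spam.py` is purely hypothetical.

```python
import importlib.util

source_filename = "spam.py"   # hypothetical module source, need not exist
# Assumption: a Python 3 release that provides importlib.util.cache_from_source().
compiled_filename = importlib.util.cache_from_source(source_filename)

# The (source, compiled) pair discussed above, as a plain 2-tuple.
filenames = (source_filename, compiled_filename)
print(filenames)
```

A loader that builds its runtime code from something other than a single bytecode file would, under the YAGNI position above, simply pick one path for the second slot.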
[Python-Dev] IO module improvements
Hello

The new modular io system of Python is awesome, but I'm currently running
into some of its limits while replacing the raw FileIO with a more
advanced stream. So here are a few ideas and questions regarding the
mechanisms of this IO system. Note that I'm speaking in Python terms, but
these ideas should also apply to the C implementation (with more
programming hassle, of course).

- Some streams have specific attributes (e.g. mode, name...), but since
they'll often be wrapped inside buffering or encoding streams, these
attributes will not be available to the end user. So wouldn't it be great
to implement some "transversal inheritance", simply by delegating
attribute retrievals which fail on the current stream to the underlying
buffer/raw stream? A little __getattr__ should do it fine, shouldn't it?

By the way, I'm having trouble with the "name" attribute of raw files,
which can be string or integer (confusing), ambiguous if containing a
relative path, and which isn't able to handle the new case of my
library, i.e. opening a file from an existing file handle (which is ALSO
an integer, like C file descriptors...); I propose we deprecate it in
favor of more precise attributes, like "path" (absolute path) and
"origin" (which can be "path", "fileno", "handle" and can be
extended...).

Methods too would deserve some auto-forwarding. If you want to bufferize
a raw stream which also offers size(), times(), lock_file() and other
methods, how can these be accessed from a top-level buffering/text
stream? So it would be interesting to have a system through which a
stream can expose its additional features to top-level streams, and at
the same time tell them whether they must flush() before calling these
new methods (e.g. asking for the inode number of a file doesn't require
flushing, but knowing its real size DOES require it).

- I feel thread-safety locking and stream status checking are currently
overly complicated. All methods are filled with locking calls and
CheckClosed() calls, which is both a performance loss (most io streams
will have 3 such levels of locking, when 1 would suffice) and error-prone
(some time ago I saw several functions in the sources in which checks and
locks seemed to be lacking). Since we're in the mood for nesting streams
anyway, why not simply add a "safety stream" on top of each stream chain
returned by open()? That layer could gracefully handle mutex locking,
CheckClosed() calls, and even, maybe, the attribute/method forwarding I
mentioned above. I know a pure metaprogramming solution might not suffice
for performance-seekers, but static implementations should be doable as
well.

- Some semantic decisions of the current system are somewhat dangerous.
For example, flushing errors occurring on close are swallowed. It seems
to me that it's of the utmost importance that the user be warned if the
bytes he wrote disappeared before reaching the kernel; shouldn't we
decidedly enforce a "don't hide errors" policy everywhere in the io
module?

Regards,
Pascal
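The "little __getattr__" idea from the first point can be sketched as follows. This is only an illustration under stated assumptions: `ForwardingBufferedReader` and `SizedRaw` are hypothetical names, `size()` stands in for one of the extra methods mentioned above, and an in-memory `io.BytesIO` plays the role of the raw stream.

```python
import io


class ForwardingBufferedReader(io.BufferedReader):
    """Hypothetical subclass: falls back to the raw stream for unknown attributes."""

    def __getattr__(self, attr):
        # __getattr__ runs only after normal lookup has failed, so the
        # standard buffered methods (read, seek, tell, ...) are never forwarded.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self.raw, attr)


class SizedRaw(io.BytesIO):
    """Stand-in for a raw stream with an extra, non-standard method."""

    def size(self):
        return len(self.getvalue())


stream = ForwardingBufferedReader(SizedRaw(b"hello world"))
print(stream.size())   # looked up on the wrapper, answered by the SizedRaw
```

Note that the sketch forwards blindly: it has no way to know whether the buffered layer should flush or resynchronize before the forwarded call, which is the extra signalling the message asks for.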
Re: [Python-Dev] IO module improvements
Pascal Chambon writes:
>
> By the way, I'm having trouble with the "name" attribute of raw files,
> which can be string or integer (confusing), ambiguous if containing a
> relative path, and which isn't able to handle the new case of my
> library, i.e opening a file from an existing file handle (which is ALSO
> an integer, like C file descriptors...)
What is the difference between "file handle" and a regular C file descriptor?
Is it some Windows-specific thing?
If so, then perhaps it deserves some Windows-specific attribute ("handle"?).
> Methods too would deserve some auto-forwarding. If you want to bufferize
> a raw stream which also offers size(), times(), lock_file() and other
> methods, how can these be accessed from a top-level buffering/text
> stream ?
I think it's a bad idea. If you forget to implement one of the standard IO
methods (e.g. seek()), it will get forwarded to the raw stream, but with the
wrong semantics (because it won't take buffering into account).
It's better to require the implementor to do the forwarding explicitly if
desired, IMO.
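
The hazard with implicit forwarding can be shown with a minimal sketch. It assumes an in-memory `io.BytesIO` as a stand-in for the raw file and an arbitrary 4096-byte buffer; the point is only that the raw position and the buffered position disagree once read-ahead has happened.

```python
import io

raw = io.BytesIO(b"0123456789" * 1000)   # stand-in for a raw file
buffered = io.BufferedReader(raw, buffer_size=4096)

first = buffered.read(1)        # the caller consumed one byte...
logical_pos = buffered.tell()   # ...so the buffered position is 1
raw_pos = raw.tell()            # but the raw stream has read ahead

print(first, logical_pos, raw_pos)
```

A `tell()` or `seek()` that silently fell through to the raw stream would therefore report or change a position the caller never asked for.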
> - I feel thread-safety locking and stream status checking are
> currently overly complicated. All methods are filled with locking calls
> and CheckClosed() calls, which is both a performance loss (most io
> streams will have 3 such levels of locking, when 1 would suffice)
FileIO objects don't have a lock, so there are 2 levels of locking at worst, not
3 (and, actually, TextIOWrapper doesn't have a lock either, although perhaps it
should).
As for the checkClosed() calls, they are probably cheap, especially if they
bypass regular attribute lookup.
> Since we're in the mood for nesting streams anyway, why not simply
> add a "safety stream" on top of each stream chain returned by open()?
> That layer could gracefully handle mutex locking, CheckClosed() calls,
> and even, maybe, the attribute/method forwarding I mentioned above.
It's an interesting idea, but it could also end up slower than the current
situation.
First because you are adding a level of indirection (i.e. additional method
lookups and method calls).
Second because currently the locks aren't always taken. For example, in
BufferedReader, we needn't take the lock when the requested data is available
in our buffer (the GIL already protects us). Having a separate "synchronizing"
wrapper would forbid such micro-optimizations.
If you want to experiment with this, you can use iobench (in the Tools
directory) to measure file IO performance.
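
For anyone wanting to experiment, the "safety stream" idea might look roughly like the sketch below. `SynchronizedStream` is a hypothetical name, not an existing io class, and the sketch makes the trade-off above visible: every call pays for the lock and the extra method lookup, even when the data is already buffered.

```python
import io
import threading


class SynchronizedStream:
    """Hypothetical outermost layer: one lock and one closed check per call."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.RLock()

    def _call(self, name, *args):
        with self._lock:                      # single level of locking
            if self._stream.closed:           # single closed check
                raise ValueError("I/O operation on closed file")
            # Extra indirection: attribute lookup plus a second method call.
            return getattr(self._stream, name)(*args)

    def read(self, size=-1):
        return self._call("read", size)

    def write(self, data):
        return self._call("write", data)

    def close(self):
        with self._lock:
            self._stream.close()


safe = SynchronizedStream(io.BytesIO(b"abc"))
print(safe.read(2))
```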
> - some semantic decisions of the current system are somewhat dangerous.
> For example, flushing errors occurring on close are swallowed. It seems
> to me that it's of the utmost importance that the user be warned if the
> bytes he wrote disappeared before reaching the kernel; shouldn't we
> decidedly enforce a "don't hide errors" policy everywhere in the io module?
It may be a bug. Can you report it, along with a script or test showcasing it?
Regards
Antoine.
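
A script of the kind requested for the flush-on-close report could be sketched as follows. `FailingRaw` is a hypothetical raw stream whose writes always fail, and the helper only reports whether `close()` surfaced the error; the result depends on the Python version being tested, so no particular outcome is asserted here for the version under discussion.

```python
import io


class FailingRaw(io.RawIOBase):
    """Hypothetical raw stream whose writes always fail (e.g. a full disk)."""

    def writable(self):
        return True

    def write(self, data):
        raise OSError("simulated write failure")


def close_reports_flush_error():
    buffered = io.BufferedWriter(FailingRaw())
    buffered.write(b"data")      # small write: stays in the buffer, no error yet
    try:
        buffered.close()         # close() flushes, which hits the failing raw write
    except OSError:
        return True              # the error reached the caller
    return False                 # the error was swallowed


print(close_reports_flush_error())
```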
Re: [Python-Dev] IO module improvements
On Fri, Feb 5, 2010 at 5:28 AM, Antoine Pitrou wrote:
> Pascal Chambon writes:
>>
>> By the way, I'm having trouble with the "name" attribute of raw files,
>> which can be string or integer (confusing), ambiguous if containing a
>> relative path,
Why is it ambiguous? It sounds like you're using str() of the name and
then can't tell whether the file is named e.g. '1' or whether it
refers to file descriptor 1 (i.e. sys.stdout).
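
The string-versus-integer point can be illustrated with a short sketch. It assumes a throwaway temporary directory and a file literally named "1"; the type of `name` is the only thing that tells the two cases apart, which is why calling `str()` on it loses information.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1")          # a file literally named "1"
    with io.open(path, "w") as by_path:
        name_from_path = by_path.name      # str: the path passed to open()

    fd = os.open(path, os.O_RDONLY)
    with io.open(fd) as by_fd:             # closes fd on exit (closefd defaults to True)
        name_from_fd = by_fd.name          # int: the descriptor number

print(type(name_from_path).__name__, type(name_from_fd).__name__)
```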
>> and which isn't able to handle the new case of my
>> library, i.e opening a file from an existing file handle (which is ALSO
>> an integer, like C file descriptors...)
>
> What is the difference between "file handle" and a regular C file descriptor?
> Is it some Windows-specific thing?
> If so, then perhaps it deserves some Windows-specific attribute ("handle"?).
Make it mirror the fileno() attribute.
>> Methods too would deserve some auto-forwarding. If you want to bufferize
>> a raw stream which also offers size(), times(), lock_file() and other
>> methods, how can these be accessed from a top-level buffering/text
>> stream ?
>
> I think it's a bad idea. If you forget to implement one of the standard IO
> methods (e.g. seek()), it will get forwarded to the raw stream, but with the
> wrong semantics (because it won't take buffering into account).
>
> It's better to require the implementor to do the forwarding explicitly if
> desired, IMO.
Agreed. If an underlying stream has a certain property that doesn't
mean the above stream has the same property. Calling methods on the
underlying stream that move the file position may wreak havoc on the
buffer consistency of the above stream. Etc., etc. Please don't do
this. Antoine has the right idea.
>> - I feel thread-safety locking and stream status checking are
>> currently overly complicated. All methods are filled with locking calls
>> and CheckClosed() calls, which is both a performance loss (most io
>> streams will have 3 such levels of locking, when 1 would suffice)
>
> FileIO objects don't have a lock, so there are 2 levels of locking at worst,
> not 3 (and, actually, TextIOWrapper doesn't have a lock either, although
> perhaps it should).
> As for the checkClosed() calls, they are probably cheap, especially if they
> bypass regular attribute lookup.
>
>> Since we're in the mood for nesting streams anyway, why not simply
>> add a "safety stream" on top of each stream chain returned by open()?
>> That layer could gracefully handle mutex locking, CheckClosed() calls,
>> and even, maybe, the attribute/method forwarding I mentioned above.
>
> It's an interesting idea, but it could also end up slower than the current
> situation.
> First because you are adding a level of indirection (i.e. additional method
> lookups and method calls).
> Second because currently the locks aren't always taken. For example, in
> BufferedReader, we needn't take the lock when the requested data is
> available in our buffer (the GIL already protects us). Having a separate
> "synchronizing" wrapper would forbid such micro-optimizations.
>
> If you want to experiment with this, you can use iobench (in the Tools
> directory) to measure file IO performance.
>
>> - some semantic decisions of the current system are somewhat dangerous.
>> For example, flushing errors occurring on close are swallowed. It seems
>> to me that it's of the utmost importance that the user be warned if the
>> bytes he wrote disappeared before reaching the kernel; shouldn't we
>> decidedly enforce a "don't hide errors" policy everywhere in the io module?
>
> It may be a bug. Can you report it, along with a script or test showcasing it?
>
> Regards
>
> Antoine.
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] IO module improvements
On 03:57 pm, [email protected] wrote:
> On Fri, Feb 5, 2010 at 5:28 AM, Antoine Pitrou wrote:
>> Pascal Chambon writes:
>>> By the way, I'm having trouble with the "name" attribute of raw files,
>>> which can be string or integer (confusing), ambiguous if containing a
>>> relative path,
>
> Why is it ambiguous? It sounds like you're using str() of the name and
> then can't tell whether the file is named e.g. '1' or whether it
> refers to file descriptor 1 (i.e. sys.stdout).

I think string/integer and ambiguity were different points. Here's the
ambiguity:

  exar...@boson:~$ python
  Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
  [GCC 4.4.1] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import os, io
  >>> f = io.open('.bashrc')
  >>> os.chdir('/')
  >>> f.name
  '.bashrc'
  >>> os.path.abspath(f.name)
  '/.bashrc'
  >>>

Jean-Paul
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (01/29/10 - 02/05/10)
Python tracker at http://bugs.python.org/
To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.
2602 open (+38) / 17079 closed (+15) / 19681 total (+53)
Open issues with patches: 1069
Average duration of open issues: 707 days.
Median duration of open issues: 461 days.
Open Issues Breakdown
   open  2568 (+38)
pending    33 ( +0)

Issues Created Or Reopened (57)
___

allow unicode keyword args  02/04/10
       http://bugs.python.org/issue4978  reopened  barry
       patch, needs review

A selection of spelling errors and typos throughout source  02/04/10
       http://bugs.python.org/issue5341  reopened  ezio.melotti
       patch

enable compilation of readline module on Mac OS X 10.5 and 10.6  02/04/10
       http://bugs.python.org/issue6877  reopened  barry
       patch, 26backport, needs review

segfault when deleting from a list using slice with very big `st  02/03/10
       http://bugs.python.org/issue7788  reopened  mark.dickinson
       patch

test_macostools fails on OS X 10.6: no attribute 'FSSpec'  01/29/10
       http://bugs.python.org/issue7807  created  mark.dickinson

test_bsddb3 leaks references  01/29/10
       http://bugs.python.org/issue7808  created  flox
       patch

Documentation for random module should indicate that a call to s  01/30/10
CLOSED http://bugs.python.org/issue7809  created  Justin.Lebar

fix_callable breakage  01/30/10
CLOSED http://bugs.python.org/issue7810  created  loewis

[decimal] ValueError -> TypeError in from_tuple  01/30/10
       http://bugs.python.org/issue7811  created  skrah

Call to gestalt('sysu') on OSX can lead to freeze in wxPython ap  01/30/10
       http://bugs.python.org/issue7812  created  phansen

Bug in command-line module launcher  01/30/10
       http://bugs.python.org/issue7813  created  pakal
       patch

SimpleXMLRPCServer Example uses "mul" instead of "div" in client  01/30/10
CLOSED http://bugs.python.org/issue7814  created  mnewman

Regression in unittest traceback formating extensibility  01/30/10
       http://bugs.python.org/issue7815  created  gz

test_capi crashes when run with "-R"  01/30/10
CLOSED http://bugs.python.org/issue7816  created  flox
       patch

Pythonw.exe fails to start  01/31/10
CLOSED http://bugs.python.org/issue7817  created  ZDan

Improve set().test_c_api(): don't expect a set("abc"), modify th  01/31/10
       http://bugs.python.org/issue7818  created  haypo
       patch

sys.call_tracing(): check arguments type  01/31/10
CLOSED http://bugs.python.org/issue7819  created  haypo
       patch

parser: restores all bytes in the right order if check_bom() fai  01/31/10
       http://bugs.python.org/issue7820  created  haypo
       patch

Command line option -U not documented  01/31/10
CLOSED http://bugs.python.org/issue7821  created  stevenjd
Re: [Python-Dev] Rational for PEP 3147 (PYC Respository Directories)
On 3.2.2010 18:39, Antoine Pitrou wrote:
> Neil Schemenauer writes:
>>
>> Thanks for doing the work of writing a PEP. The rationale section
>> could use some strengthening, I think. Who is benefiting from this
>> feature? Is it the distribution package maintainers? Maybe people
>> who use a distribution packaged Python and install packages from
>> PyPI. It's not clear to me, anyhow.
>
> It would also be nice to have other packagers' take on this (Redhat, Mandriva,
> etc.). But of course you aren't responsible if they don't show up.

As the SUSE guy, I don't care either way. This simply has no benefits or
drawbacks for us.

This solution can only be beneficial in systems like Debian's
python-support, where you byte-compile when installing. We byte-compile
at build time, so if we wanted to support more than one Python within one
package, we would need to distribute an rpm full of different .pycs for
all supported Python versions. Yes, that was not possible before and it
is possible with this PEP, but there is no sense in doing it ;)

That said, I don't particularly care whether the installed pycs are in a
separate __pycache__ directory or next to their sources. (There were very
good arguments in the other thread against subdir clutter; one more from
me: each subdirectory has a separate entry in the rpm database, so by
creating subdir clutter you're also cluttering our packaging system.)

+0 from me

regards
m.
Re: [Python-Dev] IO module improvements
On Fri, Feb 5, 2010 at 8:46 AM, wrote:
> On 03:57 pm, [email protected] wrote:
>>
>> On Fri, Feb 5, 2010 at 5:28 AM, Antoine Pitrou wrote:
>>>
>>> Pascal Chambon writes:
>>>> By the way, I'm having trouble with the "name" attribute of raw files,
>>>> which can be string or integer (confusing), ambiguous if containing a
>>>> relative path,
>>
>> Why is it ambiguous? It sounds like you're using str() of the name and
>> then can't tell whether the file is named e.g. '1' or whether it
>> refers to file descriptor 1 (i.e. sys.stdout).
>
> I think string/integer and ambiguity were different points. Here's the
> ambiguity:
>
> exar...@boson:~$ python
> Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
> [GCC 4.4.1] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os, io
> >>> f = io.open('.bashrc')
> >>> os.chdir('/')
> >>> f.name
> '.bashrc'
> >>> os.path.abspath(f.name)
> '/.bashrc'
> >>>
>
> Jean-Paul

You're right, I didn't see the OP's comma. :-)

I don't think this can be helped though -- I really don't want open()
to be slowed down or complicated by an attempt to do path
manipulation. If this matters to the app author they should use
os.path.abspath() or os.path.realpath() or whatever before calling
open().

--
--Guido van Rossum (python.org/~guido)
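The application-side approach suggested here, resolving the path before calling open(), can be sketched as below. `open_with_absolute_name` is a hypothetical helper and `example.txt` is a throwaway file in a temporary directory; neither is part of the io module.

```python
import os
import tempfile


def open_with_absolute_name(path, *args, **kwargs):
    """Hypothetical helper: resolve the path before handing it to open()."""
    return open(os.path.abspath(path), *args, **kwargs)


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        with open("example.txt", "w") as handle:
            handle.write("data")
        with open_with_absolute_name("example.txt") as f:
            os.chdir(original_cwd)                    # the working directory changes...
            name_is_absolute = os.path.isabs(f.name)
            still_resolves = os.path.exists(f.name)   # ...but f.name still points at the file
    finally:
        os.chdir(original_cwd)

print(name_is_absolute, still_resolves)
```

The cost of the path manipulation is then paid only by applications that care about it, which matches the objection to doing it inside open() itself.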
Re: [Python-Dev] IO module improvements
Guido van Rossum wrote:
> You're right, I didn't see the OP's comma. :-)
>
> I don't think this can be helped though -- I really don't want open()
> to be slowed down or complicated by an attempt to do path
> manipulation. If this matters to the app author they should use
> os.path.abspath() or os.path.realpath() or whatever before calling
> open().
I had the idea to add a property that returns the file name based on the
file descriptor. However there isn't a straightforward way to look up the
file based on the fd on POSIX OSes. fstat() returns only the inode and
device. The combination of inode + device references 0 to n files due to
anonymous files and hard links. On POSIX OSes with a /proc file system
it's possible to do a reverse lookup by (ab)using /proc/self/fd/, but
that's a hack.
>>> import os
>>> f = open("/etc/passwd")
>>> fd = f.fileno()
>>> os.readlink("/proc/self/fd/%i" % fd)
'/etc/passwd'
On Windows it's possible to get the file name from the handle with
GetFileInformationByHandleEx().
This doesn't strike me as a feasible option ...
Christian
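
The /proc-based reverse lookup described above can be wrapped as a best-effort helper. This is only a sketch of the hack, not a proposed API: `path_from_fd` is a hypothetical name, it assumes a Linux-style /proc/self/fd/ directory, and it returns None wherever that directory is missing.

```python
import os
import tempfile


def path_from_fd(fd):
    """Best-effort reverse lookup; returns None where /proc/self/fd is unavailable."""
    link = "/proc/self/fd/%d" % fd
    try:
        return os.readlink(link)
    except (OSError, AttributeError, NotImplementedError):
        return None


with tempfile.NamedTemporaryFile() as tmp:
    resolved = path_from_fd(tmp.fileno())
    # Where the lookup works, it should name the same file we opened.
    matches = resolved is None or os.path.samefile(resolved, tmp.name)

print(matches)
```

Even where it works, the result is only one of possibly several names for the file, and it can go stale if the file is renamed or unlinked, which is part of why it is called a hack.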
Re: [Python-Dev] IO module improvements
On Fri, Feb 5, 2010 at 3:47 PM, Christian Heimes wrote:
> I had the idea to add a property that returns the file name based on the
> file descriptor. However there isn't a straightforward way to look up the
> file based on the fd on POSIX OSes. fstat() returns only the inode and
> device. The combination of inode + device references 0 to n files due to
> anonymous files and hard links. On POSIX OSes with a /proc file system
> it's possible to do a reverse lookup by (ab)using /proc/self/fd/, but
> that's a hack.
>
> >>> import os
> >>> f = open("/etc/passwd")
> >>> fd = f.fileno()
> >>> os.readlink("/proc/self/fd/%i" % fd)
> '/etc/passwd'
>
> On Windows it's possible to get the file name from the handle with
> GetFileInformationByHandleEx().
>
> This doesn't strike me as a feasible option ...
It's good to know about such options, but I really don't like to add
such brittle APIs to the standard I/O objects. So, agreed, this is not
feasible.
--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] IO module improvements
Antoine Pitrou wrote:
> Pascal Chambon writes:
>> By the way, I'm having trouble with the "name" attribute of raw files,
>> which can be string or integer (confusing), ambiguous if containing a
>> relative path, and which isn't able to handle the new case of my
>> library, i.e opening a file from an existing file handle (which is ALSO
>> an integer, like C file descriptors...)
>
> What is the difference between "file handle" and a regular C file descriptor?
> Is it some Windows-specific thing?
> If so, then perhaps it deserves some Windows-specific attribute ("handle"?).
File descriptors are integer indexes into a process-specific table.
File handles are pointers to opaque structs which contain other
information the kernel knows about the file. MS Windows muddies the
distinction, using "file handle" to refer to the integer index.
[1] http://en.wikipedia.org/wiki/File_handle
Tres.
--
Tres Seaver +1 540-429-0999 [email protected]
Palladion Software "Excellence by Design" http://palladion.com
Re: [Python-Dev] IO module improvements
On Sat, Feb 6, 2010 at 4:31 PM, Tres Seaver wrote:
> Antoine Pitrou wrote:
>> Pascal Chambon writes:
>>> By the way, I'm having trouble with the "name" attribute of raw files,
>>> which can be string or integer (confusing), ambiguous if containing a
>>> relative path, and which isn't able to handle the new case of my
>>> library, i.e opening a file from an existing file handle (which is ALSO
>>> an integer, like C file descriptors...)
>>
>> What is the difference between "file handle" and a regular C file descriptor?
>> Is it some Windows-specific thing?
>> If so, then perhaps it deserves some Windows-specific attribute ("handle"?).
>
> File descriptors are integer indexes into a process-specific table.
AFAIK, they aren't simple indexes on Windows, and that's partly why
even file descriptors cannot be safely passed between C runtimes on
Windows (whereas they can on most Unices).
David
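
The descriptor-versus-handle distinction can be made concrete with a small sketch. It assumes the standard library's Windows-only `msvcrt.get_osfhandle()`; `os_handle_for` is a hypothetical helper that simply returns None on other platforms, where the file descriptor is the only identifier involved.

```python
import sys
import tempfile


def os_handle_for(fd):
    """Return the Windows OS handle behind a C runtime fd, or None elsewhere."""
    if sys.platform != "win32":
        return None                      # on POSIX the fd is the only identifier
    import msvcrt                        # Windows-only standard library module
    return msvcrt.get_osfhandle(fd)


with tempfile.TemporaryFile() as f:
    fd = f.fileno()                      # small integer managed by the C runtime
    handle = os_handle_for(fd)           # separate integer on Windows, else None

print(isinstance(fd, int), handle is None or isinstance(handle, int))
```

Both values are plain integers on Windows, which is the confusion raised earlier in the thread about a single "name" attribute having to represent either one.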
