date:20081208

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Stephen J. Turnbull

Glenn Linderman writes:
 > On approximately 12/7/2008 8:13 PM, came the following characters from 

 > I have no problem with having strict validation available.  But
 > doesn't validation take significantly longer than decoding?

I think you're thinking of XML, where validation can take significant
resources over and above syntax checking.  For Unicode, not unless
you're seriously CPU-bound.  Unicode validation is a matter of a few
range checks and a couple of flags to handle things like lone
surrogates.
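
To give a feel for how small these checks are, here is a minimal Python sketch of the per-code-point test. The name `is_scalar_value` is made up for this example and is not part of any codec API; it only assumes the usual definition of a Unicode scalar value.

```python
def is_scalar_value(cp: int) -> bool:
    """Return True if cp is a Unicode scalar value.

    Two range checks: inside the code space, and not a surrogate.
    """
    in_code_space = 0 <= cp <= 0x10FFFF
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    return in_code_space and not is_surrogate


# Python's strict UTF-8 codec refuses to encode a lone surrogate.
try:
    "\ud800".encode("utf-8")
    lone_surrogate_encodes = True
except UnicodeEncodeError:
    lone_surrogate_encodes = False
```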

In the case of "excess length" in UTF-8, you can actually often do it
in *zero* time if you use a table to analyze the leading byte (eg,
0xC0 and 0xC1 are invalid UTF-8 leading bytes because they would
necessarily decode to U+0000 to U+007F, ie, the ASCII range), because
you have to make a check for 0xFE and 0xFF anyway, which can't be
UTF-8 leading bytes.  (I'm not sure this generalizes to longer UTF-8
sequences, but it would reject the use of 0xC0 0xAF to sneak in a "/"
in zero time!)
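
The lead-byte table idea can be sketched in Python as follows. This is only an illustration under stated assumptions: `build_lead_table` and `lead_byte_ok` are hypothetical names, and the check looks at lead bytes and continuation-byte ranges only, so it is not a full validator.

```python
def build_lead_table():
    """Map each byte value to the sequence length it starts (0 = invalid lead)."""
    table = [0] * 256
    for b in range(0x00, 0x80):
        table[b] = 1              # ASCII
    # 0x80-0xBF are continuation bytes and stay 0.
    # 0xC0 and 0xC1 stay 0: they could only encode U+0000..U+007F.
    for b in range(0xC2, 0xE0):
        table[b] = 2
    for b in range(0xE0, 0xF0):
        table[b] = 3
    for b in range(0xF0, 0xF5):
        table[b] = 4
    # 0xF5-0xFF stay 0; this includes 0xFE and 0xFF.
    return table


LEAD = build_lead_table()


def lead_byte_ok(data: bytes) -> bool:
    """Check lead bytes and continuation ranges only (hypothetical helper)."""
    i = 0
    while i < len(data):
        n = LEAD[data[i]]
        if n == 0 or i + n > len(data):
            return False
        if any(not 0x80 <= c <= 0xBF for c in data[i + 1:i + n]):
            return False
        i += n
    return True
```

With this table, `lead_byte_ok(b"\xc0\xaf")` is False at no extra cost, since the lookup is needed anyway. The table alone does not catch the three-byte overlong form `b"\xe0\x80\xaf"`; that needs one more range check on the second byte, which Python's own strict UTF-8 decoder performs.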

 > So I think it should be logically decoupled... do validation
 > when/where it is needed for security reasons,

Security is an important application, but the real issue is that
naively decoded text is a bomb with a sensitive impact fuse.  Pass it
around long enough, and it will blow up eventually.

The whole point of the fairly complex rules about Unicode formats and
the *requirement* that broken coding be a fatal error *in a
conforming Unicode process* is intended to ensure that Unicode
exceptions[1] only ever occur on input (or memory corruption and the
like, which is actually a form of I/O, of course).  That's where
efficiency comes from.

I think Python 3 should aspire to (eventually) be a conforming process
by default, with lax behavior an option.

 > and allow internal [de]coding to be faster.

"Internal decoding" is (or should be) an oxymoron.  Why would your
software be passing around text in any format other than internal?  So
decoding will happen (a) on I/O, which is itself almost certainly
slower than making a few checks for Unicode hygiene, or (b) on receipt
of data from other software whose sanitation you shouldn't trust
more than you trust the Internet.

Encoding isn't a problem, AFAICS.

 > You didn't address the issue that if the decoding to a canonical
 > form is done first, many of the insecurities just go away, so why
 > throw errors?

Because as long as you're decoding anyway, it costs no more to do it
right, except in rare cases.  Why do you think Python should aspire to
"quick and dirty" in a context where dirty is known to be unhealthy,
and there is no known need for speed?  Why impose "doing it right" on
the application programmer when there's a well-defined spec for that
that we could implement in the standard library?

It's the errors themselves that people are objecting to.  See Guido's
posts for concisely stated arguments for a "don't ask, don't tell"
policy toward Unicode breakage.  I agree that Python should implement
that policy as an option, but I think that the user should have to
request it either with a runtime option or (in the case of user == app
programmer) by deliberately specifying a lax codec.  The default
*Unicode* codecs should definitely aspire to full Unicode conformance
within their sphere of responsibility.
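
As a small illustration of "strict by default, lax on request", here is how the existing `bytes.decode` API already behaves in Python 3. The helper names `decode_strict` and `decode_lax` are invented for this sketch; only the `errors` argument is real API.

```python
raw = b"caf\xe9"  # Latin-1 bytes; not valid UTF-8


def decode_strict(data: bytes) -> str:
    # errors="strict" is the default: invalid input raises UnicodeDecodeError.
    return data.decode("utf-8")


def decode_lax(data: bytes) -> str:
    # The caller explicitly asks for lax handling; bad bytes become U+FFFD.
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

lax_text = decode_lax(raw)
```

The point of the design is that the lax path is something the caller has to spell out, so it shows up in a code audit.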

Footnotes: 
[1]  A character outside the repertoire that the app can handle is not
a "Unicode exception", unless the reason the app can't handle it is
that the Unicode handler blew up.

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Stephen J. Turnbull

Glenn Linderman writes:

 > "significantly" seems to be the only word at question; it seems that 
 > there are a fair number of validation checks that could be performed; 
 > the numeric part of UTF-8 decoding is just a sequence of shifts, masks, 
 > and ORs, so can be coded pretty tightly in C or assembly language.
 > 
 > Anything extra would be slower; how much slower is hard to predict prior 
 > to the implementation.

Not much, see my previous response.

 > This also seems to be supported by Stephen's comment "That's a lot
 > to ask, as it turns out."

Not what I meant.  Inefficiency is not an objection to checking for
validity at the level a codec can handle.  The objection is that "we
don't want *any* exceptions thrown that we didn't explicitly ask for",
and adding validation certainly will violate that.

 > So I don't understand how this is responsive to the "decoding removes 
 > many insecurities" issue?

Because you have to recheck every time the data crosses from Python
into your code.  To the extent that Python codecs promise validation
and keep that promise, internal code *never* has to make those checks.
That is a significant savings in programmer effort, because auditing a
large body of code for *any* I/O from Python is going to be costly.

 > So when you examine a library for potential use, you have documentation 
 > or code to help you set your expectations about what it does, and 
 > whether or not it may have vulnerabilities, and whether or not those 
 > vulnerabilities are likely or unlikely, whether you can reduce the 
 > likelihood or prevent the vulnerabilities by wrapping the API, etc.  And 
 > so you choose to use the library, or not.

Python is precisely such a component that people will choose to use,
or not, based on whether they can expect that when Python hands them a
Unicode object freshly input from the outside world, it won't contain
lone surrogates, or invalid UTF-8 characters that got through a
3rd-party spam filter, or whatever.

 > This whole discussion about libraries seems somewhat irrelevant to the 
 > question at hand,

No, it's the *only* point that matters.  IMO, speed is not relevant
here.  The question is whether throwing a Unicode exception on invalid
encoding by default generally does more good than harm.  Guido seems
to think "not!", which gives me pause.  I still disagree, though.


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Ulrich Eckhardt

On Friday 05 December 2008, James Y Knight wrote:
> On Dec 5, 2008, at 5:27 AM, Ulrich Eckhardt wrote:
> > Using the byte variant is equally fubar, because e.g. on MS Windows
> > it is not supported, except through a very lossy roundtrip through
> > the locale's codepage, limiting your functionality.
>
> Yeah, IMO whole mess could have been avoided by keeping the filename/
> args/environ simply *bytes*, like it really is, on unix. Then, make
> the Windows version of python use (always! not dependent upon locale!)
> utf-8 to decode the utf-8 bytestring to the UTF-16 that the Windows
> platform APIs expect (and vice versa).

If possible, I would try to avoid this useless roundtrip from UTF-16 to UTF-8 
and back.

> And never use the ASCII variant of the windows APIs.

That's okay, but I'm afraid it's not possible. The problem is not so much 
doing it, but finding all those places where it is currently done. Those 
could be outside of Python itself. So, even to Python code, there could still 
be APIs that would need the MBCS-encoded strings.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Glenn Linderman

On approximately 12/8/2008 12:57 AM, came the following characters from 
the keyboard of Stephen J. Turnbull:

> "Internal decoding" is (or should be) an oxymoron.  Why would your
> software be passing around text in any format other than internal?  So
> decoding will happen (a) on I/O, which is itself almost certainly
> slower than making a few checks for Unicode hygiene, or (b) on receipt
> of data from other software whose sanitation you shouldn't trust
> more than you trust the Internet.
>
> Encoding isn't a problem, AFAICS.



So I can see validating user supplied data, which always comes in via I/O.

But during manipulation of internal data, including file and database 
I/O, there is a need for encoding and decoding also.  If all the data 
has already been validated, then there would be no need to revalidate on 
every conversion.


I hear you when you say that clever coding can make the validation 
nearly free, and I applaud that: the UTF-8 coder that I wrote predated 
most of the rules that have been created since, so I didn't attempt to 
be clever in that regard.


Thanks to you and Adam for your explanations; I see your points, and if 
it is nearly free, I withdraw most of my negativity on this topic.



--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Christian Heimes


Martin v. Löwis wrote:
> I wasn't (primarily) talking about fixing this particular issue.
> Time needs to be made available also for the upcoming 2.4.6 and 2.5.3
> releases (which should, IMO, get priority over a 3.0 bugfix release
> at this point)

I have no opinion on the priority of the releases. Since you are
responsible for the 2.4 and 2.5 releases as well as the Windows
binaries, it's your choice. In the future we should find somebody to
assist you with the Windows installers, to take some pressure off
you.

> I think 3.0.1 should also address other serious bugs in 3.0, such
> as
> - various IDLE bugs with non-ASCII characters (2827, 4008, 4323, 4410)
> - various ways to crash Python through the buffer protocol
>   (4583, 4509; also 4580)

My list wasn't complete. I'm +1 for your additions.

> IIUC, you want the bugfix version number to be sync'ed. I don't
> think that is a useful thing to have.

Yeah. Barry also said it's a neat thing to have - but just a neat thing.

> I don't recall such policy, and I can't see anything wrong with
> including performance fixes in a bug fix release. Maybe you were
> confusing this with whether performance fixes can be considered
> release-critical (which they shouldn't, IMO)?


Maybe I'm a confused person? :]

Christian


Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Barry Warsaw



On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote:


> Barry Warsaw wrote:
>> I'm personally okay with performance fixes in point releases, as
>> long as it doesn't change API or add additional features.
>
> Does your okay include or exclude new internal APIs like new helper
> functions or new C modules?


I /personally/ don't have a problem with that, but we need consensus  
before that becomes policy.


-Barry


Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Barry Warsaw



On Dec 7, 2008, at 11:17 PM, Martin v. Löwis wrote:


> I don't recall such policy, and I can't see anything wrong with
> including performance fixes in a bug fix release. Maybe you were
> confusing this with whether performance fixes can be considered
> release-critical (which they shouldn't, IMO)?


I agree with that.
-Barry


[Python-Dev] Deciding on dbm API in setup.py

2008-12-08 Thread skip

Several packages provide a dbm-compatible API.  Currently, the code in
Python's setup.py hardcodes the order of consideration: ndbm, then gdbm,
then Berkeley DB.  While the APIs are compatible, the file formats are all
different as far as I know.  If you have ndbm but want to use Berkeley DB
format, you're stuck.  Right now editing setup.py is the only way to
influence the order.

I opened an issue on the bug tracker about this: 

http://bugs.python.org/issue4587

It includes a patch which adds an optional environment variable
(PYDBMLIBORDER) which builders can use to override the order of the default
library checks.  I'm not sure that's the "correct" way to do this, but I'm
at a loss to figure out how else to do it.  Is it possible to easily add a
flag to setup.py, say --dbm-order=gdbm:bdb:ndbm?

If you've got any -- even passing -- interest in this, please read the issue
and add a comment if you feel so moved.

This grew out of a change to adapt to new gdbm library organization:

http://bugs.python.org/issue4487

Unbeknownst to me, I apparently wound up fixing a previously reported issue
about the change:

http://bugs.python.org/issue1167

Skip


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Nick Coghlan

Terry Reedy wrote:
> This to me is an argument for keeping the default the current behavior,
> but not for rejecting flexibility.  The computing world seems to be
> messier than we would like and worse than I realized until this week. As
> you say below, people need to better anticipate the future, and an
> errors parameter would help do that.

It just occurred to me that this seems like a perfect situation to
address via the warning system. The normal warnings mechanics can then
be used to turn it into an exception if so desired, and this can be done
once per application rather than having to pass a separate argument
every time the affected APIs are called.

And the decoding problems don't pass silently either - they just get
emitted as a warning by default instead of causing the application to crash.
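
A minimal sketch of how this could look with the standard `warnings` module. The `UndecodableNameWarning` category and the `decode_names` helper are hypothetical, invented for this example; only `warnings.warn` and `warnings.simplefilter` are existing API.

```python
import warnings


class UndecodableNameWarning(UnicodeWarning):
    """Hypothetical category for names that cannot be decoded."""


def decode_names(raw_names, encoding="utf-8"):
    """Decode byte names, warning about (and skipping) undecodable ones."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn(
                f"skipping undecodable name {raw!r}",
                UndecodableNameWarning,
                stacklevel=2,
            )
    return names


def make_strict():
    """Opt in, once per application, to exceptions instead of warnings."""
    warnings.simplefilter("error", UndecodableNameWarning)
```

An application that wants strict behaviour calls `make_strict()` once at startup (or uses the equivalent `-W` command-line filter); everything else keeps running and gets a warning instead of a crash.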

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Ulrich Eckhardt

On Sunday 07 December 2008, Guido van Rossum wrote:
> My problem with raising exceptions *by default* when an undecodable
> name exists is that it may render an app completely useless in a
> situation where the developer is no longer around. This happened all
> the time with the 2.x Unicode API, where the developer hadn't
> anticipated a particular input potentially containing non-ASCII bytes,
> and the user fed the application non-ASCII text. Making os.listdir
> raise an exception when a directory contains a single undecodable file
> means that the entire directory can't be read, and most likely the
> entire app crashes at that point. Most likely the developer never
> anticipated this situation (since in most places it is either
> impossible or very unlikely) -- after all, if they had anticipated it
> they would have used the bytes API in the first place.

There is another way to handle this that noisily signals errors but doesn't 
cause programs to suddenly fail. Using os.listdir as example, the problem 
there is that the OS actually returns a list of strings that can not be 
reliably decoded, so I would propose to simply not decode them.

Now, the idea is: what if this function returned neither a byte string 
nor a Unicode string, but e.g. an environment string type (called env_str)? 
os.listdir would only fail if it really failed to read the dir. If a user 
wants to display an element from the returned list, they would get something 
akin to what repr() returns, i.e. a recognisable string that can be written 
to a logfile. However, this thing will also include additional markup that 
makes it clear that it is not just a piece of text and not suitable to 
display to the end user.

This type distinction is important, because it means that any developer will 
immediately see that something unexpected is going on here. They will 
invoke "type(lst[0])" and see the unexpected type env_str, which will (via 
documentation) redirect them to the issue with different encodings and that 
all they have to do is 'map( unicode, lst)' in order to get at a list of real 
text strings, but they will also read that this operation might fail, forcing 
an informed decision.

If they don't care about a textual representation at all but only want to 
invoke os.popen with arguments received from the commandline, then everything 
is fine, too, because that function will take the strings as they are and 
just give them back to the OS. This allows roundtripping from the OS through 
Python and back to the OS without any conversions, and thus without anything 
that could fail. In the case of e.g. a backup program, this is exactly what 
is needed.

Now, if you have any hard-coded strings in your program but a function like 
os.popen needs an env_str object, this string is converted via a default 
encoding, i.e. the same that is used when converting an env_str object to 
Unicode. In this case, I would go so far as to say that os.popen should accept 
normal str strings, too, and perform that conversion itself. An alternative 
way would be to reject the string because it is the wrong type, but since 
this internal string's encoding is known, there is no reason to force users 
to convert explicitly, it is just that the conversion might fail.

Similarly, when modifying such an env_str object, like e.g. "bak = 
sys.argv[1]+'.backup'". In this case, the string '.backup' is converted 
according to the default encoding and then appended to the commandline 
argument, the result would again be an env_str object.
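
To make the proposal concrete, here is a rough Python 3 sketch of such a type. Everything in it is an assumption for illustration: the class name `EnvStr`, the `to_text` method and the UTF-8 default encoding are not from any existing library.

```python
class EnvStr:
    """Hypothetical stand-in for the proposed env_str type."""

    def __init__(self, raw: bytes, encoding: str = "utf-8"):
        self.raw = raw            # bytes exactly as received from the OS
        self.encoding = encoding  # assumed default encoding

    def __repr__(self) -> str:
        # Marked-up form: loggable, but visibly not plain text.
        return f"<env_str {self.raw!r}>"

    __str__ = __repr__

    def to_text(self) -> str:
        """Explicit conversion to text; may raise UnicodeDecodeError."""
        return self.raw.decode(self.encoding)

    def __add__(self, other):
        if isinstance(other, EnvStr):
            tail = other.raw
        elif isinstance(other, str):
            # Hard-coded text is encoded with the default encoding.
            tail = other.encode(self.encoding)
        else:
            return NotImplemented
        return EnvStr(self.raw + tail, self.encoding)
```

With this, `EnvStr(b"data\xff") + ".backup"` yields another `EnvStr` whose bytes can be handed back to the OS unchanged, while `to_text()` on it fails loudly.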

Note: There is an option in this design, and that is to make the default 
behaviour in case of non-convertible env_str objects configurable. A 
filemanager would then replace the undecodable bytes by an approximation, a 
backup program would use strict mode and a music player would perhaps simply 
skip and ignore such strings. The problem there is that changing this option 
would possibly affect other library code that one doesn't even know about 
because it is only used indirectly and its implementation is unknown. For 
that reason, I would rather not make this policy a configurable element. If 
you want that, you can easily code it yourself.

BTW: there was a PEP that proposed a new path class, which was rejected. This 
class was actually pretty similar, except that it also included several other 
features (globbing, path handling, opening files and the kitchen sink) which 
eventually made it too bloated. Otherwise, the idea of creating a separate 
type for these strings is the same.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg

On 2008-12-06 01:48, Nick Coghlan wrote:
> You can't display a non-decodable filename to the user, hence the user
> will have no idea what they're working on. Non-filesystem related apps
> have no business trying to deal with insane filenames.

This is not entirely true: OSes, shells, and applications will
typically represent the file names using either ?-replacements or
some form of hex or decimal escapes for the characters they can't
decode. Since humans are usually very good at pattern recognition,
this goes a long way.
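
In today's Python 3 both display styles can be had from the standard error handlers. The sketch below assumes UTF-8 as the expected encoding and uses invented sample names; keeping the raw bytes next to the displayed form is one possible way to get back to the real file.

```python
# Second name is Latin-1, so it is not valid UTF-8.
raw_names = [b"report.txt", b"caf\xe9.txt"]

# Hex-escape style: undecodable bytes are shown as \xNN.
display_to_raw = {}
for raw in raw_names:
    shown = raw.decode("utf-8", errors="backslashreplace")
    display_to_raw[shown] = raw   # remember the real name for later use

# Replacement style: undecodable bytes become U+FFFD.
question_marks = b"caf\xe9.txt".decode("utf-8", errors="replace")
```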

Of course, how the application maps that partially converted file name
back to the real thing is another issue and that's something that
Python should not make harder than it should be.

> Linux is moving towards a standard of UTF-8 for filenames, and once we
> get to the point where the idea of encoding filenames and environment
> variables any other way is seen as crazy, then the Python 3 approach
> will work seamlessly.

It's going to take a long time before file names, environment variables
and command line parameters are all encoded using UTF-8, so "practicality
beats purity" will have to get more attention in this thread.

Python APIs should work out of the box most of the time.

Currently, if you live in a non-ASCII and non-pure-UTF-8 environment,
you have to deal with different and mixed encodings on a regular
basis.

Whether that's a USB stick you're trying to read, a ZIP file
you're trying to open, or a mounted network drive, the problem
pops up in many different areas.

If I write "do_something.py *" I expect Python to indeed work on
all the files in my directory, not just the ones that happen to
fit a particular encoding.

If I hook up a CGI script written in Python with a web server,
I expect all data to be received by the script, not just data
that happens to be UTF-8 encoded.

> In the meantime, raw bytes APIs will provide an alternative for those
> that disagree with that philosophy.

I think that's a wrong way to put it: The problems are not made
up by people who disagree with the one-encoding-for-everything
strategy.

The problems occur in real-life IT processing all the time - maybe
not so much in places where English scripts dominate, but certainly
in most other places with non-English scripts.

-- 
Marc-Andre Lemburg
eGenix.com


[Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Antoine Pitrou


Hello,

The Py_buffer struct has two pointers named `shape` and `strides`. Each points
to an array of Py_ssize_t values whose length is equal to the number of
dimensions of the buffer object. Unfortunately, the buffer protocol spec doesn't
explain how allocation of these arrays should be handled.

Right now this is circumvented by either pointing them to an externally-managed
piece of memory (e.g. a Py_ssize_t field in the original PyObject), or by
pointing them to another field in the Py_buffer (because in the case of a
one-dimensional buffer with itemsize == 1, shape[0] is simply equal to the
length of the buffer in bytes).

Of course this is not flexible, and it makes it difficult to fix the situation
for buffers with an itemsize larger than 1. For those buffers, we can't simply
point the shape array to the byte length, and if we are taking a slice of the
memoryview, we can't point it to the size of the original object (for
example an array.array) either. Hence the problem of how to allocate the
shape array.
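
The mismatch can be seen from Python with a current CPython memoryview (this shows today's behaviour, not necessarily that of 3.0). The variable names are only for this sketch.

```python
import array

values = array.array("i", range(6))   # itemsize of 'i' is larger than 1
view = memoryview(values)

# shape counts items, not bytes, so it cannot alias the byte length.
item_count = view.shape[0]
byte_length = view.nbytes

# A slice has its own shape and strides, distinct from the original object.
every_other = view[::2]
slice_shape = every_other.shape
slice_strides = every_other.strides
```

Here `item_count` is 6 while `byte_length` is 6 times the itemsize, and the slice needs a shape of 3 items and a doubled stride, neither of which is stored anywhere in the original array object.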

For the one-dimensional case, I had in mind a simple scheme where the Py_buffer
struct has an additional two-member Py_ssize_t array. Then `shape` and `strides`
can point to the first and second member of this array, respectively. This
wouldn't solve the multi-dimensional case, however.

Thanks for any ideas on how to solve this.

Regards

Antoine.



Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread rdmurray


On Sun, 7 Dec 2008 at 13:33, Guido van Rossum wrote:

> My problem with raising exceptions *by default* when an undecodable
> name exists is that it may render an app completely useless in a
> situation where the developer is no longer around. This happened all


I think Nick Coghlan's suggestion of emitting warnings would be an
excellent solution that addresses both your concerns and the concerns
Toshio has expressed (and with which I agree 100%).

The above is the only use case I've heard in this thread for ignoring
files with names that can't be decoded:  so that a user can use the
program on those files whose names can be decoded even when the user does
not have the resources to get the program fixed to handle undecodable
filenames.  I agree that that is a worthwhile goal.

If warnings were emitted, then files would not be silently ignored,
yet the program could still be used.

--RDM

PS: I'd like to see a similar warning issued when an access attempt
is made through os.environ to a variable that cannot be decoded.

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Bill Janssen

Nick Coghlan <[EMAIL PROTECTED]> wrote:

> - I think the binary and Unicode APIs should be available (and fully
> functional) on all platforms (including Windows) so that app developers
> don't create portability problems for themselves when they make the
> decision as to which API to use

+1

I'm perhaps biased here; most of my Python programs don't have user
interfaces, because they don't "talk" to people, they talk to other
programs.  The binary APIs for the OS are essential.  I use and
deeply appreciate all the string handling features in Python,
particularly its firm grip on Unicode issues, but that's *useful*
instead of *essential*.

Bill

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Terry Reedy


Nick Coghlan wrote:
> Terry Reedy wrote:
>> This to me is an argument for keeping the default the current behavior,
>> but not for rejecting flexibility.  The computing world seems to be
>> messier than we would like and worse than I realized until this week. As
>> you say below, people need to better anticipate the future, and an
>> errors parameter would help do that.
>
> It just occurred to me that this seems like a perfect situation to
> address via the warning system.

I disagree.

> The normal warnings mechanics can then
> be used to turn it into an exception if so desired, and this can be done
> once per application rather than having to pass a separate argument
> every time the affected APIs are called.


As far as I know, the warning mechanism is for version issues (I have never 
dealt with it, and do not want to).  In any case, the snippet 
that you clipped


try:
    # 'errors' is the proposed parameter, not an existing os.listdir argument
    files = os.listdir(somedir, errors='strict')
except OSError as e:
    log()
    files = os.listdir(somedir)

specifically requires a per call parameter.
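
A per-call `errors` parameter can be approximated today on top of the bytes API. The sketch below is an assumption-laden illustration: `listdir_text` and `listdir_with_fallback` are hypothetical helpers, and unlike the snippet above the strict path raises UnicodeDecodeError rather than OSError.

```python
import os
import sys


def listdir_text(path, errors="strict", encoding=None):
    """List a directory via the bytes API, decoding names with `errors`."""
    encoding = encoding or sys.getfilesystemencoding()
    raw_names = os.listdir(os.fsencode(path))   # bytes in, bytes out
    return [name.decode(encoding, errors) for name in raw_names]


def listdir_with_fallback(path):
    """Try strict decoding first; fall back to replacement characters."""
    try:
        return listdir_text(path, errors="strict")
    except UnicodeDecodeError:
        # A real application would log the problem here.
        return listdir_text(path, errors="replace")
```

Each call site chooses its own policy, which is what the snippet needs and what a process-wide warning filter cannot express.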


> And the decoding problems don't pass silently either - they just get
> emitted as a warning by default instead of causing the application to crash.


Do they get automatically logged?  In any case, the errors parameter has 
an in-between option that neither ignores nor raises, but replaces, giving 
*something* printable.


This seems like an ideal situation for a parameter that gives 
the application programmer who uses Python a range of options for working 
with a non-ideal world.  I am really flabbergasted that there is so much 
opposition to doing so, in favor of more difficult or less functional 
alternatives.


Terry Jan Reedy


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Guido van Rossum

On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
>>
>> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>>>
>>> Toshio Kuratomi wrote:
>>>
>>>>  - If this is true, a definition of os.listdir() that would
>>>>    better meet programmer expectation would be: "Give me all files in a
>>>>    directory with the output as str type".  The definition of
>>>>    os.listdir() would be "Give me all files in a directory
>>>>    with the output as bytes type".  Raising an exception when the filenames
>>>>    are undecodable is perfectly reasonable in this situation.
>>>
>>> Your examples (snipped) pretty well convince me that there is a use case
>>> for
>>> raising exceptions.  We should move beyond arguing over which one way is
>>> right.  I think there should be a second argument 'ignorebad=False' to
>>> ignore undecodable files rather than raise the exception (or
>>> 'strict=True'
>>> to stop and raise exception on non-decodable names -- then code is 'if
>>> strict: raise ...').  I believe other functions have a similar parameter.
>
> I was thinking of the "normal Unicode 'errors' parameter", as described by
> Nick.
>
>> If you want the exceptions, just use the bytes API and try to decode
>> the byte strings using the system encoding.
>
> If it was a matter of adding a new method, I might agree.  But:
>
> 1. We already have a method that does exactly what you describe.  It is only
> a matter of adding flexibility to the response to problems, for which there
> is already precedent.
>
> 2. Suggesting that people who want strings and not bytes should have to deal
> with bytes, just to get an error notification, seems to negate the point of
> moving to 3.0.
>
> 3. A builtin would probably do so better than most programmers would, with
> little touches such as the one suggested below.
>
> 4. An error parameter would ALERT programmers to the possibility of a
> PROBLEM, both in the present and future.  As you say below, people need to
> better anticipate the future.
>
>> My problem with raising exceptions *by default* when an undecodable
>> name exists is that it may render an app completely useless in a
>> situation where the developer is no longer around. This happened all
>> the time with the 2.x Unicode API, where the developer hadn't
>> anticipated a particular input potentially containing non-ASCII bytes,
>> and the user fed the application non-ASCII text. Making os.listdir
>> raise an exception when a directory contains a single undecodable file
>> means that the entire directory can't be read, and most likely the
>> entire app crashes at that point. Most likely the developer never
>> anticipated this situation (since in most places it is either
>> impossible or very unlikely) -- after all, if they had anticipated it
>> they would have used the bytes API in the first place. (It's worse
>> because the exception being raised would be UnicodeError -- most
>> people expect os.listdir to raise OSError, not other errors.)
>
> This to me is an argument for keeping the default the current behavior, but
> not for rejecting flexibility.  The computing world seems to be messier than
> we would like and worse than I realized until this week. As you say below,
> people need to better anticipate the future, and an errors parameter would
> help do that.

I'm fine with whatever API enhancements you can come up with (assuming
others like them too :-) as long as the default remains the current
behavior.

> Is Windows really immune?  What about when it reads the directory of
> possibly old removable media with whatever byte name encodings?  Is this a
> possible source of 'unanticipated' problems?
>
> As to your last sentence, os.listdir() with an errors parameter could
> convert a decoding UnicodeError to "OSError: undecodable file name
> ", thereby supplying the expected exception as well as an
> extractable representation of the problematic raw bytes.
>
> Here is a possible use case: I want filenames as 3.0 strings and I
> anticipate no problems at present but, as you say above, something might
> happen years in the future.  I am using 3.0 *because* of the strings ==
> unicode feature.  I would like to write
>
> try:
>     files = os.listdir(somedir, errors='strict')
> except OSError as e:
>     log()
>     files = os.listdir(somedir)
>
> and go on without the problem file but not without logging the problem so a
> future maintainer can consider what to do about it, but only when there is
> an actual need to think about it.
>
> Terry Jan Reedy
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread rdmurray


On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote:

>> And the decoding problems don't pass silently either - they just get
>> emitted as a warning by default instead of causing the application to
>> crash.
>
> Do they get automatically logged?  In any case, the errors parameter has an
> in between option to neither ignore or raise but to replace and give
> *something* printable.
>
> This situation seems like an ideal situation for a parameter which gives the
> application program who uses Python a range of options to working with an
> un-ideal world.  I am really flabbergasted why there is so much opposition to
> doing so in favor of more difficult or less functional alternatives.


I'm in favor of an option to control what happens.

I just really really don't want the _default_ to be "ignore".  Defaulting
to a warning is fine with me, as would be defaulting to a traceback.

But defaulting to "silently ignore", as we have now, is just asking for user
confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
experience, IMO, than having a program fail when undecodable filenames
match the selection criteria.

--RDM
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Brett Cannon

On Mon, Dec 8, 2008 at 05:11, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote:
>
>> Barry Warsaw wrote:
>>>
>>> I'm personally okay with performance fixes in point releases, as long it
>>> doesn't change API or add additional features.
>>
>> Does your okay include or exclude new internal APIs like new helper
>> functions or a new C modules?
>
> I /personally/ don't have a problem with that, but we need consensus before
> that becomes policy.
>

Internal as in just for us I am fine with, but not anything publicly available.

As for new C modules, I am fine with that as well as long as they add
no new build dependencies.

-Brett

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Guido van Rossum

On Mon, Dec 8, 2008 at 10:34 AM,  <[EMAIL PROTECTED]> wrote:
> On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote:
>>>
>>>  And the decoding problems don't pass silently either - they just get
>>>  emitted as a warning by default instead of causing the application to
>>>  crash.
>>
>> Do they get automatically logged?  In any case, the errors parameter has
>> an in between option to neither ignore or raise but to replace and give
>> *something* printable.
>>
>> This situation seems like an ideal situation for a parameter which gives
>> the application program who uses Python a range of options to working with
>> an un-ideal world.  I am really flabbergasted why there is so much
>> opposition to doing so in favor of more difficult or less functional
>> alternatives.
>
> I'm in favor of an option to control what happens.
>
> I just really really don't want the _default_ to be "ignore".  Defaulting
> to a warning is fine with me, as would be defaulting to a traceback.
>
> But defaulting to "silently ignore", as we have now, is just asking for user
> confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
> experience, IMO, than having a program fail when undecodable filenames
> match the selection criteria.

Do you really not care about the risk where apps that weren't written
to be prepared to handle this will be rendered completely useless if a
single file in a directory has an unencodable name? This is similar to
an issue that Python had for a long time where it wouldn't start up if
the current directory contained non-ASCII characters.

Given that most developers will not have this issue in their own
environment, most apps will not be prepared for this issue, and that
makes it worse for the app's user!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Scott Dial

Guido van Rossum wrote:
> On Mon, Dec 8, 2008 at 10:34 AM,  <[EMAIL PROTECTED]> wrote:
>> On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote:
>>>> And the decoding problems don't pass silently either - they just get
>>>> emitted as a warning by default instead of causing the application to
>>>> crash.
>>> Do they get automatically logged?  In any case, the errors parameter has
>>> an in between option to neither ignore or raise but to replace and give
>>> *something* printable.
>>
>> I just really really don't want the _default_ to be "ignore".  Defaulting
>> to a warning is fine with me, as would be defaulting to a traceback.
> 
> Do you really not care about the risk where apps that weren't written
> to be prepared to handle this will be rendered completely useless if a
> single file in a directory has an unencodable name?

Since when do warnings cause apps to be rendered completely useless? I
think it's easy to agree that defaulting to an exception is not good for
the reason you give, but I don't see how that applies to a warning. And,
it seems like a warning covers the issues that the other people want as
well. If there is a warning, then there is at least a record of the fact
that some filenames were ignored. Presumably if I was responsible for
the correctness of some piece of code, I would see the warning in a log
of some sort and could investigate it further (if I cared), otherwise I
could choose to ignore it. I don't see os.listdir(name) to be one of
those situations that emitting a warning is a nuisance at all.
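
A minimal sketch of the warn-by-default behaviour discussed here, written
as a wrapper around the bytes API. The names decode_names() and
listdir_warn() are hypothetical helpers made up for illustration, and the
choice of RuntimeWarning as the category is an assumption; nothing in the
thread fixes either.

```python
import os
import sys
import warnings


def decode_names(raw_names, encoding=None):
    """Decode byte filenames; warn about (and skip) undecodable ones."""
    encoding = encoding or sys.getfilesystemencoding()
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError:
            # The name is skipped, but not silently: a record is left
            # through the warnings machinery.
            warnings.warn("skipping undecodable filename %r" % (raw,),
                          RuntimeWarning, stacklevel=2)
    return decoded


def listdir_warn(path="."):
    """Hypothetical wrapper: list a directory as str, warning on bad names."""
    return decode_names(os.listdir(os.fsencode(path)))
```

An application that does not care can filter the warning out; one that
does can read it from stderr or a log and investigate.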

-Scott

-- 
Scott Dial
[EMAIL PROTECTED]
[EMAIL PROTECTED]

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Bugbee, Larry

> I'm perhaps biased here; most of my Python programs don't have user 
> interfaces, because they don't "talk" to people, they talk to other 
> programs.  The binary APIs for the OS are essential.  I use and 
> deeply appreciate all the string handling features in Python, 
> particularly its firm grip on Unicode issues, but that's *useful* 
> instead of *essential*.

Exactly!  Another +1.

Larry



Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread rdmurray


On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote:

> On Mon, Dec 8, 2008 at 10:34 AM,  <[EMAIL PROTECTED]> wrote:
>> I'm in favor of an option to control what happens.
>>
>> I just really really don't want the _default_ to be "ignore".  Defaulting
>> to a warning is fine with me, as would be defaulting to a traceback.
>>
>> But defaulting to "silently ignore", as we have now, is just asking for user
>> confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
>> experience, IMO, than having a program fail when undecodable filenames
>> match the selection criteria.
>
> Do you really not care about the risk where apps that weren't written
> to be prepared to handle this will be rendered completely useless if a
> single file in a directory has an unencodable name? This is similar to
> an issue that Python had for a long time where it wouldn't start up if
> the current directory contained non-ASCII characters.


No, I do care.  In another message I agreed with you that having the
app not fail was a reasonable goal.  What I'm saying is that having it
ignore the undecodable files _silently_ is bad.  And not picking
up a file that matches some selection criteria (ex: *.py) because it is
undecodable is a _failure_, in my opinion, that is _worse_ than getting
a traceback because there's an undecodable file in the directory.

But I'm happy with just issuing a warning by default.  That would mean
it doesn't fail silently, but neither does it crash.  Seems like the
best compromise with the broken nature of the real world IT
environment.


> Given that most developers will not have this issue in their own
> environment, most apps will not be prepared for this issue, and that
> makes it worse for the app's user!


It is exactly because most developers won't have the issue in their own
environment that ignoring files silently is a problem.  If they did,
they'd fix their code before it went out the door.  Since they don't,
when their code is used by somebody in a mixed encoding environment,
the programs _will_ fail by ignoring files that they should process.
The question, it seems to me, is do they fail silently and mysteriously
by failing to process files they are supposed to, or do they fail with
at least a little bit of noise?

--RDM

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Guido van Rossum

On Mon, Dec 8, 2008 at 12:07 PM,  <[EMAIL PROTECTED]> wrote:
> On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote:
>>
>> On Mon, Dec 8, 2008 at 10:34 AM,  <[EMAIL PROTECTED]> wrote:
>>>
>>> I'm in favor of an option to control what happens.
>>>
>>> I just really really don't want the _default_ to be "ignore".  Defaulting
>>> to a warning is fine with me, as would be defaulting to a traceback.
>>>
>>> But defaulting to "silently ignore", as we have now, is just asking for
>>> user
>>> confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
>>> experience, IMO, than having a program fail when undecodable filenames
>>> match the selection criteria.
>>
>> Do you really not care about the risk where apps that weren't written
>> to be prepared to handle this will be rendered completely useless if a
>> single file in a directory has an unencodable name? This is similar to
>> an issue that Python had for a long time where it wouldn't start up if
>> the current directory contained non-ASCII characters.
>
> No, I do care.  In another message I agreed with you that having the
> ap not fail was a reasonable goal.  What I'm saying is that having it
> ignore the undecodable files fail _silently_ is bad.  And not picking
> up a file that matches some selection criteria (ex: *.py) because it is
> undecodable is a _failure_, in my opinion, that is _worse_ than getting
> a traceback because there's an undecodable file in the directory.
>
> But I'm happy with just issuing a warning by default.  That would mean
> it doesn't fail silently, but neither does it crash.  Seems like the
> best compromise with the broken nature of the real world IT
> environment.

OK, I can live with that too.

>> Given that most developers will not have this issue in their own
>> environment, most apps will not be prepared for this issue, and that
>> makes it worse for the app's user!
>
> It is exactly because most developers won't have the issue in their own
> environment that ignoring files silently is a problem.  If they did,
> they'd fix their code before it went out the door.  Since they don't,
> when their code is used by somebody in a mixed encoding environment,
> the programs _will_ fail by ignoring files that they should process.
> The question, it seems to me, is do they fail silently and mysteriously
> by failing to process files they are supposed to, or do they fail with
> at least a little bit of noise?

A warning is fine. Whether the app *fails* or *succeeds* when the
warning is issued depends on what the app is trying to do and what the
user expects. There certainly are valid use cases for both, but I
expect that succeeding noisily is going to be at least as common as
failing (in the sense of not doing the right thing, not necessarily
crashing) noisily. This is an improvement over always crashing.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Nick Coghlan

Brett Cannon wrote:
> On Mon, Dec 8, 2008 at 05:11, Barry Warsaw <[EMAIL PROTECTED]> wrote:
>> On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote:
>>> Barry Warsaw wrote:
>>>> I'm personally okay with performance fixes in point releases, as long it
>>>> doesn't change API or add additional features.
>>> Does your okay include or exclude new internal APIs like new helper
>>> functions or a new C modules?
>> I /personally/ don't have a problem with that, but we need consensus before
>> that becomes policy.
> Internal as in just for us I am fine with, but not nothing publicly available.

Where would adding a (undocumented) get_filename() method to ZipImporter
objects for the benefit of the -m switch fit then? There are a few
things which don't always work properly because runpy doesn't currently
know how to set __file__ properly when the module comes from a zipfile.

Although now that I think about it, I could actually fix that "the right
way" (with a documented get_filename() method on ZipImporter) for 2.7
and 3.1, while using a runpy internal workaround specifically for
ZipImporter instances in the maintenance branches...

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---

[Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg. execution bit)

2008-12-08 Thread Adeodato Simó

Hello,

after using 2to3 --write over some scripts, I found it very cumbersome
having to run `chmod +x` on each of them afterwards.

The attached patch is a possible way to fix this issue. It'd be great if
somebody could apply it, or write a more appropriate fix.
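
For readers without the attachment at hand, the idea can be sketched on
its own: after the rewritten file is created, copy the permission bits
over from the backup. The function rewrite_preserving_mode() and the
".bak" suffix are assumptions made for this illustration; this is not the
lib2to3 code itself.

```python
import os
import shutil


def rewrite_preserving_mode(filename, new_text):
    """Back up filename, write new_text, then restore the permission bits."""
    backup = filename + ".bak"
    if os.path.lexists(backup):
        os.remove(backup)
    os.rename(filename, backup)
    with open(filename, "w") as f:
        f.write(new_text)
    # copymode() copies only the permission bits (for example the
    # execute bit), not the contents, owner, or timestamps.
    shutil.copymode(backup, filename)
```

Since the mode comes from the backup, this approach only works when a
backup file is actually kept.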

Many thanks in advance!

P.S.: Please CC me on replies.

-- 
Adeodato Simó dato at net.com.org.es
Debian Developer  adeodato at debian.org
 
Listening to: Manolo García - Prendí la flor
Index: Lib/lib2to3/main.py
===
--- Lib/lib2to3/main.py	(revision 67665)
+++ Lib/lib2to3/main.py	(working copy)
@@ -6,6 +6,7 @@
 import os
 import logging
 import optparse
+import shutil
 
 from . import refactor
 
@@ -39,6 +40,7 @@
         # Actually write the new file
         super(StdoutRefactoringTool, self).write_file(new_text,
                                                       filename, old_text)
+        shutil.copymode(backup, filename)
 
     def print_output(self, lines):
         for line in lines:

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg

On 2008-12-08 19:26, Guido van Rossum wrote:
> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>> Here is a possible use case: I want filenames as 3.0 strings and I
>> anticipate no problems at present but, as you say above, something might
>> happen years in the future.  I am using 3.0 *because* of the strings ==
>> unicode feature.  I would like to write
>>
>> try:
>>     files = os.listdir(somedir, errors='strict')
>> except OSError as e:
>>     log()
>>     files = os.listdir(somedir)
>>
>> and go on without the problem file but not without logging the problem so a
>> future maintainer can consider what to do about it, but only when there is
>> an actual need to think about it.

If that error parameter is the same as in unicode(value, errors),
then this would be a useful feature:

People could then choose among the already existing error handlers
('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register
their own ones via the codecs module.
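
As a rough illustration of what that choice looks like when decoding
bytes by hand, the sketch below applies three built-in handlers and then
registers a custom one with codecs.register_error(). The handler name
"underscore" and the sample filename are invented for this example. Note
that 'xmlcharrefreplace' only applies when encoding, so it is not shown.

```python
import codecs

raw = b"Andr\xe9.txt"  # Latin-1 bytes, invalid as UTF-8

print(raw.decode("utf-8", "ignore"))    # drops the bad byte
print(raw.decode("utf-8", "replace"))   # substitutes U+FFFD

try:
    raw.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)


def underscore_errors(exc):
    """Custom handler: replace each undecodable byte with an underscore."""
    if isinstance(exc, UnicodeDecodeError):
        # Return the replacement text and the position to resume at.
        return "_" * (exc.end - exc.start), exc.end
    raise exc


# "underscore" is a hypothetical handler name used only in this sketch.
codecs.register_error("underscore", underscore_errors)
print(raw.decode("utf-8", "underscore"))
```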

Such application specific error handlers could then also apply
whatever fancy round-trip safe encoding of non-decodable bytes
to Unicode escapes, private code points, etc. as seen fit by the
application.

Perhaps we should also add an ''encoding'' parameter that can be
set on a per directory basis (if necessary) and defaults to the
global file system encoding.

If an application hits a directory that is known to cause problems,
it could then choose to receive the file names in a different,
more suitable encoding. This allows implementing fallback
mechanisms with a list of common encodings for a locale.
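
Such a fallback could be sketched as follows, working on the bytes
returned by the bytes API. The helper decode_with_fallbacks() and the
particular list of encodings are assumptions chosen for illustration; a
real list would depend on the locale.

```python
def decode_with_fallbacks(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode %r" % (encodings, raw))
```

One caveat: latin-1 maps every byte to a character, so when it is in the
list the function never fails, and a successful decode says nothing about
whether the guess was right.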

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 08 2008)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2008-12-02: Released mxODBC.Connect 1.0.0  http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Antoine Pitrou

Nick Coghlan  gmail.com> writes:
> 
> Where would adding a (undocumented) get_filename() method to ZipImporter
> objects for the benefit of the -m switch fit then?

Why not call it _get_filename() in 3.0 and get_filename() in 3.1?




Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Antoine Pitrou

M.-A. Lemburg  egenix.com> writes:
> 
> Such application specific error handlers could then also apply
> whatever fancy round-trip safe encoding of non-decodable bytes
> to Unicode escapes, private code points, etc. as seen fit by the
> application.

I'd argue that such fancy round-trip safe error handler should be provided by
Python. It's not reasonable to expect application coders to come up with their
own codec variation based on subtle details of the unicode spec.
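
For reference, later Python 3 releases do ship a built-in handler of this
kind, named 'surrogateescape': each undecodable byte is mapped to a lone
surrogate code point on decoding and mapped back to the same byte on
encoding. A short sketch of the round trip (the sample filename is made
up):

```python
raw = b"caf\xe9.py"  # not valid UTF-8

# Decode without losing information: the bad byte becomes a lone surrogate.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", "surrogateescape")
```

The resulting str is safe to pass back to the OS, but it cannot be
encoded with the default 'strict' handler, so printing or logging it may
still need care.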

Regards

Antoine.


Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Nick Coghlan

Antoine Pitrou wrote:
> For the one-dimensional case, I had in mind a simple scheme where the 
> Py_buffer
> struct has an additional two-member Py_ssize_t array. Then `shape` and 
> `strides`
> can point to the first and second member of this array, respectively. This
> wouldn't solve the multi-dimensional case, however.
> 
> Thanks for any ideas on how to solve this.

Actually, I think your suggested scheme for the one-dimensional case
shows the way forward: ownership of the shape and strides memory belongs
to the object issuing the Py_buffer struct, and that object needs to
deal with it when the buffer is released. Defining a larger memory chunk
with the Py_buffer as the first item and the shape and stride info
tacked onto the end and returning that from PyObject_GetBuffer() means
that the shape/stride info will be released automatically when the view
is released via PyBuffer_Release().

For more complicated cases, the object providing the views may need to
do some internal bookkeeping to map from Py_buffer pointers to
separately allocated shape/stride information and release those when the
views are released.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---

Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Nick Coghlan

Antoine Pitrou wrote:
> Nick Coghlan  gmail.com> writes:
>> Where would adding a (undocumented) get_filename() method to ZipImporter
>> objects for the benefit of the -m switch fit then?
> 
> Why not call it _get_filename() in 3.0 and get_filename() in 3.1?

Actually, since it should only be a fairly trivial couple of lines of
code, I think I'm going to put it in the runpy._get_filename() helper
function in the maintenance branches and only move it over to
ZipImporter on the trunk and the py3k branch. That way it's completely
unambiguous that this is just a bug fix for runpy rather than a new
feature for ZipImporter.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---

Re: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg. execution bit)

2008-12-08 Thread Mark Dickinson

On Mon, Dec 8, 2008 at 6:51 PM, Adeodato Simó <[EMAIL PROTECTED]> wrote:
>
> The attached patch is a possible way to fix this issue. It'd be great if
> somebody could apply it, or write a more appropriate fix.

Please could you submit your patch to the bug tracker, at

http://bugs.python.org

That way it's less likely to get lost. :)

Thanks,

Mark

Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Barry Warsaw


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Dec 8, 2008, at 3:39 PM, Antoine Pitrou wrote:

> Nick Coghlan  gmail.com> writes:
>>
>> Where would adding a (undocumented) get_filename() method to ZipImporter
>> objects for the benefit of the -m switch fit then?
>
> Why not call it _get_filename() in 3.0 and get_filename() in 3.1?


+1
- -Barry

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBST2LKXEjvBPtnXfVAQJZzAP/avX4YgpBSmOAh6Zc2TZEnsllRz6CRa86
bEPCWF1an7H9zzDl6gS5ZjbstXoEPf0Irr+W6BTSLVnRT/G7rFgw5q/QlG2yqvCP
dgOCT1Vr3PXgXouNkGaBFI5L/Aw2fuDadWUpGeA3FgH3PxaAH0XAr5LcKP2SidXc
v5nDim8lCxc=
=k3gW
-END PGP SIGNATURE-

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg

On 2008-12-08 21:45, Antoine Pitrou wrote:
> M.-A. Lemburg  egenix.com> writes:
>> Such application specific error handlers could then also apply
>> whatever fancy round-trip safe encoding of non-decodable bytes
>> to Unicode escapes, private code points, etc. as seen fit by the
>> application.
> 
> I'd argue that such fancy round-trip safe error handler should be provided by
> Python. It's not reasonable to expect application coders to come up with their
> own codec variation based on subtle details of the unicode spec.

Fair enough. We could add some, e.g.:

 * a round-trip safe escape error handler that uses a Unicode private
   code point area which we officially reserve for the Python
   interpreter

 * a human readable escape error handler that encodes the problem
   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
   encoded directory name instead of failing

 * a warning error handler that replaces the problem cases with
   a question mark and issues a warning through the warning
   framework
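
Two of the handlers in this list can be sketched with the existing
codecs.register_error() hook. The handler names "hexescape" and "warn"
and the use of UnicodeWarning are assumptions for illustration only; the
private-code-point variant is not sketched because it would need a
reserved range to be agreed first.

```python
import codecs
import warnings


def hexescape_errors(exc):
    """Decode-side handler: turn each undecodable byte into a \\xNN escape."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\\x%02x" % byte for byte in bad), exc.end


def warn_errors(exc):
    """Replace each undecodable byte with '?' and issue a warning."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    warnings.warn("undecodable bytes %r replaced with '?'" % (bad,),
                  UnicodeWarning, stacklevel=2)
    return "?" * len(bad), exc.end


# Both names are hypothetical; they are not built-in handlers.
codecs.register_error("hexescape", hexescape_errors)
codecs.register_error("warn", warn_errors)
```

With these registered, a Latin-1 encoded name such as b"Andr\xe9" can be
decoded as UTF-8 using either handler instead of failing. The hex form is
readable but not round-trip safe, since a literal backslash sequence in a
valid name would look the same.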

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Nick Coghlan

Terry Reedy wrote:
> Nick Coghlan wrote:
>> Terry Reedy wrote:
>>> This to me is an argument for keeping the default the current behavior,
>>> but not for rejecting flexibility.  The computing world seems to be
>>> messier than we would like and worse than I realized until this week. As
>>> you say below, people need to better anticipate the future, and an
>>> errors parameter would help do that.
>>
>> It just occurred to me that this seems like a perfect situation to
>> address via the warning system.
> 
> I disagree.
> 
>> The normal warnings mechanics can then
>> be used to turn it into an exception if so desired, and this can be done
>> once per application rather than having to pass a separate argument
>> every time the affected APIs are called.
> 
> The warning mechanism, as far as I know, because I have never dealt with
> it (and do not want to) is for version issues.

No, it's just DeprecationWarning in particular that is specific to
versioning issues. That's obviously the one that comes up most often for
core development, but there are other warnings as well (e.g. the
off-by-default ImportWarning when potential packages are skipped because
__init__.py is missing).

For this particular case, I would suggest adding something like
EnvironmentWarning (to parallel the EnvironmentError that is the common
parent of OSError and IOError).
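
A sketch of how that could look, including the once-per-application
switch to exceptions mentioned above. EnvironmentWarning is defined
locally here because it is only a proposal in this thread, not a
built-in, and report_undecodable() is a hypothetical helper.

```python
import warnings


class EnvironmentWarning(Warning):
    """Hypothetical category proposed in this thread; not a built-in."""


def report_undecodable(raw_name):
    """Warn about a filename that could not be decoded."""
    warnings.warn("undecodable filename %r" % (raw_name,),
                  EnvironmentWarning, stacklevel=2)


# An application that prefers a hard failure can escalate the category
# to an exception in one place, without touching any call sites.
with warnings.catch_warnings():
    warnings.simplefilter("error", EnvironmentWarning)
    try:
        report_undecodable(b"caf\xe9.py")
    except EnvironmentWarning as exc:
        print("escalated to an exception:", exc)
```

The same effect is available from the command line through the -W option,
provided the category can be imported by name.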

>  In any case, the snippet
> that you clipped
> 
> try:
>     files = os.listdir(somedir, errors='strict')
> except OSError as e:
>     log()
>     files = os.listdir(somedir)
> 
> specifically requires a per call parameter.

True, but the decision to have "errors=warn" as the default behaviour is
independent of the decision of whether or not to allow the behaviour to
be changed on a case-by-case basis. There is nothing stopping us from
doing both.

>> And the decoding problems don't pass silently either - they just get
>> emitted as a warning by default instead of causing the application to
>> crash.
> 
> Do they get automatically logged?

By default warnings are written to sys.stderr. Whether that gets logged
or not will depend on the nature of the application.

There are also mechanisms in warnings that allow an application to
override the handling of warnings (and for 2.7/3.1, there are mechanisms
in logging to make it easy to hook the warning system and the logging
system together, so that warnings are automatically logged).
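
In releases that have it, the hook in question is
logging.captureWarnings(). A minimal sketch of wiring the two systems
together follows; the helper name route_warnings_to_logging() and the
sample message are made up for illustration.

```python
import logging
import warnings


def route_warnings_to_logging(level=logging.WARNING):
    """Send warnings.warn() output to the 'py.warnings' logger."""
    logging.basicConfig(level=level)
    # After this call, warnings are no longer written straight to
    # sys.stderr; they become log records on the "py.warnings" logger.
    logging.captureWarnings(True)
    return logging.getLogger("py.warnings")


logger = route_warnings_to_logging()
warnings.warn("undecodable filename skipped", RuntimeWarning)
```

Any handler attached to that logger (a file, syslog, and so on) then
receives the warning text.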

>  In any case, the errors parameter has
> an in between option to neither ignore or raise but to replace and give
> *something* printable.

That's true, and why I would actually support doing both. Adding the
warning is a more pressing need though, since it is what will prevent
the errors from passing silently in the default case.

> This situation seems like an ideal situation for a parameter which gives
> the application program who uses Python a range of options to working
> with an un-ideal world.  I am really flabbergasted why there is so much
> opposition to doing so in favor of more difficult or less functional
> alternatives.

A warning will stop the failure from passing silently in the default
case - that's solving a different problem to the one that the error
handling argument will solve. I do agree that being able to override the
handling on a per-call basis could be a useful feature.
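
The per-call variant can be approximated today on top of the bytes API.
The sketch below is only an illustration: listdir_text() and its errors
and encoding arguments are hypothetical, not an existing os.listdir()
signature, and wrapping the failure in OSError follows the suggestion
made earlier in the thread.

```python
import os
import sys


def listdir_text(path=".", errors="strict", encoding=None):
    """Hypothetical helper: list a directory as str with an errors argument."""
    encoding = encoding or sys.getfilesystemencoding()
    # Ask the OS for raw bytes, then decode under the caller's policy.
    raw_names = os.listdir(os.fsencode(path))
    try:
        return [raw.decode(encoding, errors) for raw in raw_names]
    except UnicodeDecodeError as exc:
        # Raise the exception type callers of os.listdir() expect, and
        # keep the raw bytes visible in the message.
        raise OSError("undecodable file name %r" % (exc.object,)) from exc
```

A caller could try errors='strict' first, log the OSError, and then
retry with errors='replace'. Note that 'ignore' here drops the bad bytes
from a name rather than dropping the file from the listing.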

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---

Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Alexander Belopolsky

I don't have much to add to Nick's reply other than to point you to
numpy as a reference implementation.  You may also get better
responses on the numpy list, <[EMAIL PROTECTED]>.

On Mon, Dec 8, 2008 at 3:46 PM, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Antoine Pitrou wrote:
>> For the one-dimensional case, I had in mind a simple scheme where the
>> Py_buffer struct has an additional two-member Py_ssize_t array. Then
>> `shape` and `strides` can point to the first and second member of this
>> array, respectively. This wouldn't solve the multi-dimensional case,
>> however.
>>
>> Thanks for any ideas on how to solve this.
>
> Actually, I think your suggested scheme for the one-dimensional case
> shows the way forward: ownership of the shape and strides memory belongs
> to the object issuing the Py_buffer struct, and that object needs to
> deal with it when the buffer is released. Defining a larger memory chunk
> with the Py_buffer as the first item and the shape and stride info
> tacked onto the end and returning that from PyObject_GetBuffer() means
> that the shape/stride info will be released automatically when the view
> is released via PyBuffer_Release().
>
> For more complicated cases, the object providing the views may need to
> do some internal bookkeeping to map from Py_buffer pointers to
> separately allocated shape/stride information and release those when the
> views are released.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen

On Mon, Dec 8, 2008 at 1:45 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> M.-A. Lemburg  egenix.com> writes:
>>
>> Such application specific error handlers could then also apply
>> whatever fancy round-trip safe encoding of non-decodable bytes
>> to Unicode escapes, private code points, etc. as seen fit by the
>> application.
>
> I'd argue that such fancy round-trip safe error handler should be provided by
> Python. It's not reasonable to expect application coders to come up with their
> own codec variation based on subtle details of the unicode spec.

Except they're clearly NOT part of the unicode spec.

Moreover, whatever tricks you use vary depending on if your garbage
input is from UTF-8, UTF-16, or UTF-32 (or any other arbitrary
encoding, like CP-1252 or Shift-JIS.)

At this point someone suggests we have a type that can store an
arbitrary mix of unicode and bytes, so the undecodable portions stay
in their original form. :P

-- 
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] 3.0.1 possibilities

2008-12-08 Thread Nick Coghlan

Barry Warsaw wrote:
> On Dec 8, 2008, at 3:39 PM, Antoine Pitrou wrote:
> 
>> Nick Coghlan  gmail.com> writes:
>>>
>>> Where would adding a (undocumented) get_filename() method to ZipImporter
>>> objects for the benefit of the -m switch fit then?
> 
>> Why not call it _get_filename() in 3.0 and get_filename() in 3.1?
> 
> +1

Well, with release manager blessing I'll go with that approach then :)

Now, where are those round tuits to actually get it implemented...

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---

Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Antoine Pitrou

Nick Coghlan  gmail.com> writes:
> 
> Actually, I think your suggested scheme for the one-dimensional case
> shows the way forward: ownership of the shape and strides memory belongs
> to the object issuing the Py_buffer struct, and that object needs to
> deal with it when the buffer is released. Defining a larger memory chunk
> with the Py_buffer as the first item and the shape and stride info
> tacked onto the end and returning that from PyObject_GetBuffer() means
> that the shape/stride info will be released automatically when the view
> is released via PyBuffer_Release().

Ok, so another question: given that this will change the Py_buffer layout a bit,
can it go into 3.0.1 and 2.6.2?




Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Antoine Pitrou

Adam Olsen  gmail.com> writes:
> 
> Except they're clearly NOT part of the unicode spec.

This is always the same discussion going in circles. I know they're not part of
the unicode spec, but practicality beats purity and if the said error handler
comes with an appropriate warning in the official doc, then why not?

In any case, +1 to Marc-André's proposal.


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen

On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-12-08 21:45, Antoine Pitrou wrote:
>> M.-A. Lemburg  egenix.com> writes:
>>> Such application specific error handlers could then also apply
>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>> to Unicode escapes, private code points, etc. as seen fit by the
>>> application.
>>
>> I'd argue that such fancy round-trip safe error handler should be provided by
>> Python. It's not reasonable to expect application coders to come up with their
>> own codec variation based on subtle details of the unicode spec.
>
> Fair enough. We could add some e.g.
>
>  * a round-trip safe escape error handler that uses a Unicode private
>   code point area which we officially reserve for the Python
>   interpreter

This would of course alter the behaviour of those private code points,
preventing them from round-tripping properly.

I don't think round-tripping can be done from an error handler.  You
need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
bytes.  This has long since gotten repetitive..


>  * a human readable escape error handler that encodes the problem
>   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
>   encoded directory name instead of failing

Similar to 'ö'.encode('ascii', 'backslashreplace')?  I'm +1 on making that work.


>  * a warning error handler that replaces the problem cases with
>   a question mark and issues a warning through the warning
>   framework

I dub thee errors='warnreplace'.


-- 
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Toshio Kuratomi

Guido van Rossum wrote:
> On Mon, Dec 8, 2008 at 12:07 PM,  <[EMAIL PROTECTED]> wrote:
>> On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote:
>> But I'm happy with just issuing a warning by default.  That would mean
>> it doesn't fail silently, but neither does it crash.  Seems like the
>> best compromise with the broken nature of the real world IT
>> environment.
> 
> OK, I can live with that too.
> 
Same here.  This lets the application specify globally what should
happen (exception, warning, ignore via the warnings filters) and should
give enough context that it doesn't become a mysterious error in the
program.

The per method addition of an errors argument so that this is overridable
locally as well as globally is also a nice touch but can be done
separately from this step.
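
A rough sketch of the global control described above, using the standard
warnings filters. The decode_name() helper and its message are hypothetical
stand-ins for whatever the os functions would do internally.

```python
import warnings


def decode_name(raw, encoding="utf-8"):
    """Hypothetical stand-in: decode, warning (not failing) on bad bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        warnings.warn("undecodable name %r" % (raw,), UnicodeWarning)
        return raw.decode(encoding, "replace")


def strict_mode_raises(raw):
    # The application opts into exceptions globally via the filter.
    with warnings.catch_warnings():
        warnings.simplefilter("error", UnicodeWarning)
        try:
            decode_name(raw)
        except UnicodeWarning:
            return True
    return False


def quiet_mode_result(raw):
    # The application opts into silence; the replaced text still comes back.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UnicodeWarning)
        return decode_name(raw)
```

The same three outcomes (exception, warning, ignore) are available from the
command line through the -W option, without touching the application code.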

-Toshio




Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Victor Stinner

> ('strict', 'ignore', 'replace', 'xmlcharrefreplace')

replace (or xmlcharrefreplace) is just useless because you will not be able
to open or rename the file... You just know that there is a strange file in
the directory.

Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Nick Coghlan

Antoine Pitrou wrote:
> Nick Coghlan  gmail.com> writes:
>> Actually, I think your suggested scheme for the one-dimensional case
>> shows the way forward: ownership of the shape and strides memory belongs
>> to the object issuing the Py_buffer struct, and that object needs to
>> deal with it when the buffer is released. Defining a larger memory chunk
>> with the Py_buffer as the first item and the shape and stride info
>> tacked onto the end and returning that from PyObject_GetBuffer() means
>> that the shape/stride info will be released automatically when the view
>> is released via PyBuffer_Release().
> 
> Ok, so another question: given that this will change the Py_buffer layout a
> bit, can it go into 3.0.1 and 2.6.2?

No, you misunderstand what I meant. Py_buffer doesn't need to be changed
at all. The *issuing type* would define a new structure with the
additional fields, such as:

struct _my_Py_buffer {
  Py_buffer     view;
  SHAPE_TYPE    shape;
  STRIDES_TYPE  strides;
};

Internally, the object would use these instead of vanilla Py_buffer
objects, and set the shape and strides pointers inside the view field to
refer to the shape and strides fields.

Clients wouldn't need to know or care that the shape and stride
information had been tacked on to the end of the Py_buffer struct. When
the buffer was released via PyBuffer_Release, the object would throw
away the whole _my_Py_buffer structure (since the pointers are the same).
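
The layout idea can be modelled from Python with ctypes, purely as an
illustration. FakeBuffer below is a made-up three-field stand-in, not the
real Py_buffer layout, and MyBuffer and make_view() are hypothetical names;
the point is only that the embedded view's pointers refer to fields stored
right behind it, and that the outer structure and the view share one
address.

```python
import ctypes

PTR = ctypes.POINTER(ctypes.c_ssize_t)


class FakeBuffer(ctypes.Structure):
    # Stand-in for Py_buffer: only the members relevant to this discussion.
    _fields_ = [
        ("ndim", ctypes.c_int),
        ("shape", PTR),
        ("strides", PTR),
    ]


class MyBuffer(ctypes.Structure):
    # The view comes first, so a pointer to MyBuffer is also a pointer to it.
    _fields_ = [
        ("view", FakeBuffer),
        ("shape", ctypes.c_ssize_t),
        ("strides", ctypes.c_ssize_t),
    ]


def make_view(length, itemsize):
    buf = MyBuffer()
    buf.shape = length
    buf.strides = itemsize
    base = ctypes.addressof(buf)
    buf.view.ndim = 1
    # Point the view's members at the fields tacked onto the end.
    buf.view.shape = ctypes.cast(base + MyBuffer.shape.offset, PTR)
    buf.view.strides = ctypes.cast(base + MyBuffer.strides.offset, PTR)
    return buf
```

Releasing the outer structure releases the shape and stride storage with it,
which is the property being relied on here. Whether the issuing object is
actually in a position to allocate the structure is a separate question.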

Alexander's suggestion of going and looking at what the numpy folks have
done in this area is probably a good idea too.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg

On 2008-12-08 22:32, Adam Olsen wrote:
> On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>> On 2008-12-08 21:45, Antoine Pitrou wrote:
>>> M.-A. Lemburg  egenix.com> writes:
>>>> Such application specific error handlers could then also apply
>>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>>> to Unicode escapes, private code points, etc. as seen fit by the
>>>> application.
>>> I'd argue that such fancy round-trip safe error handler should be provided by
>>> Python. It's not reasonable to expect application coders to come up with their
>>> own codec variation based on subtle details of the unicode spec.
>> Fair enough. We could add some e.g.
>>
>>  * a round-trip safe escape error handler that uses a Unicode private
>>   code point area which we officially reserve for the Python
>>   interpreter
> 
> This would of course alter the behaviour of those private code points,
> preventing them from round-tripping properly.
> 
> I don't think round-tripping can be done from an error handler.  You
> need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
> bytes.  This has long since gotten repetitive..

The error handler would just map the problem bytes to the private
area. The application would then have to decide what to do with
them, ie. the error handler only provides one half of the round-
tripping.

And that's on purpose: I don't believe we can come up with some magic
solution for the encodings problem. This is essentially something
that applications will have to solve on a case-by-case basis.

>>  * a human readable escape error handler that encodes the problem
>>   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
>>   encoded directory name instead of failing
> 
> Similar to 'ö'.encode('ascii', 'backslashreplace')?  I'm +1 on making that work.

Yes.

>>  * a warning error handler that replaces the problem cases with
>>   a question mark and issues a warning through the warning
>>   framework
> 
> I dub thee errors='warnreplace'.

Yep, something along those lines.

Perhaps there are more and better alternatives. These suggestions
are just to show how the idea could be put to some real-life use.
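
As one possible reading of the third suggestion, here is a sketch of such a
handler registered through codecs.register_error(). The handler name
'warnreplace' comes from the quoted message; the implementation, the warning
category and the message text are assumptions made for the example.

```python
import codecs
import warnings


def warnreplace(exc):
    """Replace each problem byte/character with '?' and issue a warning."""
    if isinstance(exc, (UnicodeDecodeError, UnicodeEncodeError)):
        bad = exc.object[exc.start:exc.end]
        warnings.warn(
            "replaced undecodable/unencodable data %r" % (bad,),
            UnicodeWarning,
        )
        # Return the replacement text and the position to resume from.
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc


codecs.register_error("warnreplace", warnreplace)


def decode_with_warning(raw, encoding="utf-8"):
    return raw.decode(encoding, "warnreplace")
```

Because the handler goes through the warnings framework, the usual filters
decide whether the application sees an exception, a message on stderr, or
nothing at all.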

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 08 2008)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2008-12-02: Released mxODBC.Connect 1.0.0  http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen

On Mon, Dec 8, 2008 at 1:12 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On Mon, Dec 8, 2008 at 12:07 PM,  <[EMAIL PROTECTED]> wrote:
>> But I'm happy with just issuing a warning by default.  That would mean
>> it doesn't fail silently, but neither does it crash.  Seems like the
>> best compromise with the broken nature of the real world IT
>> environment.
>
> OK, I can live with that too.

+1


-- 
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread M.-A. Lemburg

On 2008-12-08 22:39, Victor Stinner wrote:
>> ('strict', 'ignore', 'replace', 'xmlcharrefreplace')
> 
> replace (or xmlcharrefreplace) is just useless because you will not be able
> to open or rename the file... You just know that there is a strange file in
> the directory.

Right, but that's already a lot better than not knowing of the
file's existence at all :-)

Note that the above are standard error handlers for Unicode
conversions. The rest of the email you cut away has more useful
error handlers for the purpose in question.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg. execution bit)

2008-12-08 Thread Adeodato Simó

* Mark Dickinson [Mon, 08 Dec 2008 20:56:25 +]:

> On Mon, Dec 8, 2008 at 6:51 PM, Adeodato Simó <[EMAIL PROTECTED]> wrote:

> > The attached patch is a possible way to fix this issue. It'd be great if
> > somebody could apply it, or write a more appropriate fix.

> Please could you submit your patch to the bug tracker, at

> http://bugs.python.org

> That way it's less likely to get lost. :)

Ok, submitted as #4602.

Thanks,

-- 
Adeodato Simó dato at net.com.org.es
Debian Developer  adeodato at debian.org
 
As scarce as truth is, the supply has always been in excess of the demand.
-- Josh Billings


Re: [Python-Dev] "as" keyword woes

2008-12-08 Thread Guido van Rossum

On Sun, Dec 7, 2008 at 1:06 PM, Paul Boddie <[EMAIL PROTECTED]> wrote:
> On Sat Dec 6 21:29:09 CET 2008, Guido van Rossum wrote:
>>
>> On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano 
>> wrote:
>> > As someone somewhat knowledgable of how parsers work, I do not
>> > understand why a method/attribute name "object_name.as(...)" must
>> > necessarily conflict with a standalone keyword " as ".  It seems to me
>> > that it should be possible to unambiguously separate the two without
>> > ambiguity or undue complication of the parser.
>>
>> That's possible with sufficiently powerful parser technology, but
>> that's not how the Python parser (and most parsers, in my experience)
>> treat reserved words. Reserved words are reserved in all contexts,
>> regardless of whether ambiguity could arise.
>
> Just a quick aside from someone who merely lurks on this list: in SQL, it's
> quite possible to use keywords in a fashion similar to that desired by the
> inquirer, and it's actually possible to double-quote keywords and use them as
> names for things. I'm not advocating more complicated parsing technology for
> any Python implementation, but I think it's pertinent to point out that the
> technology isn't particularly obscure.

From my experience with SQL, it's nearly as bad as Python in that
every single one of the 200+ reserved words in a typical
implementation cannot be used as a name in any context without using
double quotes. While the double-quote escape is handy (especially
given there are so many obscure reserved words) this is not exactly
what the OP wanted -- they would have to say x."as"('float'), except
using some other notation instead of double quotes. Having to escape
it completely kills the OP's claim that 'as' is "simplest and most
elegant".
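
To make the Python side concrete, here is a small sketch. The Quantity class
and _convert() helper are hypothetical; the point is that the dotted form is
rejected by the compiler, while getattr() with a string plays the role of
the "escape" and is clearly not the simple spelling the OP wanted.

```python
import keyword


class Quantity:
    """Hypothetical class that would like a method named 'as'."""

    def __init__(self, value):
        self.value = value


def _convert(self, kind):
    return float(self.value) if kind == "float" else self.value


# Attribute names are just strings at runtime, so this is allowed...
setattr(Quantity, "as", _convert)


def dotted_form_compiles():
    # ...but the parser reserves 'as' in every context, including after a dot.
    try:
        compile("x.as('float')", "<example>", "eval")
    except SyntaxError:
        return False
    return True


q = Quantity("1.5")
result = getattr(q, "as")("float")
```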

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

[Python-Dev] Self in method body

2008-12-08 Thread Filip Gruszczyński

There is a large discussion on python-list about Guido's article about
new self syntax, therefore I would like to use that to raise similar
question: self in the body. Some time ago I was coding in Magik
language (http://en.wikipedia.org/wiki/Magik_(programming_language)),
which is dynamically typed and similar to Smalltalk and actually to
Python too - although the syntax is far less appalling. As you can see
in the examples, defining methods is very similar to what Guido
proposed in his blog, though you don't provide the name of the
argument, but the name of the class. Then you just precede attributes
with a '.', which is 4 letters less than self. And, well, this rocks
;-)

It is really not a problem to type 4 letters (well, six with a comma
and a space) in the signature, but it takes a lot of time to type all
those selfs inside the function's body. So I was thinking, if this
issue could be raised too, when new self syntax is proposed. Simple
example looks like this:

class bar:

    def bar.foo():
        .x = 5

This could really save a lot of code, while attributes are still
easily distinguishable.

-- 
Filip Gruszczyński

Re: [Python-Dev] Nonlocal shortcut

2008-12-08 Thread Guido van Rossum

On Sun, Dec 7, 2008 at 2:45 PM, Amaury Forgeot d'Arc <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Fabio Zadrozny  wrote:
>> Hi,
>>
>> I'm currently implementing a parser to handle Python 3.0, and one of
>> the points I found conflicting with the grammar specification is the
>> PEP 3104.
>>
>> It says that a shortcut would be added to Python 3.0 so that "nonlocal
>> x = 0" can be written. However, the latest grammar specification
>> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
>> doesn't seem to take that into account... So, can someone enlighten me
>> on what should be the correct treatment for that on a grammar that
>> wants to support Python 3.0?
>
> An issue was already filed about this:
> http://bugs.python.org/issue4199
> It should be ready for inclusion in 3.0.1.

No it should not. It should be put in 3.1.

I strongly object against the addition of features of *any* kind to
3.0.1, no matter whether they were promised or announced in a PEP or
in the docs or on the 8 o'clock news.  This would make 3.0.0 forever a
"loser" release.

(I find the removal of 'cmp' hard to swallow too, but in a sense the
addition of features is worse, as it makes downgrading a risk.
Upgrades, no matter how minimal, always represent risks -- however
downgrading shouldn't represent risks, unless you happen to depend on
a bugfix that wasn't present in the downgrade -- but we're not talking
about a bugfix here no matter how you bend the English language.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] "as" keyword woes

2008-12-08 Thread Paul Boddie

On Monday 08 December 2008 22:54:41 Guido van Rossum wrote:
>
> From my experience with SQL, it's nearly as bad as Python in that
> every single one of the 200+ reserved words in a typical
> implementation cannot be used as a name in any context without using
> double quotes.

SQL is a big language; I won't disagree with that! That said, you don't always 
have to quote names like "end" as I mention below.

> While the double-quote escape is handy (especially 
> given there are so many obscure reserved words) this is not exactly
> what the OP wanted -- they would have to say x."as"('float'), except
> using some other notation instead of double quotes. Having to escape
> it completely kills the OP's claim that 'as' is "simplest and most
> elegant".

You can do what the OP wants, at least in PostgreSQL, which is fairly 
conformant. As I wrote on comp.lang.python...

create table "create" (
  "select" varchar
);

select "select" from "create";
select "create".select from "create";

(This from a PostgreSQL 8.2 session.)

I don't know whether SQL 1992 actually allows dropping the double-quotes for 
column names, but this is the kind of thing he has in mind.

Paul

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Adam Olsen

On Mon, Dec 8, 2008 at 2:44 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2008-12-08 22:32, Adam Olsen wrote:
>> On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>>> On 2008-12-08 21:45, Antoine Pitrou wrote:
>>>> M.-A. Lemburg  egenix.com> writes:
>>>>> Such application specific error handlers could then also apply
>>>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>>>> to Unicode escapes, private code points, etc. as seen fit by the
>>>>> application.
>>>> I'd argue that such fancy round-trip safe error handler should be provided by
>>>> Python. It's not reasonable to expect application coders to come up with their
>>>> own codec variation based on subtle details of the unicode spec.
>>> Fair enough. We could add some e.g.
>>>
>>>  * a round-trip safe escape error handler that uses a Unicode private
>>>   code point area which we officially reserve for the Python
>>>   interpreter
>>
>> This would of course alter the behaviour of those private code points,
>> preventing them from round-tripping properly.
>>
>> I don't think round-tripping can be done from an error handler.  You
>> need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
>> bytes.  This has long since gotten repetitive..
>
> The error handler would just map the problem bytes to the private
> area. The application would then have to decide what to do with
> them, ie. the error handler only provides one half of the round-
> tripping.

By that point it's already too late.  You've already conflated garbage
PUA with legitimate PUA.

To make it work you need to treat those legitimate PUA scalars as
errors too, transforming them.  A common example is how escaping
replaces a single '\' with '\\'.

Hrm.  nul-escaping should work.  Obviously it can't be used outside
the filesystem though, as they may introduce a legitimate nul.
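
The conflation can be shown with a toy handler. Everything here is an
assumption made for the demonstration: the handler name 'pua_escape_demo',
the choice of U+E000 as the base of the range, and the sample byte strings.

```python
import codecs

PUA_BASE = 0xE000  # hypothetical start of the "reserved" private-use range


def pua_escape(exc):
    """Map each undecodable byte to a private-use code point."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return ("".join(chr(PUA_BASE + b) for b in bad), exc.end)


codecs.register_error("pua_escape_demo", pua_escape)

# A Latin-1 byte that is garbage as far as UTF-8 is concerned...
garbage = b"caf\xe9"
# ...and valid UTF-8 that legitimately contains the same PUA code point.
legit = "caf\ue0e9".encode("utf-8")

decoded_garbage = garbage.decode("utf-8", "pua_escape_demo")
decoded_legit = legit.decode("utf-8", "pua_escape_demo")
```

Both inputs decode to the same string, so nothing downstream can tell which
bytes to write back. Avoiding that would mean escaping the legitimate PUA
characters as well, in the same way that backslash escaping doubles a
literal backslash.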


-- 
Adam Olsen, aka Rhamphoryncus
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Nonlocal shortcut

2008-12-08 Thread Calvin Spealman

Did the original PEP discussion cover debates about the shortcut
working for all assignment operators (like += and x[i] =) and the
difference between it being one-shot (doesn't affect x for the rest of
the function) or simply the unrolling into nonlocal x; x = y as it is?
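
For context, a sketch of the unrolled form, which is what the 3.0 grammar
accepts. The make_counter() and shortcut_is_valid() names are invented for
the example; the second function simply asks the compiler whether the
one-line shortcut parses.

```python
def make_counter():
    count = 0

    def bump(step=1):
        # The declaration applies to the whole function body, not one line.
        nonlocal count
        count += step
        return count

    return bump


def shortcut_is_valid():
    source = (
        "def outer():\n"
        "    x = 1\n"
        "    def inner():\n"
        "        nonlocal x = 0\n"
    )
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True
```

With the declaration on its own line, augmented assignments such as += need
no special treatment, which is one argument for the unrolled reading.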

On Mon, Dec 8, 2008 at 5:07 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On Sun, Dec 7, 2008 at 2:45 PM, Amaury Forgeot d'Arc <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> Fabio Zadrozny  wrote:
>>> Hi,
>>>
>>> I'm currently implementing a parser to handle Python 3.0, and one of
>>> the points I found conflicting with the grammar specification is the
>>> PEP 3104.
>>>
>>> It says that a shortcut would be added to Python 3.0 so that "nonlocal
>>> x = 0" can be written. However, the latest grammar specification
>>> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
>>> doesn't seem to take that into account... So, can someone enlighten me
>>> on what should be the correct treatment for that on a grammar that
>>> wants to support Python 3.0?
>>
>> An issue was already filed about this:
>> http://bugs.python.org/issue4199
>> It should be ready for inclusion in 3.0.1.
>
> No it should not. It should be put in 3.1.
>
> I strongly object against the addition of features of *any* kind to
> 3.0.1, no matter whether they were promised or announced in a PEP or
> in the docs or on the 8 o'clock news.  This would make 3.0.0 forever a
> "loser" release.
>
> (I find the removal of 'cmp' hard to swallow too, but in a sense the
> addition of features is worse, as it makes downgrading a risk.
> Upgrades, no matter how minimal, always represent risks -- however
> downgrading shouldn't represent risks, unless you happen to depend on
> a bugfix that wasn't present in the downgrade -- but we're not talking
> about a bugfix here no matter how you bend the English language.)
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)



-- 
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

Re: [Python-Dev] Self in method body

2008-12-08 Thread Steven D'Aprano

On Tue, 9 Dec 2008 08:55:21 am Filip Gruszczyński wrote:
> There is a large discussion on python-list about Guido's article
> about new self syntax, therefore I would like to use that to raise
> similar question: self in the body. Some time ago I was coding in
> Magik language
> (http://en.wikipedia.org/wiki/Magik_(programming_language)), which is
> dynamically typed and similar to Smalltalk and actually to Python too
> - although the syntax is far less appalling. As you can see in the
> examples, defining methods is very similar to what Guido proposed in
> his blog, though you don't provide the name of the argument, but the
> name of the class. Then you just precede attributes with a '.', which
> is 4 letters less than self. And, well, this rocks ;-)
>
> It is really not a problem to type 4 letters (well, six with a comma
> and a space) in the signature, but it takes a lot of time to type all
> those selfs inside the function's body. 

For some definition of "a lot".

I've just grabbed a random, heavily OO module from my own code library. 
It has 60 instances of "self", or 240 characters, out of 18,839 
characters in total (including newlines). Removing self will decrease 
the number of my keystrokes and the amount of pure typing time 
(excluding thinking time, debugging time) by about 1.2%. I don't call 
that "a lot" -- it's actually quite small. And it becomes vanishingly 
trivial when you factor in that most of the time spent programming is 
not typing but thinking, testing, debugging, etc.

Doing the same calculation for BaseHTTPServer.py and SimpleHTTPServer.py 
in the standard library, I get 1.9% and 2.0% respectively.
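
The measurement is easy to repeat. The following is a sketch of one way to
do it, not necessarily how the figures above were obtained; self_share() is
a made-up helper that counts whole-word occurrences of "self" and reports
what fraction of all characters they account for.

```python
import re


def self_share(source):
    """Return (occurrences of 'self', percentage of characters they use)."""
    count = len(re.findall(r"\bself\b", source))
    if not source:
        return count, 0.0
    return count, 100.0 * count * len("self") / len(source)


def self_share_of_file(path):
    with open(path, encoding="utf-8") as handle:
        return self_share(handle.read())
```

Plugging in the numbers quoted above, 60 occurrences are 240 characters, and
240 / 18,839 comes to roughly 1.3%.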

> This could really save a lot of code, while attributes are still
> easily distinguishable.

I don't think so.

-- 
Steven

Re: [Python-Dev] Self in method body

2008-12-08 Thread Filip Gruszczyński

> I've just grabbed a random, heavily OO module from my own code library.
> It has 60 instances of "self", or 240 characters, out of 18,839
> characters in total (including newlines). Removing self will decrease
> the number of my keystrokes and the amount of pure typing time
> (excluding thinking time, debugging time) by about 1.2%. I don't call
> that "a lot" -- it's actually quite small. And it becomes vanishingly
> trivial when you factor in that most of the time spent programming is
> not typing but thinking, testing, debugging, etc.

Well, maybe I don't program in Python the "right way" ;-), because
it's a bit more in my code. I repeated this test on a random module
holding some GUI stuff (built using PyQt), and there it's more than 5%
(213 selfs out of 16204 characters). With a small app for creating
dungeon tiles for role playing games I astonishingly got a very
similar value (484 * 4 / 35000) ;-) Maybe it's a feature of
programming with a lot of GUI stuff, which I do. But 1 in every 20
chars being spent on a self is quite a lot for me.

-- 
Filip Gruszczyński

Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Antoine Pitrou

Nick Coghlan writes:
> 
> No, you misunderstand what I meant. Py_buffer doesn't need to be changed
> at all. The *issuing type* would define a new structure with the
> additional fields, such as:

With the current buffer API, this is not possible. It's the caller who
allocates the Py_buffer struct (usually on the stack), not the callee. Therefore
the callee (e.g. the getbufferproc of the issuing type) cannot choose to
allocate a different structure.

(of course complex schemes can be devised where the callee maintains its own
separate storage for shape and strides, but I don't think we want to go there)

> Alexander's suggestion of going and looking at what the numpy folks have
> done in this area is probably a good idea too.

Well, I'm open to others doing this, but I won't do it myself. My interest is in
fixing the most glaring bugs of the buffer API and memoryview object. The numpy
folks are welcome to voice their opinions and give advice on python-dev.

Regards

Antoine.


Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Terry Reedy


M.-A. Lemburg wrote:
> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <[EMAIL PROTECTED]> wrote:
>> try:
>>     files = os.listdir(somedir, errors='strict')
>> except OSError as e:
>>     log()
>>     files = os.listdir(somedir)


> If that error parameter is the same as in unicode(value, errors),
> then this would be a useful feature:


Except that unicode becomes str in 3.0, that is exactly my intention.


> People could then choose among the already existing error handlers
> ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register
> their own ones via the codecs module.


These could be passed through from listdir or getenv to str.
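
As an illustration of what would be passed through, here is how the existing handlers behave with bytes.decode in Python 3, together with a handler registered via codecs.register_error. The handler name 'hexescape' and the sample file name are invented for this sketch; nothing here is part of the proposed listdir or getenv interface. Note that 'xmlcharrefreplace' applies only when encoding, not when decoding.

```python
import codecs

raw = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8


def hex_escape(exc):
    """Hypothetical handler: keep undecodable bytes as visible \\xNN text."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        # Return the replacement text and the position to resume decoding at.
        return "".join("\\x%02x" % b for b in bad), exc.end
    raise exc


codecs.register_error("hexescape", hex_escape)

ignored = raw.decode("utf-8", "ignore")      # drops the bad byte
replaced = raw.decode("utf-8", "replace")    # substitutes U+FFFD
escaped = raw.decode("utf-8", "hexescape")   # uses the handler above

try:
    raw.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'xmlcharrefreplace' is an encoding-side handler.
xml = "caf\xe9".encode("ascii", "xmlcharrefreplace")
```

A handler like this is lossy in the sense that a name which already contains the text `\xe9` cannot be told apart afterwards, so a real round-trip scheme would need more care.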

[Side questions:
1. 'xmlcharrefreplace' is not in the 3.0 LibRef doc or doc string. 
Should it be, or is 'xmlcharrefreplace' an addition for a later version?
2. A garbage value for errors (such as 'blah') is silently ignored (so I 
cannot test the above).  Intended or a bug?]


Someone else proposed a new option 'warn', which Guido has accepted as 
the default instead of the current 'ignore'.  It could not be passed 
through (unless str were changed or something registered).  I believe 
the implementation of that would be to call str with 'strict' but catch 
errors and warn instead.  Whether there should be 1 warning for each 
problematic bytes value encountered or 1 for each listdir (or whatever) 
call, possibly with the number of problems, I leave to others to decide.
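
A rough sketch of the "decode strictly, catch the error, warn instead" idea, written against plain bytes rather than the real os.listdir. The function names, the choice of UnicodeWarning as the category, and the one-warning-per-name policy are all assumptions made for this illustration.

```python
import warnings


def decode_or_warn(raw, encoding="utf-8"):
    """Decode strictly; on failure emit a warning and return None."""
    try:
        return raw.decode(encoding, "strict")
    except UnicodeDecodeError as exc:
        warnings.warn(
            "skipping undecodable name %r: %s" % (raw, exc),
            UnicodeWarning,
            stacklevel=2,
        )
        return None


def decode_names(raw_names, encoding="utf-8"):
    """Return the decodable names, warning once for each one skipped."""
    decoded = (decode_or_warn(raw, encoding) for raw in raw_names)
    return [name for name in decoded if name is not None]
```

Issuing one warning per call instead would mean collecting the failures in decode_names and warning once with their count.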



> Such application specific error handlers could then also apply
> whatever fancy round-trip safe encoding of non-decodable bytes
> to Unicode escapes, private code points, etc. as seen fit by the
> application.
>
> Perhaps we should also add an ''encoding'' parameter that can be
> set on a per directory basis (if necessary) and defaults to the
> global file system encoding.


That could also be passed through, but I will let others make the 
argument for it.


> If an application hits a directory that is known to cause problems,
> it could then choose to receive the file names in a different,
> more suitable encoding. This allows implementing fallback
> mechanisms with a list of common encodings for a locale.
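
The fallback mechanism described in the quoted paragraph could look something like the sketch below, applied to a single bytes file name. The function name and the particular list of encodings are assumptions chosen for illustration, not a recommendation for any locale.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in turn; return (text, encoding that worked)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode %r" % (encodings, raw))


utf8_name = decode_with_fallback(b"caf\xc3\xa9")   # valid UTF-8
legacy_name = decode_with_fallback(b"caf\xe9")     # falls through to cp1252
```

Because latin-1 maps every byte to a character, putting it last means the loop always succeeds. Whether that guess is the right one for a given directory is a separate question.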


Terry Jan Reedy



Re: [Python-Dev] Self in method body

2008-12-08 Thread Terry Reedy


Filip Gruszczyński wrote:

> There is a large discussion on python-list about Guido's article about


That discussion should stay there.


> new self syntax, therefore I would like to use that to raise similar
> question: self in the body.


That has also been heavily discussed, many times, there and here.


> ... Then you just precede attributes with a '.',


Guido has specifically rejected that, more than once, I believe.

> which is 4 letters less than self.

As has been said *many* times in previous discussions, you can use 1 
letter instead of 4 if you really wish, if saving keystrokes is your 
highest priority.  But please don't rehash these discussions, at least 
not here.
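
For reference, the one-letter option works because the name of the first method parameter is only a convention. A minimal sketch, with a class invented for this illustration:

```python
class Point:
    # `s` plays the role conventionally given to `self`.
    def __init__(s, x, y):
        s.x = x
        s.y = y

    def norm_squared(s):
        return s.x * s.x + s.y * s.y


p = Point(3, 4)
result = p.norm_squared()
```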


Terry Jan Reedy


Re: [Python-Dev] Holding a Python Language Summit at PyCon

2008-12-08 Thread A.M. Kuchling

On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote:
> No, I am saying I had told AMK I was interested in championing the
> session. He chose you, and that's that. One less thing for me to worry
> about. =)

Brett, I actually think you'd be a good champion for the 11AM
transition-planning session.  As a reminder, the topics we came up with
were:

Transition plan for rest of 2.x series; goals for 2.7/3.1.
- New features & future plans?
- Is 2.7 last of the 2.x releases?
- Unicode issues
- Stdlib plans?

(Possibly this is too much material for one session, and something
will have to be pruned.)

--amk

Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Alexander Belopolsky

On Mon, Dec 8, 2008 at 6:25 PM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
..
>> Alexander's suggestion of going and looking at what the numpy folks have
>> done in this area is probably a good idea too.
>
> Well, I'm open to others doing this, but I won't do it myself. My interest is
> in fixing the most glaring bugs of the buffer API and memoryview object. The
> numpy folks are welcome to voice their opinions and give advice on python-dev.
>

I did not follow numpy development for the last year or more, so I
won't qualify as "the numpy folks," but my understanding is that numpy
does exactly what Nick recommended: the viewed object owns shape and
strides just as it owns the data.  The viewing object increases the
reference count of the viewed object and thus assures that data, shape
and strides don't go away prematurely.
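
The ownership relationship described above can be observed from the Python side with the memoryview object as it exists in later Python 3 releases (memoryview.cast is not in 3.0). This is only a sketch of the observable behaviour: it says nothing about where the shape and strides arrays are stored at the C level, and it is not numpy code.

```python
data = bytearray(range(6))            # the viewed object (the exporter)
base = memoryview(data)               # a viewing object
view = base.cast("B", shape=[2, 3])   # a 2-D view over the same memory

shape, strides = view.shape, view.strides
owner_is_data = view.obj is data      # the view refers back to its exporter

# While views are outstanding, the exporter refuses to be resized, so the
# memory and layout the views describe cannot go away underneath them.
try:
    data.append(0)
    resized_while_viewed = True
except BufferError:
    resized_while_viewed = False

# Once every view is released, the exporter may change again.
view.release()
base.release()
data.append(0)
```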

I am copying Travis, the author of PEP 3118, hoping that he will
step in on behalf of "the numpy folks."

Re: [Python-Dev] Holding a Python Language Summit at PyCon

2008-12-08 Thread Brett Cannon

On Mon, Dec 8, 2008 at 18:53, A.M. Kuchling <[EMAIL PROTECTED]> wrote:
> On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote:
>> No, I am saying I had told AMK I was interested in championing the
>> session. He chose you, and that's that. One less thing for me to worry
>> about. =)
>
> Brett, I actually think you'd be a good champion for the 11AM
> transition-planning session.

OK, so I guess I do have one more thing to worry about. =) I'd be
happy to do that session.

> As a reminder, the topics we came up with
> were:
>
> Transition plan for rest of 2.x series; goals for 2.7/3.1.
> - New features & future plans?
> - Is 2.7 last of the 2.x releases?
> - Unicode issues
> - Stdlib plans?

Probably the last two will be wishy-washy in terms of whether they
will be reached.

-Brett

Re: [Python-Dev] Allocation of shape and strides fields in Py_buffer

2008-12-08 Thread Greg Ewing


Antoine Pitrou wrote:


> (of course complex schemes can be devised where the callee maintains its own
> separate storage for shape and strides, but I don't think we want to go there)


But that's exactly where you're supposed to be going.
If the object providing the buffer has variable-sized
shape and strides arrays, it has to manage the memory
for them somehow.

--
Greg

Re: [Python-Dev] Python-3.0, unicode, and os.environ

2008-12-08 Thread Glenn Linderman

On approximately 12/8/2008 9:30 AM, came the following characters from 
the keyboard of [EMAIL PROTECTED]:



If warnings were emitted, then files would not be silently ignored,
yet the program could still be used.



Yep, this is sounding useful.



PS: I'd like to see a similar warning issued when an access attempt
is made through os.environ to a variable that cannot be decoded.



And argv?  The warning technique seems useful for _any_ interface that 
has traditionally been bytes (because that was the only kind of character 
there was) but should now move to (Unicode) characters.


The warnings could be the same, or very similar.

The question is if one global control should handle all types of bytes 
problems, or if there should be individual controls for each bytes 
problem, or both.  I tend to believe in both; the paranoid can set 
exactly the ones they've coded for, the aggressive can set the global 
one.  In this manner, new cases can be added to the global settings over 
time, if more are discovered -- it should be documented to handle future 
similar issues in a similar manner.
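
One way to get both a global control and individual ones is a small hierarchy of warning categories, filtered with the standard warnings module. The category names below are hypothetical and do not exist in Python; the sketch only shows that a filter on the base class acts as the global switch while a filter on a subclass overrides it for one interface.

```python
import warnings


class UndecodableBytesWarning(UnicodeWarning):
    """Hypothetical umbrella category for every bytes-decoding problem."""


class EnvironDecodeWarning(UndecodableBytesWarning):
    """Hypothetical category for os.environ lookups."""


class ListdirDecodeWarning(UndecodableBytesWarning):
    """Hypothetical category for os.listdir results."""


def outcome(category):
    """Return 'raised' or 'silent' for a warning of `category`."""
    with warnings.catch_warnings():
        # Global control: silence every bytes-decoding warning...
        warnings.simplefilter("ignore", UndecodableBytesWarning)
        # ...individual control: but turn environ problems into errors.
        # Filters added later are consulted first, so this one wins.
        warnings.simplefilter("error", EnvironDecodeWarning)
        try:
            warnings.warn("undecodable bytes", category)
        except category:
            return "raised"
        return "silent"
```

A new case discovered later would simply become another subclass, and existing global filters would cover it without change.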



--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking