date:20160823

Re: [Python-Dev] File system path encoding on Windows

2016-08-23 Thread Stephen J. Turnbull

eryk sun writes:

 > I just wrote a simple function to enumerate the 822 system locales on
 > my Windows box (using EnumSystemLocalesEx and GetLocaleInfoEx, which
 > are Unicode-only functions), and 36.7% of them lack an ANSI codepage.
 > They're Unicode-only locales. UTF-8 is the only way to support these
 > locales with a bytes API.

Are the users of those locales banging on our door demanding such an API?

Apparently not; such banging would have resulted in a patch.  (That's
how you know it's a bang and not a whimper!)  Instead, Steve had to
volunteer one.

Pragmatically, I don't see anyone rushing to *supply* bytes-oriented
APIs, bytes-oriented networking stacks, or bytes-oriented applications
to the Windows world.  I doubt there are all that many purely bytes-
oriented libraries out there that are plug-compatible with existing
Windows libraries of similar functionality, and obviously superior.
So somebody's going to have to do some work to exploit this new
feature.

Who, and when?  If the answers are "uh, I dunno" and "eventually",
what's the big rush?  Making it possible to test such software on
Windows in the public release version of Python should be our goal for
3.6.  We can do that with an option to set the default codecs to
'utf-8', and the default being the backward-compatible 'mbcs'.  How we
deal with the existing deprecation, I don't really care ("now is
better than never", and everything currently on the table will need a
policy kludge).
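
As a rough illustration of what such an option would affect, the sketch below inspects the codec Python currently uses for file system paths and passes a bytes path through it and back. It relies only on `sys.getfilesystemencoding()`, `os.fsdecode()` and `os.fsencode()`; the file name is made up for the example, and the codec reported depends on the platform and Python version.

```python
import os
import sys

# Codec Python uses to convert between bytes paths and str paths.
fs_codec = sys.getfilesystemencoding()
print("file system encoding:", fs_codec)

# A hypothetical bytes path holding the UTF-8 encoding of "caf\u00e9.txt".
raw_path = b"caf\xc3\xa9.txt"

# Decode to str (what Python does before calling a str-based OS API),
# then encode back (what it does when a bytes result is requested).
as_text = os.fsdecode(raw_path)
round_tripped = os.fsencode(as_text)
print(repr(as_text), repr(round_tripped))
```

Whether `as_text` shows the accented character or an escaped form depends on which codec is in effect, which is exactly the default under discussion.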

If in 9 months after release of 3.6, there are apps targeting Windows
and using UTF-8 bytes APIs in beta (or nearing it), then we have
excellent reason to default to 'utf-8' for 3.7.

And of course the patch eliminating use of the *A APIs with their lack
of error-handling deserves nothing but a round of applause!

Steve
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

[Python-Dev] PEP 525

2016-08-23 Thread Yury Selivanov


Hi,

I think it's time to discuss PEP 525 on python-dev (pasted below).

There have been no changes to the PEP since I posted it to python-ideas
a couple of weeks ago.

One really critical thing that will block PEP acceptance is
asynchronous generators (AG) finalization.  The problem is
to provide a reliable way to correctly close all AGs on program
shutdown.

To recap: PEP 492 requires an event loop or a coroutine runner
to run coroutines.  PEP 525 defines AGs using the asynchronous
iteration protocol, also defined in PEP 492.  AGs require an
'async for' statement to iterate over them, which in turn can
only be used in a coroutine.  Therefore, AGs also require an
event loop or a coroutine runner to operate.

The finalization problem is related to partial iteration.
For instance, let's look at an ordinary synchronous generator:

  def gen():
      try:
          while True:
              yield 42
      finally:
          print("done")

  g = gen()
  next(g)
  next(g)
  del g

In the above example, when 'g' is GCed, the interpreter will
try to close the generator.  It will do that by throwing a
GeneratorExit exception into 'g', which would trigger the 'finally'
statement.

For AGs we have a similar problem.  Except that they can have
`await` expressions in their `finally` statements, which means
that the interpreter can't close them on its own.  An event
loop is required to run an AG, and an event loop is required to
close it correctly.
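
For comparison with the synchronous example, here is a minimal sketch of an AG whose `finally` block contains an `await`, closed explicitly with `aclose()` while an event loop is running. It assumes the async generator support that later shipped in Python 3.6 and uses `asyncio.run()` (Python 3.7+); the `events` list is only there to make the effect observable.

```python
import asyncio

events = []  # records that the cleanup code actually ran


async def agen():
    try:
        while True:
            yield 42
    finally:
        # Cleanup that needs the event loop: a plain GeneratorExit
        # thrown by the garbage collector could not drive this await.
        await asyncio.sleep(0)
        events.append("done")


async def main():
    g = agen()
    await g.__anext__()
    await g.__anext__()
    # Explicit asynchronous close; runs the finally block to completion.
    await g.aclose()


asyncio.run(main())
print(events)
```

Here the close happens inside a coroutine, so the loop is available to run the `await` in the `finally` block. The open question is what should happen when nobody calls `aclose()` and the AG is simply garbage collected.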

To enable correct AG finalization, PEP 525 proposes to add a
`sys.set_asyncgen_finalizer` API.  The idea is to have a finalizer
callback assigned to each AG, and when it's time to close the AG,
that callback will be called.  The callback will be installed by
the event loop (or coroutine runner), and should schedule a
correct asynchronous finalization of the AG (remember, AGs can
have 'await' expressions in their finally statements).

The problem with 'set_asyncgen_finalizer' is that the event loop
doesn't know about AGs until they are GCed.  This can be a problem
if we want to write a program that gracefully closes all AGs
when the event loop is being closed.

There is an alternative approach to finalization of AGs: instead
of assigning a finalizer callback to an AG, we can add an API to
intercept an AG's first iteration.  That would allow event loops to
have weak references to all AGs running under their control:

1. that would make it possible to intercept AG garbage collection
similarly to the currently proposed set_asyncgen_finalizer

2. it would also allow us to implement a 'loop.shutdown' coroutine,
which would try to asynchronously close all open AGs.

The second approach gives event loops more control and allows them to
implement APIs to collect open resources gracefully.  The only
downside is that it's a bit harder for event loops to work with.
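
A minimal sketch of the second approach follows. It does not use the `sys.set_asyncgen_finalizer` name proposed above; it assumes the `sys.set_asyncgen_hooks(firstiter=..., finalizer=...)` API that is available in Python 3.6 and later. The names `tracked`, `on_first_iter`, `run` and `shutdown` are hypothetical, and `run` is a toy coroutine runner that only works for code that never really suspends.

```python
import sys
import weakref

# Weak references to every AG that has started iterating.
tracked = weakref.WeakSet()


def on_first_iter(agen):
    # Called by the interpreter when an AG is iterated for the first time.
    tracked.add(agen)


def run(awaitable):
    """Toy runner: drive an awaitable that completes without suspending."""
    try:
        awaitable.send(None)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("awaitable suspended; a real event loop is needed")


async def numbers(log):
    try:
        yield 1
        yield 2
    finally:
        log.append("closed")


async def shutdown():
    # Asynchronously close every AG that is still alive.
    for agen in list(tracked):
        await agen.aclose()


old_hooks = sys.get_asyncgen_hooks()
sys.set_asyncgen_hooks(firstiter=on_first_iter)
try:
    log = []
    agen = numbers(log)
    first = run(agen.__anext__())  # partial iteration: AG is left open
    run(shutdown())                # graceful close of all open AGs
finally:
    sys.set_asyncgen_hooks(*old_hooks)

print(first, log)
```

The hooks are installed directly rather than through asyncio because a running asyncio loop installs its own hooks and would replace the ones set here.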

Let's discuss.


PEP: 525
Title: Asynchronous Generators
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov 
Discussions-To: 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Jul-2016
Python-Version: 3.6
Post-History: 02-Aug-2016


Abstract
========


PEP 492 introduced support for native coroutines and ``async``/``await``
syntax to Python 3.5.  It is proposed here to extend Python's
asynchronous capabilities by adding support for
*asynchronous generators*.


Rationale and Goals
===================

Regular generators (introduced in PEP 255) enabled an elegant way of
writing complex *data producers* and have them behave like an iterator.

However, currently there is no equivalent concept for the *asynchronous
iteration protocol* (``async for``).  This makes writing asynchronous
data producers unnecessarily complex, as one must define a class that
implements ``__aiter__`` and ``__anext__`` to be able to use it in
an ``async for`` statement.

Essentially, the goals and rationale for PEP 255, applied to the
asynchronous execution case, hold true for this proposal as well.

Performance is an additional point for this proposal: in our testing of
the reference implementation, asynchronous generators are **2x** faster
than an equivalent implemented as an asynchronous iterator.

As an illustration of the code quality improvement, consider the
following class that prints numbers with a given delay once iterated::

    class Ticker:
        """Yield numbers from 0 to `to` every `delay` seconds."""

        def __init__(self, delay, to):
            self.delay = delay
            self.i = 0
            self.to = to

        def __aiter__(self):
            return self

        async def __anext__(self):
            i = self.i
            if i >= self.to:
                raise StopAsyncIteration
            self.i += 1
            if i:
                await asyncio.sleep(self.delay)
            return i


The same can be implemented as a much simpler asynchronous generator::

    async def ticker(delay, to):
        """Yield numbers from 0 to `to` every `delay` seconds."""
        for i in range(to):
            yield i
a

Re: [Python-Dev] File system path encoding on Windows

2016-08-23 Thread Steve Dower

I've trimmed fairly aggressively for the sake of not causing the rest of 
the list to mute our discussion (again :) ). Stephen - feel free to 
email me off list if I go too far or misrepresent you.

As a summary for people who don't want to read on (and Stephen will 
correct me if I misquote):

* we agree on removing use of the *A APIs within Python, which means 
Python will have to decode bytes before passing them to the operating system
* we agree on allowing users to switch the encoding between utf-8 and 
mbcs:replace (the current default)
* we agree on making utf-8 the default for 3.6.0b1 and closely 
monitoring the reaction
* Stephen sees "no reason not to change locale.getpreferredencoding()" 
(default encoding for open()) at the same time with the same switches, 
while I'm not quite as confident. Do users generally specify an encoding 
these days? I know I always put utf-8 there.

Does anyone else have concerns or questions?

On 22Aug2016 2121, Stephen J. Turnbull wrote:

UTF-8 is absolutely not equivalent to UTF-16 from the point of view of
developers. Passing it to Windows APIs requires decoding to UTF-16 (or
from a Python developer's point of view, decoding to str and use of
str APIs).  That fact is what got you started on this whole proposal!

As encoded bytes, that's true, but as far as correctly encoding text, 
they are equivalent.

 > All MSVC users have been pushed towards Unicode for many years.

But that "push" is due to the use of UTF-16-based *W APIs and
deprecation of ACP-based *A APIs, right?  The input to *W APIs must be
decoded from all text/* content "out there", including UTF-8 content.
I don't see evidence that users have been pushed toward *UTF-8* in that
statement; they may be decoding from something else.  Unicode != UTF-8
for our purposes!

Yes, the operating system pushes people towards *W APIs, and the 
languages commonly used on that operating system follow.

Windows has (for as long as it matters) always been UTF-16 for paths and 
bytes for content. Nowhere does the operating system tell you how to 
read your text file except as raw bytes, and content types are meant to 
provide the encoding information you need. Languages each determine how 
to read files in "text" mode, but that's not bound to or enforced by the 
operating system in any way.

 > The .NET Framework has defaulted to UTF-8

Default != enforce, though.  Do you know that almost nobody changes
the default, and that behavior is fairly uniform across different
classes of organization (specifically by language)?  Or did you mean
"enforce"?

This will also not enforce anything that the operating system doesn't 
enforce. Windows uses Unicode to represent paths and requires them to be 
passed as UTF-16 encoded bytes. If you don't do that, it'll convert for 
you. My proposal is for Python to do the conversion instead.

(In .NET, users have to decode a byte array if they want to get a 
string. There aren't any APIs that take byte[] as if it were text, so 
it's basically the same separation between bytes/str that Python 3 
introduced, except without any allowance for bytes to still be used in 
places where text is needed.)

To be clear: asking users who want backward-compatible behavior to set
an environment variable does not count as a "screw" -- some will
complain, but "the defaults always suck for somebody".  Reasonable
people know that, and we can't do anything about the hysterics.

Good. Glad we agree on this.

1.  Organizations which behave like ".NET users" already have pure
UTF-8 environments.  They win from Python defaulting to UTF-8,
since Windows won't let them do it for themselves.  Now they can
plug in bytes-oriented code written for the POSIX environment
straight from upstream.

Is that correct?  Ie, without transcoding, they can't now use
bytes because their environment hands them UTF-8 but when Python
hands those bytes to Windows, it assumes anything else but UTF-8?

If you give Windows anything but UTF-16 as a path, it will convert to 
UTF-16. The change is to convert to UTF-16 ourselves, so Windows will 
never see the original bytes. To do that conversion, we need to know 
what encoding the incoming bytes are encoded with.

Python users will either transcode from bytes in encoding X to str, 
transcode from bytes in encoding X to bytes in UTF-8, or keep their 
bytes in UTF-8 if that's how they started.
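
The three cases can be sketched as follows, using cp1252 purely as a stand-in for "encoding X" and a made-up file name. The final UTF-16 step only illustrates the form Windows consumes; it is not a claim about how the patch performs the conversion internally.

```python
# A hypothetical file name containing one non-ASCII character.
name = "caf\u00e9.txt"

# Case 1: bytes in encoding X -> str.
legacy_bytes = name.encode("cp1252")        # what a legacy source hands over
as_str = legacy_bytes.decode("cp1252")

# Case 2: bytes in encoding X -> bytes in UTF-8.
utf8_bytes = legacy_bytes.decode("cp1252").encode("utf-8")

# Case 3: bytes that started out as UTF-8 are kept unchanged.
already_utf8 = name.encode("utf-8")

# With a UTF-8 bytes API, Python can decode the bytes itself and hand
# the operating system UTF-16, so Windows never sees the original bytes.
wide = utf8_bytes.decode("utf-8").encode("utf-16-le")

print(legacy_bytes, utf8_bytes, len(wide))
```

The same text has a different byte representation in each encoding, which is why the incoming encoding has to be known before the conversion to UTF-16.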

(I feel like there's some other misunderstanding going on here, because 
I know you understand how encoding works, but I can't figure out what it 
is or what I need to say to trigger clarity. :( )

Windows does not support using UTF-8 encoded bytes as text. UTF-16 is 
the universal encoding. (Basically the only thing you can reliably do 
with UTF-8 bytes in the Windows API is convert them to UTF-16 - see the 
MultiByteToWideChar function. Everything else just treats it like a blob 
of meaningless data.)

BTW, I wonder how those organizations manage to

Re: [Python-Dev] PEP 525

2016-08-23 Thread Rajiv Kumar

Hi Yury,

I was playing with your implementation to gain a better understanding of
the operation of asend() and friends. Since I was explicitly trying to
"manually" advance the generators, I wasn't using asyncio or another event
loop. This meant that the first thing I ran into with my toy code was the
RuntimeError ("cannot iterate async generator without finalizer set").

As you have argued elsewhere, in practice the finalizer is likely to be set
by the event loop. Since the authors of event loops are likely to know that
they should set the finalizer, would it perhaps be acceptable to merely
issue a warning instead of an error if the finalizer is not set? That way
there isn't an extra hoop to jump through for simple examples.

In my case, I just called
sys.set_asyncgen_finalizer(lambda g: 1)
to get around the error and continue playing :) (I realize that's a bad
thing to do but it didn't matter for the toy cases)
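
A minimal sketch of this kind of manual stepping is shown below. It assumes the async generator support as released in Python 3.6 and later, where no finalizer has to be set before iterating, so the workaround above is not needed. The `echo` generator and the `step` helper are hypothetical names for the example, and `step` only works because nothing here awaits on real I/O.

```python
async def echo():
    received = yield "ready"
    while True:
        received = yield f"got {received}"


def step(awaitable):
    """Drive one asend()/aclose() awaitable by hand, without an event loop."""
    try:
        awaitable.send(None)
    except StopIteration as exc:
        # The value yielded by the AG arrives as the StopIteration value.
        return exc.value
    raise RuntimeError("suspended on a real await; an event loop is needed")


agen = echo()
first = step(agen.asend(None))       # the first send must be None
second = step(agen.asend("hello"))
step(agen.aclose())                  # close explicitly instead of relying on GC

print(first, second)
```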

- Rajiv

Re: [Python-Dev] File system path encoding on Windows

2016-08-23 Thread Stephen J. Turnbull

Steve Dower writes:

 > * Stephen sees "no reason not to change locale.getpreferredencoding()" 
 > (default encoding for open()) at the same time with the same switches, 
 > while I'm not quite as confident. Do users generally specify an encoding 
 > these days? I know I always put utf-8 there.

I was insufficiently specific.  "No reason not to" depends on separate
switches for file system encoding and preferred encoding.  That makes
things somewhat more complicated for implementation, and significantly
so for users.

