[Python-Dev] Bytes path support

2014-08-19 Thread Serhiy Storchaka
Built-in open(), io classes, os and os.path functions and some other
functions in the stdlib support bytes paths as well as str paths. But
many functions don't. There are requests to add this support
([1], [2]) in some modules. It is easy (just call os.fsdecode() on the
argument) but I'm not sure it is worth doing. Pathlib doesn't support
bytes paths, and that looks intentional. What is the general policy on
support for bytes paths in the stdlib?


[1] http://bugs.python.org/issue19997
[2] http://bugs.python.org/issue20797
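
A minimal sketch of the os.fsdecode() approach mentioned above. The helper
name as_str_path is hypothetical and not part of any stdlib module; it only
illustrates how a str-only function could accept bytes paths as well.

```python
import os


def as_str_path(path):
    """Return `path` as str; bytes are decoded with the filesystem encoding."""
    # os.fsdecode() accepts both str and bytes and returns str unchanged.
    return os.fsdecode(path)


print(as_str_path(b"archive.zip"))
print(as_str_path("archive.zip"))
```

A function that only understands str paths could call such a helper on its
argument before doing any other work.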

___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

2014-08-19 Thread Nick Coghlan
On 18 August 2014 10:45, Guido van Rossum  wrote:
> On Sun, Aug 17, 2014 at 5:22 PM, Barry Warsaw  wrote:
>>
>> On Aug 18, 2014, at 10:08 AM, Nick Coghlan wrote:
>>
>> >There's actually another aspect to your idea, independent of the naming:
>> >exposing a view rather than just an iterator. I'm going to have to look at
>> >the implications for memoryview, but it may be a good way to go (and would
>> >align with the iterator -> view changes in dict).
>>
>> Yep!  Maybe that will inspire a better spelling. :)
>
>
> +1. It's just as much about b[i] as it is about "for c in b", so a view
> sounds right. (The view would have to be mutable for bytearrays and for
> writable memoryviews.)
>
> On the rest, it's sounding more and more as if we will just need to live
> with both bytes(1000) and bytearray(1000). A warning sounds worse than a
> deprecation to me.

I'm fine with keeping bytearray(1000), since that works the same way
in both Python 2 & 3, and doesn't seem likely to be invoked
inadvertently.

I'd still like to deprecate "bytes(1000)", since that does different
things in Python 2 & 3, while "b'\x00' * 1000" does the same thing in
both.

$ python -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
'10'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
$ python3 -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Hitting the deprecation warning in single-source code would seem to be
a strong hint that you have a bug in one version or the other rather
than being intended behaviour.
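
As a small illustration of the version-independent spelling above (the
function name zero_filled is hypothetical, chosen only for this sketch):

```python
def zero_filled(n):
    """Return n zero bytes using a spelling that means the same on 2 and 3."""
    # Unlike bytes(n), repetition of a one-byte literal does not depend on
    # whether `bytes` is an alias for str.
    return b"\x00" * n


print(repr(zero_filled(10)))
```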

> bytes.zeros(n) sounds fine to me; I value similar interfaces for bytes and
> bytearray pretty highly.

With "bytearray(1000)" sticking around indefinitely, I'm less
concerned about adding a "zeros" constructor.

> I'm lukewarm on bytes.byte(c); but bytes([c]) does bother me because a size
> one list is (or at least feels) more expensive to allocate than a size one
> bytes object. So, okay.

So, here's an interesting thing I hadn't previously registered: we
actually already have a fairly capable "bytesview" option, and have
done since Stefan implemented "memoryview.cast" in 3.3. The trick lies
in the 'c' format character for the struct module, which is parsed as
a length 1 bytes object rather than as an integer:

>>> data = bytearray(b"Hello world")
>>> bytesview = memoryview(data).cast('c')
>>> list(bytesview)
[b'H', b'e', b'l', b'l', b'o', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'Hello world'
>>> bytesview[0:5] = memoryview(b"olleH").cast('c')
>>> list(bytesview)
[b'o', b'l', b'l', b'e', b'H', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'olleH world'

For the read-only case, it covers everything (iteration, indexing,
slicing), for the writable view case, it doesn't cover changing the
shape of the target array, and it doesn't cover assigning arbitrary
buffer objects (you need to wrap them in a similar cast for memoryview
to allow the assignment).
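
The limitations described above can be sketched as follows. This is only an
illustration run against the interactive example's data; the flag names are
made up for the sketch.

```python
data = bytearray(b"Hello world")
bytesview = memoryview(data).cast('c')

# Indexing yields length-1 bytes objects rather than integers.
first = bytesview[0]

# A plain bytes object exports format 'B', not 'c', so direct slice
# assignment is rejected.
try:
    bytesview[0:5] = b"olleH"
except ValueError:
    mismatched_format_rejected = True
else:
    mismatched_format_rejected = False

# A source of a different length would change the target's shape,
# which is rejected as well.
try:
    bytesview[0:5] = memoryview(b"Hi").cast('c')
except ValueError:
    resize_rejected = True
else:
    resize_rejected = False

# Wrapping the source in the same cast makes the assignment work.
bytesview[0:5] = memoryview(b"olleH").cast('c')
print(bytes(data))
```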

It's hardly the most *intuitive* spelling though - I was one of the
reviewers for Stefan's memoryview rewrite back in 3.3, and I only made
the connection today when looking to see how a view object like the
one we were discussing elsewhere in the thread might be implemented as
a facade over arbitrary memory buffers, rather than being specific to
bytes and bytearray.

If we went down the "bytesview" path, then a single new facade would
cover not only the 3 builtins (bytes, bytearray, memoryview) but also
any *other* buffer exporting type. If we so chose (at some point in
the future, not as part of this PEP), such a type could allow
additional bytes operations (like "count", "startswith" or "index") to
be applied to arbitrary regions of memory without making a copy. We
can't add those other operations to memoryview, since they don't make
sense for an n-dimensional array.
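
A rough sketch of what such a facade might look like, assuming it is built
on memoryview.cast as shown earlier. The class name BytesView and its method
set are hypothetical; nothing like it exists in the stdlib or in the PEP.

```python
from array import array


class BytesView:
    """Hypothetical read-only facade over any buffer-exporting object."""

    def __init__(self, obj):
        # Flatten to unsigned bytes first, then view each byte as a
        # length-1 bytes object.
        self._raw = memoryview(obj).cast('B')
        self._chars = self._raw.cast('c')

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, index):
        # Integer indexes return length-1 bytes; slices return memoryviews.
        return self._chars[index]

    def __iter__(self):
        return iter(self._chars)

    def startswith(self, prefix):
        # Slicing a memoryview does not copy the underlying data.
        return self._raw[:len(prefix)] == prefix


view = BytesView(array('B', b"Hello world"))
print(view[0], view.startswith(b"Hello"))
```

The same wrapper works unchanged for bytes, bytearray, memoryview and
array.array, which is the point of building it on the buffer protocol
rather than on one concrete type.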

Regards,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

2014-08-19 Thread Guido van Rossum
On Tue, Aug 19, 2014 at 5:25 AM, Nick Coghlan  wrote:

> On 18 August 2014 10:45, Guido van Rossum  wrote:
> > On Sun, Aug 17, 2014 at 5:22 PM, Barry Warsaw  wrote:
> >>
> >> On Aug 18, 2014, at 10:08 AM, Nick Coghlan wrote:
> >>
> >> >There's actually another aspect to your idea, independent of the naming:
> >> >exposing a view rather than just an iterator. I'm going to have to look at
> >> >the implications for memoryview, but it may be a good way to go (and would
> >> >align with the iterator -> view changes in dict).
> >>
> >> Yep!  Maybe that will inspire a better spelling. :)
> >
> >
> > +1. It's just as much about b[i] as it is about "for c in b", so a view
> > sounds right. (The view would have to be mutable for bytearrays and for
> > writable memoryviews.)
> >
> > On the rest, it's sounding more and more as if we will just need to live
> > with both bytes(1000) and bytearray(1000). A warning sounds worse than a
> > deprecation to me.
>
> I'm fine with keeping bytearray(1000), since that works the same way
> in both Python 2 & 3, and doesn't seem likely to be invoked
> inadvertently.
>
> I'd still like to deprecate "bytes(1000)", since that does different
> things in Python 2 & 3, while "b'\x00' * 1000" does the same thing in
> both.
>

I think any argument based on what "bytes" does in Python 2 is pretty weak,
since Python 2's bytes is just an alias for str, so it has tons of behaviors
that differ -- why single this out?

In Python 3, I really like bytes and bytearray to be as similar as
possible, and that includes the constructor.


> $ python -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
> '10'
> '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> $ python3 -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>
> Hitting the deprecation warning in single-source code would seem to be
> a strong hint that you have a bug in one version or the other rather
> than being intended behaviour.
>
> > bytes.zeros(n) sounds fine to me; I value similar interfaces for bytes and
> > bytearray pretty highly.
>
> With "bytearray(1000)" sticking around indefinitely, I'm less
> concerned about adding a "zeros" constructor.
>

That's fine.


> > I'm lukewarm on bytes.byte(c); but bytes([c]) does bother me because a size
> > one list is (or at least feels) more expensive to allocate than a size one
> > bytes object. So, okay.
>
> So, here's an interesting thing I hadn't previously registered: we
> actually already have a fairly capable "bytesview" option, and have
> done since Stefan implemented "memoryview.cast" in 3.3. The trick lies
> in the 'c' format character for the struct module, which is parsed as
> a length 1 bytes object rather than as an integer:
>
> >>> data = bytearray(b"Hello world")
> >>> bytesview = memoryview(data).cast('c')
> >>> list(bytesview)
> [b'H', b'e', b'l', b'l', b'o', b' ', b'w', b'o', b'r', b'l', b'd']
> >>> b''.join(bytesview)
> b'Hello world'
> >>> bytesview[0:5] = memoryview(b"olleH").cast('c')
> >>> list(bytesview)
> [b'o', b'l', b'l', b'e', b'H', b' ', b'w', b'o', b'r', b'l', b'd']
> >>> b''.join(bytesview)
> b'olleH world'
>
> For the read-only case, it covers everything (iteration, indexing,
> slicing), for the writable view case, it doesn't cover changing the
> shape of the target array, and it doesn't cover assigning arbitrary
> buffer objects (you need to wrap them in a similar cast for memoryview
> to allow the assignment).
>
> It's hardly the most *intuitive* spelling though - I was one of the
> reviewers for Stefan's memoryview rewrite back in 3.3, and I only made
> the connection today when looking to see how a view object like the
> one we were discussing elsewhere in the thread might be implemented as
> a facade over arbitrary memory buffers, rather than being specific to
> bytes and bytearray.
>

Maybe the 'future' package can offer an iterbytes or bytesview implemented
this way?


> If we went down the "bytesview" path, then a single new facade would
> cover not only the 3 builtins (bytes, bytearray, memoryview) but also
> any *other* buffer exporting type. If we so chose (at some point in
> the future, not as part of this PEP), such a type could allow
> additional bytes operations (like "count", "startswith" or "index") to
> be applied to arbitrary regions of memory without making a copy.


Why call out "without making a copy" for operations that naturally don't
have to copy anything?


> We
> can't add those other operations to memoryview, since they don't make
> sense for an n-dimensional array.
>

I'm sorry for your efforts, but I'm getting more and more lukewarm about
the entire PEP.

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] Bytes path support

2014-08-19 Thread Guido van Rossum
The official policy is that we want them to go away, but reality so far has
not budged. We will continue to hold our breath though. :-)


On Tue, Aug 19, 2014 at 1:37 AM, Serhiy Storchaka 
wrote:

> Built-in open(), io classes, os and os.path functions and some other
> functions in the stdlib support bytes paths as well as str paths. But many
> functions don't. There are requests to add this support ([1], [2])
> in some modules. It is easy (just call os.fsdecode() on the argument) but
> I'm not sure it is worth doing. Pathlib doesn't support bytes paths, and
> that looks intentional. What is the general policy on support for bytes
> paths in the stdlib?
>
> [1] http://bugs.python.org/issue19997
> [2] http://bugs.python.org/issue20797
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Ben Hoyt
> The official policy is that we want them [support for bytes paths in stdlib 
> functions] to go away, but reality so far has not budged. We will continue to 
> hold our breath though. :-)

Does that mean that new APIs should explicitly not support bytes? I'm
thinking of os.scandir() (PEP 471), which I'm implementing at the
moment. I was originally going to make it support bytes so it was
compatible with listdir, but maybe that's a bad idea. Bytes paths are
essentially broken on Windows.

-Ben

> On Tue, Aug 19, 2014 at 1:37 AM, Serhiy Storchaka  wrote:
>>
>> Built-in open(), io classes, os and os.path functions and some other
>> functions in the stdlib support bytes paths as well as str paths. But many
>> functions don't. There are requests to add this support ([1], [2])
>> in some modules. It is easy (just call os.fsdecode() on the argument) but
>> I'm not sure it is worth doing. Pathlib doesn't support bytes paths, and
>> that looks intentional. What is the general policy on support for bytes
>> paths in the stdlib?
>>
>> [1] http://bugs.python.org/issue19997
>> [2] http://bugs.python.org/issue20797


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Serhiy Storchaka

19.08.14 20:02, Guido van Rossum wrote:
> The official policy is that we want them to go away, but reality so far
> has not budged. We will continue to hold our breath though. :-)


Does it mean that we should reject all proposals to add bytes
path support to existing functions (in particular issue19997 (imghdr)
and issue20797 (zipfile))?





Re: [Python-Dev] Bytes path support

2014-08-19 Thread Benjamin Peterson


On Tue, Aug 19, 2014, at 10:31, Ben Hoyt wrote:
> > The official policy is that we want them [support for bytes paths in stdlib 
> > functions] to go away, but reality so far has not budged. We will continue 
> > to hold our breath though. :-)
> 
> Does that mean that new APIs should explicitly not support bytes? I'm
> thinking of os.scandir() (PEP 471), which I'm implementing at the
> moment. I was originally going to make it support bytes so it was
> compatible with listdir, but maybe that's a bad idea. Bytes paths are
> essentially broken on Windows.

Bytes paths are "essential" on Unix, though, so I don't think we should
create new low-level APIs that don't support bytes.
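
For illustration, a sketch of how a low-level API can support both types
without mixing them. It assumes os.scandir() as it later shipped in the
stdlib (context-manager use needs Python 3.6 or newer); the helper name
entry_names is hypothetical.

```python
import os
import tempfile


def entry_names(directory):
    """Return sorted entry names in the same type (str or bytes) as `directory`."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "spam.txt"), "w"):
        pass
    str_names = entry_names(tmp)                 # str in, str out
    bytes_names = entry_names(os.fsencode(tmp))  # bytes in, bytes out

print(str_names, bytes_names)
```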


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Ben Hoyt
>> > The official policy is that we want them [support for bytes paths in 
>> > stdlib functions] to go away, but reality so far has not budged. We will 
>> > continue to hold our breath though. :-)
>>
>> Does that mean that new APIs should explicitly not support bytes? I'm
>> thinking of os.scandir() (PEP 471), which I'm implementing at the
>> moment. I was originally going to make it support bytes so it was
>> compatible with listdir, but maybe that's a bad idea. Bytes paths are
>> essentially broken on Windows.
>
> Bytes paths are "essential" on Unix, though, so I don't think we should
> create new low-level APIs that don't support bytes.

Fair enough. I don't quite understand, though -- why is the "official
policy" to kill something that's "essential" on *nix?

-Ben


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Tres Seaver
On 08/19/2014 01:43 PM, Ben Hoyt wrote:
>>>> The official policy is that we want them [support for bytes
>>>> paths in stdlib functions] to go away, but reality so far has
>>>> not budged. We will continue to hold our breath though. :-)
>>> 
>>> Does that mean that new APIs should explicitly not support bytes?
>>> I'm thinking of os.scandir() (PEP 471), which I'm implementing at
>>> the moment. I was originally going to make it support bytes so it
>>> was compatible with listdir, but maybe that's a bad idea. Bytes
>>> paths are essentially broken on Windows.
>> 
>> Bytes paths are "essential" on Unix, though, so I don't think we
>> should create new low-level APIs that don't support bytes.
> 
> Fair enough. I don't quite understand, though -- why is the "official 
> policy" to kill something that's "essential" on *nix?

ISTM that the policy is based on a fantasy that "it looks like text to me
in my use cases, so therefore it must be text for everyone."


Tres.
-- 
===
Tres Seaver  +1 540-429-0999  [email protected]
Palladion Software   "Excellence by Design"http://palladion.com


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Benjamin Peterson


On Tue, Aug 19, 2014, at 10:43, Ben Hoyt wrote:
> >> > The official policy is that we want them [support for bytes paths in 
> >> > stdlib functions] to go away, but reality so far has not budged. We will 
> >> > continue to hold our breath though. :-)
> >>
> >> Does that mean that new APIs should explicitly not support bytes? I'm
> >> thinking of os.scandir() (PEP 471), which I'm implementing at the
> >> moment. I was originally going to make it support bytes so it was
> >> compatible with listdir, but maybe that's a bad idea. Bytes paths are
> >> essentially broken on Windows.
> >
> > Bytes paths are "essential" on Unix, though, so I don't think we should
> > create new low-level APIs that don't support bytes.
> 
> Fair enough. I don't quite understand, though -- why is the "official
> policy" to kill something that's "essential" on *nix?

Well, notice the official policy is desperately *wanting* them to go
away with the implication that we grudgingly bow to reality. :)


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Antoine Pitrou

On 19/08/2014 13:43, Ben Hoyt wrote:
>>>> The official policy is that we want them [support for bytes paths in stdlib
>>>> functions] to go away, but reality so far has not budged. We will continue to
>>>> hold our breath though. :-)
>>>
>>> Does that mean that new APIs should explicitly not support bytes? I'm
>>> thinking of os.scandir() (PEP 471), which I'm implementing at the
>>> moment. I was originally going to make it support bytes so it was
>>> compatible with listdir, but maybe that's a bad idea. Bytes paths are
>>> essentially broken on Windows.
>>
>> Bytes paths are "essential" on Unix, though, so I don't think we should
>> create new low-level APIs that don't support bytes.
>
> Fair enough. I don't quite understand, though -- why is the "official
> policy" to kill something that's "essential" on *nix?


PEP 383 should actually work on Unix quite well, AFAIR.

Regards

Antoine.




Re: [Python-Dev] Bytes path support

2014-08-19 Thread Marko Rauhamaa
Tres Seaver :

> On 08/19/2014 01:43 PM, Ben Hoyt wrote:
>> Fair enough. I don't quite understand, though -- why is the "official
>> policy" to kill something that's "essential" on *nix?
>
> ISTM that the policy is based on a fantasy that "it looks like text to
> me in my use cases, so therefore it must be text for everyone."

What I like about Python is that it allows me to write native linux code
without having to make portability compromises that plague, say, Java. I
have select.epoll(). I have os.fork(). I have socket.TCP_CORK. The
"textualization" of Python3 seems part of a conscious effort to make
Python more Java-esque.


Marko


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Ben Hoyt writes:

 > Fair enough. I don't quite understand, though -- why is the "official
 > policy" to kill something that's "essential" on *nix?

They're not essential on *nix.  Unix paths at the OS level are "just
bytes" (even on Mac, although the most common Mac filesystem does
enforce UTF-8 Unicode NFD).  This use case is now perfectly well
served by codecs.

However, there are a lot of applications that involve reading a file
name from a directory, and passing it verbatim to another OS
function.  This case can be handled now using the surrogateescape
error handler, but when these APIs were introduced we didn't even have
a reliable way to roundtrip filenames because a Unix filename doesn't
need to be a string of characters from *any* character set.

And there's the undeniable convenience of treating file names as
opaque objects in those applications.
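
A short sketch of the surrogateescape round trip described above. The file
name is made up, and UTF-8 is assumed here only as an example encoding.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising an error.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

print(ascii(text_name), round_tripped == raw_name)
```

os.fsdecode() and os.fsencode() apply the same idea on Unix, using the
filesystem encoding instead of a hard-coded one.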

Regards,



Re: [Python-Dev] Bytes path support

2014-08-19 Thread Greg Ewing

Ben Hoyt wrote:

> Does that mean that new APIs should explicitly not support bytes?
> ... Bytes paths are essentially broken on Windows.

But on Unix, paths are essentially bytes. What's the
official policy for dealing with that?

--
Greg


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Greg Ewing

Stephen J. Turnbull wrote:

> This case can be handled now using the surrogateescape
> error handler,


So maybe the way to make bytes paths go away is to always
use surrogateescape for paths on unix?

--
Greg


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Guido van Rossum
I'm sorry my moment of levity was taken so seriously.

With my serious hat on, I would like to claim that *conceptually* filenames
are most definitely text. Due to various historical accidents the UNIX
system calls often encoded text as arguments, and we sometimes need to
control that encoding. Hence the occasional need for bytes arguments. But
most of the time you don't have to think about that, and forcing users to
worry about it is mostly as counter-productive as forcing them to think
about the encoding of every text file.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Greg Ewing writes:
 > Stephen J. Turnbull wrote:
 > 
 > > This case can be handled now using the surrogateescape
 > > error handler,
 > 
 > So maybe the way to make bytes paths go away is to always
 > use surrogateescape for paths on unix?

Backward compatibility rules that out, I think.  I certainly would
recommend that for new code, but even for new code there are many
users who vehemently object to using Unicode as an intermediate
representation of things they think of as binary blobs.  Not worth the
hassle to even seriously propose removing those APIs IMO.


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Guido van Rossum
On Tuesday, August 19, 2014, Stephen J. Turnbull  wrote:

> Greg Ewing writes:
>  > Stephen J. Turnbull wrote:
>  >
>  > > This case can be handled now using the surrogateescape
>  > > error handler,
>  >
>  > So maybe the way to make bytes paths go away is to always
>  > use surrogateescape for paths on unix?
>
> Backward compatibility rules that out, I think.  I certainly would
> recommend that for new code, but even for new code there are many
> users who vehemently object to using Unicode as an intermediate
> representation of things they think of as binary blobs.  Not worth the
> hassle to even seriously propose removing those APIs IMO.


But maybe we don't have to add new ones?

--Guido


-- 
--Guido van Rossum (on iPad)


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Marko Rauhamaa
Guido van Rossum :

> With my serious hat on, I would like to claim that *conceptually*
> filenames are most definitely text. Due to various historical
> accidents the UNIX system calls often encoded text as arguments, and
> we sometimes need to control that encoding.

Due to historical accidents, text (in the Python sense) is not a
first-class data type in Unix. Text, machine language, XML, Python etc
are interpretations of bytes. Bytes are the first-class data type
recognized by the kernel. That reality cannot be wished away.

> Hence the occasional need for bytes arguments. But most of the time
> you don't have to think about that, and forcing users to worry about
> it is mostly as counter-productive as forcing to think about the
> encoding of every text file.

The users of Python programs can often be given higher-level facades.
Unix programmers, though, shouldn't be shielded from bytes.


Marko


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Guido van Rossum writes:
 > On Tuesday, August 19, 2014, Stephen J. Turnbull  wrote:
 > > Greg Ewing writes:

 > >  > So maybe the way to make bytes paths go away is to always
 > >  > use surrogateescape for paths on unix?
 > >
 > > Backward compatibility rules that out, I think.  I certainly would
 > > recommend that for new code, but even for new code there are many
 > > users who vehemently object to using Unicode as an intermediate
 > > representation of things they think of as binary blobs.  Not worth the
 > > hassle to even seriously propose removing those APIs IMO.
 > 
 > But maybe we don't have to add new ones?

IMO, we should avoid it.

There may be some use cases.  Serhiy mentions two bug reports.

http://bugs.python.org/issue19997 imghdr.what doesn't accept bytes paths
http://bugs.python.org/issue20797 zipfile.extractall should accept bytes path as parameter

I'm very unsympathetic to these.  In both cases the bytes are coming
from outside of the module in question.  Why are they in bytes?  That
question should scare you, because from the point of view of end users
there are no good answers: they all mean that the end user is going to
end up with uninterpretable bytes in their directories, for the
convenience of the programmer.

In the case of issue20797, I'd be a *little* sympathetic if the RFE
were for the *members* argument.  zipfiles evidently have no way to
specify the encodings of the name(s) of their members (and the zipfile
module doesn't have APIs for it!), so the programmer is kind of stuck,
especially if the requirement is that the extraction require no user
intervention.  But again, this is rarely what the user wants.

I would be sympathetic to an internal, bytes-based, "kids these stunts
are performed by trained professionals do NOT try this at home" API,
with a sane user-oriented str-based API for ordinary use for this
module.  I suppose it might be useful for such a multi-type API to be
polymorphic, but it would have to be a "if there are bytes anywhere,
everything must be bytes and return values will be bytes" and
similarly for str kind of polymorphism.  No mixing bytes and strings,
period.
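
The "all bytes or all str" rule could be sketched like this. The function
name target_path is hypothetical and only illustrates the type check; it is
not a proposed zipfile API.

```python
import os


def target_path(directory, member):
    """Hypothetical helper: join two path parts that must share one type."""
    if isinstance(directory, bytes) != isinstance(member, bytes):
        raise TypeError("cannot mix bytes and str path components")
    # os.path.join() returns the same type as its arguments.
    return os.path.join(directory, member)


print(target_path("out", "a.txt"))
print(target_path(b"out", b"a.txt"))
```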





Re: [Python-Dev] Bytes path support

2014-08-19 Thread Stephen J. Turnbull
Marko Rauhamaa writes:

 > Unix programmers, though, shouldn't be shielded from bytes.

Nobody's trying to do that.  But Python users should be shielded from
Unix programmers.


Re: [Python-Dev] Bytes path support

2014-08-19 Thread Ben Finney
"Stephen J. Turnbull"  writes:

> Marko Rauhamaa writes:
>  > Unix programmers, though, shouldn't be shielded from bytes.
>
> Nobody's trying to do that.  But Python users should be shielded from
> Unix programmers.

+1 QotW

-- 
 \“Intellectual property is to the 21st century what the slave |
  `\  trade was to the 16th.” —David Mertz |
_o__)  |
Ben Finney
