On Sep 7, 2014, at 7:14 PM, exar...@twistedmatrix.com wrote:

> On 01:26 am, wolfgang....@rohdewald.de wrote:
>> The porting guide says
>> 
>> No byte paths in sys.path.
> 
> What porting guide is that?
>> 
>> doc for FilePath says
>>   On both Python 2 and Python 3, paths can only be bytes.
>> 
>> 
>> I stumbled upon this while trying to find out how much work it might be
>> to make bin/trial run with python3
>> 
>> admin/run-python3-tests already passes for all twisted.spread related
>> tests but I still need to clean up a lot.
>> 
>> after adding an assert to FilePath.__init__, python3 bin/trial ... gives
>> 
>> File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 601, in run
>>   config.parseOptions()
>> File "/home/wr/ssdsrc/Twisted/twisted/python/usage.py", line 277, in 
>> parseOptions
>>   self.postOptions()
>> File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 472, in 
>> postOptions
>>   _BasicOptions.postOptions(self)
>> File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 382, in 
>> postOptions
>>   self['reporter'] = self._loadReporterByName(self['reporter'])
>> File "/home/wr/ssdsrc/Twisted/twisted/scripts/trial.py", line 369, in 
>> _loadReporterByName
>>   for p in plugin.getPlugins(itrial.IReporter):
>> File "/home/wr/ssdsrc/Twisted/twisted/plugin.py", line 209, in getPlugins
>>   allDropins = getCache(package)
>> File "/home/wr/ssdsrc/Twisted/twisted/plugin.py", line 134, in getCache
>>   mod = getModule(module.__name__)
>> File "/home/wr/ssdsrc/Twisted/twisted/python/modules.py", line 781, in 
>> getModule
>>   return theSystemPath[moduleName]
>> File "/home/wr/ssdsrc/Twisted/twisted/python/modules.py", line 702, in 
>> __getitem__
>>   self._findEntryPathString(moduleObject)),
>> File "/home/wr/ssdsrc/Twisted/twisted/python/modules.py", line 627, in 
>> _findEntryPathString
>>   if _isPackagePath(FilePath(topPackageObj.__file__)):
>> File "/home/wr/ssdsrc/Twisted/twisted/python/filepath.py", line 664, in 
>> __init__
>>   assert isinstance(path, bytes), 'path must be bytes: %r' % (path,)
>> AssertionError: path must be bytes: 
>> '/home/wr/ssdsrc/Twisted/twisted/__init__.py'
> 
> If paths are being represented using unicode somewhere and you want to use 
> them with FilePath then you have to encode them (or you have to add unicode 
> path support to FilePath and let FilePath encode them).
> 
> Unfortunately it's not entirely obvious how to make FilePath support unicode 
> paths since not all platforms Twisted supports represent filesystem paths 
> using unicode.
> 
> The choice python-dev made to bridge this gap was the creation of the 
> "surrogateescape" error handler for the UTF-8 codec.  This lets you pretend 
> that any time you need to convert between bytes and unicode the correct codec 
> is UTF-8 (with this special error handler).
> 
> It's not clear this was a good choice (since the result is unicode strings 
> that may contain garbage which will confuse other software) but it's also not 
> clear it's possible for Twisted to try to make any other choice (at some 
> point Twisted has to interoperate with the path-related APIs in Python itself 
> - `sys.path`, for example).
> 
> Not sure if that helps you at all.  Maybe it outlines the problem a little 
> more clearly, at least.
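
To make the "surrogateescape" point in the quoted message concrete, here is a small sketch of the round trip on Python 3. The filename is made up for illustration; it contains one byte (0xff) that is not valid UTF-8.

```python
# A made-up filename containing a byte that is not valid UTF-8.
raw = b"caf\xff.txt"

# With surrogateescape, decoding never fails: the undecodable byte is
# mapped to a lone surrogate code point (U+DCFF) in the resulting str.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", "surrogateescape")

# The "garbage" concern: the str is not valid for a strict encoder, so
# software that does not know about the handler will reject it.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The bytes survive the trip through text unchanged, but the intermediate string carries a lone surrogate that a strict UTF-8 encoder refuses, which is the interoperability worry raised above.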

The problem with making FilePath support unicode is that we want to provide an 
interface that applications can rely upon, specified in terms of specific types 
(bytes or text) so that when you get an IFilePath you know what you can do with 
it.

As it is currently implemented, FilePath exposes its internal representation 
fairly directly, most notably as the ‘.path’ attribute, but also in the 
return-type of methods like "basename" and "segmentsFrom".

FilePath doesn't exactly "support" unicode, in that it's specifically 
documented not to, but it's sort of hard to tell, since you can instantiate one 
with a unicode string in both python 2 and python 3, and get (apparently) 
correct results out of it for some methods.  However, methods that need a 
string constant as part of their implementation, like siblingExtensionSearch 
and globChildren, will break unceremoniously when presented with unicode.
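
As a rough illustration of that failure mode on Python 3 (this is not the actual Twisted code; `child_pattern` is a hypothetical helper standing in for any method that carries a bytes constant in its implementation):

```python
import os.path

def child_pattern(directory, pattern=b"*.txt"):
    # Hypothetical helper: the default pattern is a bytes literal, the
    # way a string constant might be baked into a method body.
    return os.path.join(directory, pattern)

# Bytes joined with the bytes constant works.
bytes_result = child_pattern(b"docs")

# Text joined with the bytes constant does not: os.path.join refuses to
# mix str and bytes components and raises TypeError.
try:
    child_pattern("docs")
    text_failed = False
except TypeError:
    text_failed = True
```

Methods that only pass the path straight through to the os module never hit this, which would explain why some of them appear to work with unicode input.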

Another decision that python-dev made to bridge the gap was to randomly allow 
different string types to be passed to platform APIs, like this:

>>> import os
>>> os.listdir(u".")
['a', 'b', 'c']
>>> os.listdir(b".")
[b'a', b'b', b'c']
>>> os.path.basename(b".")
b'.'
>>> os.path.basename(".")
'.'

This implies a parallel structure might be possible for FilePath: if you pass 
its constructor bytes, you get a BytesFilePath; if you pass it text, you get a 
TextFilePath.  You can't mix the two, and once you've chosen a type for a path 
you can't switch it to the other one in place.
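
A minimal sketch of what that dispatch could look like, assuming Python 3. The class names come from the paragraph above; `make_file_path`, `_BasePath`, and the two methods shown are hypothetical and only stand in for the real FilePath API.

```python
import os.path

class _BasePath:
    """Shared behaviour; every method returns the type given to __init__."""

    def __init__(self, path):
        self.path = path

    def basename(self):
        # os.path.basename returns bytes for bytes and str for str.
        return os.path.basename(self.path)

    def child(self, name):
        # Refuse to mix the two string types within one path object.
        if not isinstance(name, type(self.path)):
            raise TypeError("segment type must match path type: %r" % (name,))
        return type(self)(os.path.join(self.path, name))

class BytesFilePath(_BasePath):
    pass

class TextFilePath(_BasePath):
    pass

def make_file_path(path):
    # Hypothetical factory: pick the class from the argument's type.
    if isinstance(path, bytes):
        return BytesFilePath(path)
    if isinstance(path, str):
        return TextFilePath(path)
    raise TypeError("path must be bytes or str: %r" % (path,))
```

Because the os.path functions already follow the type of their argument, most of the shared methods need no type-specific code; only the places that embed string constants would need a bytes and a text variant.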

IFilePath could then document that all of its existing methods have the return 
type of "whatever got passed to __init__" (which is what the current 
implementation does about 2/3 of the time anyway on py3, and about 9/10 of the 
time on py2; we would just be making it work intentionally, all the way).

But, it would then be possible to give BytesFilePath an "asText()" method and, 
vice versa, TextFilePath an "asBytes()" method - since it's the filesystem, 
metadata about encodings exists outside your program and you would not need to 
guess at encodings, you'd simply indicate what return value you'd like from 
methods like .basename() et al.
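
One way the conversions might be sketched on Python 3 is with os.fsencode() and os.fsdecode(), which use the filesystem encoding the interpreter already knows about. Everything else here is an assumption for illustration: the two classes are cut down to just the conversion methods and basename().

```python
import os

class BytesFilePath:
    def __init__(self, path):
        self.path = path

    def basename(self):
        return os.path.basename(self.path)

    def asBytes(self):
        return self

    def asText(self):
        # os.fsdecode applies the filesystem encoding, so the caller
        # does not have to name or guess a codec.
        return TextFilePath(os.fsdecode(self.path))

class TextFilePath:
    def __init__(self, path):
        self.path = path

    def basename(self):
        return os.path.basename(self.path)

    def asText(self):
        return self

    def asBytes(self):
        # os.fsencode is the inverse of os.fsdecode.
        return BytesFilePath(os.fsencode(self.path))

# Usage: choose the return type once, then call methods as usual.
name = BytesFilePath(b"dir/name.txt").asText().basename()
```

With this shape, code that wants text never touches a codec name; it converts the path object once and every later method call returns text.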

The more I think about this, the more I like it - it's a bit of annoying and 
subtle implementation work, but I think it would supply the behavior that most 
people want, remain compatible with most of the existing unspecified behavior, 
and it would address clean text/bytes separation without having a giant 
deprecation cycle and inventing a new interface.  It's also the sort of 
implementation work which, after some discussion and consideration, we could be 
reasonably sure is *correct* rather than guessing at things.

Thoughts?

-glyph


_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
