> On Sep 2, 2015, at 3:38 AM, Adi Roiban <a...@roiban.ro> wrote:
> 
> Hi,
> 
> While reviewing the latest patch related to porting Twisted to py3 I
> saw that many values which were supposed to hold just text were ported
> as bytes.

This is fairly ambiguous, since it depends quite closely on what exact values 
they were.

> The argument for this conversion was that Twisted is a low level
> framework and that other high level frameworks like Treq or Klein
> should implement the required code so that end user can just use text.

When high-level 

> For example the HTTP response messages are not bytes, even if the RFC
> specified that they should only contain text with a single encoding.

By "response messages" do you mean

- response headers (which are ASCII-punned bytes, or text with a restricted
  character set, depending on your philosophical disposition)
- response bodies (which are unambiguously bytes, unless you interpret them
  according to their content-type, in which case they are arbitrary objects,
  one type of which may be text)
- response status codes (which are sort of like headers, except not, except
  maybe they're integers)
- response status text (which are arguably text, but do not feature an
  encoding, and are therefore in the same grey area as headers)

?

> I have little experience with twisted.web, so maybe there are many
> users of twisted.web which use binary data for response messages or
> maybe there is a good use case for putting random bytes in the HTTP
> response message.

The main use-case is the Python zen, "refuse the temptation to guess".  Treq 
and Klein can provide good default ways to interpret things, but some 
applications will need to get underneath those defaults and treat things 
differently.

> If for twisted.web there are Treq or Klein to implement the user
> friendly interfaces, I don't know what can be used for twisted.conch

Clearly we need to write some new code.

> In the ticket for porting twisted.conch.ssh.key to py3 [1] the name of
> the ssh key algorithms like 'ssh-rsa' or 'ssh-dsa', encryption
> algorithm names like AES-128-CBC and ssh key components like p, q, y,
> x, n, e are now all bytes.
> 
> Do you think that this is ok?

Yes, and here's why: those values are all enumerated constants.  They come off 
the wire as bytes, in no particular encoding (these happen to be ASCII, but is 
there a guarantee that all future algorithm names will also be?), and then they 
have to be treated specially.  A good, high-level API for this would use 
twisted.python.constants, and not bother application code with bytes or text.  
Given that what is implemented is all pretty low-level, bytes make sense.
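
As a rough illustration of the constants idea, the sketch below maps algorithm
names from the wire onto symbolic constants, so that application code never
touches the raw bytes.  It uses the standard library's enum module as a
stand-in for twisted.python.constants, purely to stay self-contained;
KeyAlgorithm and parse_key_algorithm are hypothetical names, not existing
Twisted APIs.

```python
import enum


class KeyAlgorithm(enum.Enum):
    """Hypothetical constants for SSH key algorithm names."""

    # The values are the exact bytes that appear on the wire.
    RSA = b"ssh-rsa"
    ED25519 = b"ssh-ed25519"


def parse_key_algorithm(wire_name):
    """Map a wire-level algorithm name (bytes) to a constant.

    Unknown names raise ValueError instead of being guessed at.
    """
    if not isinstance(wire_name, bytes):
        raise TypeError("wire_name must be bytes, got %r" % (type(wire_name),))
    return KeyAlgorithm(wire_name)


algorithm = parse_key_algorithm(b"ssh-rsa")
```

With something like this, the bytes-versus-text question only matters inside
the parser; callers compare against KeyAlgorithm.RSA and never see either type.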

That said, if there were a good case for ASCII being the declared encoding and
some authoritative sense that that's what we should use, then we should use
it.  Except that the 2.x types already use bytes, and so we'd have to go with
"native str" (which is a very problematic type, and should be avoided for
everything except Python identifiers and docstrings, or things that need to be
processed into them).

> Why allow or encourage people to use random bytes for fields which
> should contain human readable text?

If you encounter an entity that can read the string "ssh-rsa" and truly 
comprehend it, chances are good you are not dealing with a human.

> For HTTP response line and response headers I think that all values should be 
> text and encoded in ISO-8859-1.

> RFC 4819 [2] only talks about using US-ASCII for all names used in the
> SSH public key subsystem.
> Why use bytes to represent these names?

You might be correct according to the specification (although it remains to be
seen if you're right as far as implementations are concerned); however, why
would it be useful to decode these values into text?  Should we be processing
them as text?  In what context?

> RFC 4716 specifies that header tags must be US-ASCII while header
> value UTF-8 ... while all IANA names are US-ASCII.... and names in the
> private namespace (name@domain) should also be US-ASCII.

> As a reviewer I don't know what is the design/architecture chosen by
> Twisted and how to review such changes.

It is not necessarily possible to rationalize every decision that has been made 
thus far as being part of one grand plan.  For one thing, many of them have 
been taken by different people.  For another, we learn things as we go along, 
and so some of the decisions made thus far are now recognized as mistakes.  So 
at this point I think it is best that you just state your preferred design and 
we discuss the pros and cons of that.

To the extent that there has been a conscious design strategy, it's something 
like this: every API needs one layer at which it needs to treat most of its 
data as bytes.  (Sadly) few Twisted APIs have nice, discrete higher layers with 
objects that represent meaningful user actions rather than protocol trivia.  So 
the existing strategy has been around making the lower levels consistently 
manipulate bytes everywhere, in the hope that we will promote these objects to 
more high-level types in a different layer later (hence the "well, in klein and 
treq..." answers).
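
A minimal sketch of that layering, under stated assumptions: the lower
function only ever handles bytes, and a separate higher-level wrapper applies
a default interpretation that callers can override.  Both function names are
hypothetical, and the ISO-8859-1 default is simply the encoding suggested
earlier in this thread, not a settled decision.

```python
def parse_header_line(line):
    """Low level: split a raw header line into (name, value), both bytes."""
    name, _, value = line.partition(b":")
    return name.strip(), value.strip()


def header_as_text(line, encoding="iso-8859-1"):
    """Higher level: apply a default decoding that callers may override."""
    name, value = parse_header_line(line)
    return name.decode("ascii"), value.decode(encoding)


raw = parse_header_line(b"Content-Type: text/html")
text = header_as_text(b"Content-Type: text/html")
```

An application that disagrees with the default can still call
parse_header_line directly and interpret the bytes however it needs to.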

> As a developer I prefer to have as much text as possible so that I can
> do text manipulation operation on these values and directly include
> them in logs or error messages.

OK, so the main utility of treating them as text is being able to concatenate 
them into diagnostic messages?  Can this not be done with bytes equally well?  
This is the main thing to focus on, I think: concrete useful things you could 
do with text in these places where bytes are sub-optimal.
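
To make that question concrete, here is a small sketch of what happens on
Python 3 when a bytes value is dropped into a text message, along with one
possible way to render it at the point of display.  describe_bytes is a
hypothetical helper, and decoding as ASCII with backslashreplace is an
assumption made for illustration, not an established Twisted convention.

```python
def describe_bytes(value):
    """Render bytes for a diagnostic message without guessing an encoding.

    ASCII bytes come through unchanged; anything else is shown as a
    backslash escape rather than raising or being silently mis-decoded.
    """
    return value.decode("ascii", errors="backslashreplace")


key_type = b"ssh-rsa"

# On Python 3, formatting bytes into text includes the b'' repr wrapper.
naive = "unsupported key type: %s" % (key_type,)

# Decoding explicitly at the point of display gives a clean message.
clean = "unsupported key type: %s" % (describe_bytes(key_type),)
```

So the message can be built either way; the difference is whether the b''
wrapper shows up in the output, and where the decoding decision is made.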

> I assume that all the people involved in writing the RFC had a good
> reason to require those fields to be text rather than any bytes.
> 
> Thanks for your feedback!

Thank you for prompting this conversation, Adi.  We do need a
better-communicated strategy around how we handle text and encodings when
protocols stipulate that the things they're dealing with are text.

-glyph

_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
