[issue34138] RFC 6855 issue

2018-07-17 Thread Sam Varshavchik


New submission from Sam Varshavchik :

Greetings. I am in the process of implementing RFC 6855 in Courier-IMAP. A 
Google search for IMAP clients that implement RFC 6855 led me to 
https://bugs.python.org/issue21800 and looking over the code that was added to 
imaplib, to support RFC 6855, a few things stood out. I checked, and the 
changes introduced in 21800 still appear to be unchanged in 
https://github.com/python/cpython/blob/master/Lib/imaplib.py 

Issue 21800 modified append(), which implements the IMAP APPEND command, 
thusly:

-        self.literal = MapCRLF.sub(CRLF, message)
+        literal = MapCRLF.sub(CRLF, message)
+        if self.utf8_enabled:
+            literal = b'UTF8 (' + literal + b')'
+        self.literal = literal

"literal" here appears to be the contents of the message with CRLF line ending. 
But section 4 of https://tools.ietf.org/html/rfc6855.html states:

  The ABNF for the "APPEND" data extension and "CATENATE" extension 
  follows:

   utf8-literal   = "UTF8" SP "(" literal8 ")"

   literal8       = <Defined in RFC 4466>

   append-data    =/ utf8-literal

   cat-part       =/ utf8-literal

As indicated above, "literal8" comes from RFC 4466, which also defines 
"append-data". RFC 4466 additionally states:

   In addition, the non-terminal "literal8" defined in [BINARY] got
   extended to allow for non-synchronizing literals if both [BINARY] and
   [LITERAL+] extensions are supported by the server.

I'll come back to this revealing paragraph in a moment, but, as stated, 
"literal8" actually comes from [BINARY] which is RFC 3516, which specifies the 
following:

   append     =/  "APPEND" SP mailbox [SP flag-list]
                  [SP date-time] SP literal8

   fetch-att  =/  "BINARY" [".PEEK"] section-binary [partial]
                  / "BINARY.SIZE" section-binary

   literal8   =   "~{" number "}" CRLF *OCTET
                  ; <number> represents the number of OCTETs
                  ; in the response string.

An exhaustive search of imaplib.py seems to indicate that this pesky tilde is 
in hiding. And the wrong thing seems to be quoted as the actual literal. 
Anyway, back to the RFCs: combine all of the above together, spin it in a 
blender, and you get the following result:

Supposing that the message being appended consists of a single header line 
"Subject: test", and a blank line, the command that should actually go out on 
the wire (based on the above, and other parts of these and related RFCs) is:

APPEND INBOX NIL NIL UTF8 (~{17}<CR><LF>Subject: test<CR><LF><CR><LF>)<CR><LF>

I haven't tested imaplib against Courier-IMAP in this respect, but it doesn't 
seem like this is what imaplib is going to send.
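
A minimal sketch of the difference, for illustration only. It assumes a 
stand-in regular expression for the line-ending normalization (not imaplib's 
actual internals), and the helper names are hypothetical. It builds the bytes 
that the patched append() stores in self.literal, and the utf8-literal that 
the ABNF quoted above appears to call for:

```python
import re

CRLF = b"\r\n"
# Assumed stand-in for imaplib's line-ending normalizer.
ANY_EOL = re.compile(rb"\r\n|\r|\n")


def normalize(message: bytes) -> bytes:
    """Convert every line ending in message to CRLF."""
    return ANY_EOL.sub(CRLF, message)


def patched_payload(message: bytes) -> bytes:
    """Bytes the issue 21800 change stores in self.literal."""
    return b"UTF8 (" + normalize(message) + b")"


def rfc6855_payload(message: bytes) -> bytes:
    """utf8-literal per the ABNF: "UTF8" SP "(" literal8 ")"."""
    body = normalize(message)
    # literal8 = "~{" number "}" CRLF *OCTET
    return b"UTF8 (~{%d}" % len(body) + CRLF + body + b")"


msg = b"Subject: test\n\n"
print(len(normalize(msg)))      # octet count announced in the literal8
print(patched_payload(msg))
print(rfc6855_payload(msg))
```

The octet count covers the 13 octets of "Subject: test" plus two CRLF pairs.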

But wait, there's more!

"literal8" is a synchronizing literal, like "literal" from RFC 3501, which 
specifies:

  ...In the case of
   literals transmitted from client to server, the client MUST wait
   to receive a command continuation request (described later in
   this document) before sending the octet data (and the remainder
   of the command).

The LITERAL+ IMAP extension, that was mentioned in the excerpt from RFC 4466 
that I cited above, introduced non-synchronizing literals:

   The protocol receiver of an IMAP4 server must check the end of every
   received line for an open brace ('{') followed by an octet count, a
   plus ('+'), and a close brace ('}') immediately preceeding the CRLF.
   If it finds this sequence, it is the octet count of a non-
   synchronizing literal and the server MUST treat the specified number
   of following octets and the following line as part of the same
   command.

Otherwise, after the closing brace and the <CR><LF>, the IMAP client must wait 
for the continuation response from the server.
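
The end-of-line check described in the LITERAL+ excerpt can be sketched 
roughly as follows. This is an illustration under stated assumptions, not 
code from any existing server: the LiteralSpec type and literal_at_end() 
function are hypothetical names, and the line is assumed to arrive with its 
trailing CRLF already stripped.

```python
import re
from typing import NamedTuple, Optional


class LiteralSpec(NamedTuple):
    binary: bool         # True when prefixed with "~" (literal8)
    size: int            # octet count announced by the client
    synchronizing: bool  # False when the LITERAL+ "+" is present


# Optional "~", "{", digits, optional "+", "}" at the very end of the line.
_LITERAL_AT_EOL = re.compile(rb"(~?)\{(\d+)(\+?)\}$")


def literal_at_end(line: bytes) -> Optional[LiteralSpec]:
    """Inspect one received line whose trailing CRLF was already removed."""
    m = _LITERAL_AT_EOL.search(line)
    if m is None:
        return None
    return LiteralSpec(
        binary=bool(m.group(1)),
        size=int(m.group(2)),
        synchronizing=not m.group(3),
    )


print(literal_at_end(b"a1 APPEND INBOX UTF8 (~{17+}"))
print(literal_at_end(b"a1 APPEND INBOX {17}"))
```

When synchronizing is True, the server sends a continuation response before 
the client may transmit the announced octets.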

So, to summarize:

1) Per RFC 4466 combined with RFC 6855, an IMAP UTF-8 client talking to an 
IMAP UTF-8 server can send the following, on the wire, if the server supports 
LITERAL+:

APPEND INBOX NIL NIL UTF8 (~{17+}<CR><LF>Subject: test<CR><LF><CR><LF>)<CR><LF>

2) But, if the server did not advertise LITERAL+, the IMAP client is required 
to send only:

APPEND INBOX NIL NIL UTF8 (~{17}<CR><LF>

Then wait for the continuation response from the server, then send the rest of 
the command.
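
The two cases in this summary can be sketched from the client side as 
follows. This is a hedged illustration, not imaplib code: 
utf8_append_steps() is a hypothetical helper, the command tag is an added 
assumption, and the optional flag list and date-time are simply omitted.

```python
from typing import List

CRLF = b"\r\n"


def utf8_append_steps(tag: bytes, mailbox: bytes, message: bytes,
                      literal_plus: bool) -> List[bytes]:
    """Return the chunks a client would write for a UTF8 APPEND.

    message must already use CRLF line endings. When more than one chunk
    is returned, the caller has to wait for a "+" continuation response
    from the server before writing the next chunk.
    """
    marker = b"+" if literal_plus else b""
    head = (tag + b" APPEND " + mailbox
            + b" UTF8 (~{%d" % len(message) + marker + b"}" + CRLF)
    tail = message + b")" + CRLF
    if literal_plus:
        return [head + tail]   # non-synchronizing: send everything at once
    return [head, tail]        # synchronizing: pause between the chunks


message = b"Subject: test\r\n\r\n"
for chunk in utf8_append_steps(b"a1", b"INBOX", message, literal_plus=False):
    print(chunk)
```

The caller decides literal_plus from whether the server advertised LITERAL+ 
in its capabilities.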

IMAP specifications have been painful to read for the 20+ years I've been 
reading them. Historically there have been a lot of interoperability problems 
between IMAP clients and servers. I lay the blame squarely on the horrible 
specs, but that's off-topic. Suffice to say, nothing of that sort has been 
observed for POP3 and SMTP, and I think there's a very good reason for that.

--
components: email
messages: 321819
nosy: Sam Varshavchik, barry, r.david.murray
priority: normal
seve

[issue34138] imaplib RFC 6855 issue

2018-07-18 Thread Sam Varshavchik


Sam Varshavchik  added the comment:

I don't have sufficient python or imaplib exposure to be able to implement full 
UTF8 APPEND functionality. I was merely investigating and researching what IMAP 
UTF8 support there was, in all existing client and server code I knew of.

What I can propose is to reverse this part of the original change:

@@ -360,7 +380,10 @@
             date_time = Time2Internaldate(date_time)
         else:
             date_time = None
-        self.literal = MapCRLF.sub(CRLF, message)
+        literal = MapCRLF.sub(CRLF, message)
+        if self.utf8_enabled:
+            literal = b'UTF8 (' + literal + b')'
+        self.literal = literal
         return self._simple_command(name, mailbox, flags, date_time)

I don't see that the original patch added any code to test_imaplib.py to test 
UTF8 literals with APPEND. So what this should do is go back and use the 
pre-UTF8, RFC 3501 APPEND syntax, with no existing unit test fall-out.

Which is fine. IMAP UTF8 clients are not required to use UTF8 literals with 
APPEND. Enabling UTF8 in the IMAP server does not require using the UTF8 
version of APPEND. It's only required if the IMAP client wishes to send a 
message with UTF8 headers to the IMAP server.
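
A client that wanted to decide per message could use a check along these 
lines. This is only a rough heuristic under a stated assumption: "UTF8 
headers" is approximated as any octet above 0x7F in the header block, and 
headers_need_utf8() is a hypothetical helper, not part of imaplib.

```python
import re

# First blank line, accepting CRLF or bare LF line endings.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def headers_need_utf8(message: bytes) -> bool:
    """Return True if the header block contains any non-ASCII octet."""
    # Everything before the first blank line is treated as headers; if
    # there is no blank line, the whole message is treated as headers.
    headers = _BLANK_LINE.split(message, maxsplit=1)[0]
    return any(octet > 0x7F for octet in headers)


print(headers_need_utf8(b"Subject: test\r\n\r\nbody"))
print(headers_need_utf8(b"Subject: t\xc3\xa9st\r\n\r\nbody"))
```

Messages for which this returns False can go through the plain RFC 3501 
APPEND syntax.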

I also looked into mutt's source, and mutt appears to be taking the same 
approach. It enables UTF8 mode in the IMAP server, and swallows UTF8 E-mail, 
and deals with folders whose names are now encoded in UTF8, instead of RFC3501 
IMAP's modified-UTF7 encoding convention. But I did not see anything in mutt 
that used UTF8 literals with APPEND. Searching mutt's source for APPEND code 
finds only one instance which sends the non-UTF8 literal. Looks like mutt will 
accept UTF8 mail, but not generate it itself. Not sure what mutt does when 
creating a reply to E-mail with a UTF8 E-mail address. I don't use mutt, but 
I'll test that.

It does not surprise me that this did not come up previously. All three other 
Libre IMAP servers that I know of, UW-IMAP, Cyrus, and Dovecot, do not implement 
RFC 6855. Unless one of them is currently working on it, Courier will be the 
first one to support it. But, I have other sources that confirm otherwise.

I fully understand your lack of interest in imaplib (I really do), and I wish I 
had more Python background to help out here, myself. This is as much as I can 
propose right now, with some level of confidence in my meager Python skills. I 
mostly revolve in C++, C, and Perl orbits.

If in the future more interest develops in improving IMAP support, I'm 
reachable and I'll be open to integration testing as much as my own time 
permits, in these matters...

--

___
Python tracker 
<https://bugs.python.org/issue34138>
___