[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray added the comment: Committed (with RM approval on IRC) in r88252. Note that this does not necessarily solve the performance problem. A new issue should be opened for that if it still exist. -- resolution: -> fixed stage: commit review -> committed/rejected status: op

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray added the comment: Benjamin suggested using hasattr(message, 'buffer'), and that works great. The test revealed a bug in the patch, which is now fixed. All tests pass on windows. As far as I'm concerned the patch is ready to go. Other reviews would of course be welcome (and

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray added the comment: (I hope you meant I was working on a patch :) Patch is done, but there is one remaining test failure that I'm not sure how to handle. The test is test_add_text_file_warns. The code checks to see if a file is a subclass of io.TextIOWrapper, and if so warns

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread STINNER Victor
STINNER Victor added the comment: > The last step is running the tests on Windows. > Attached is the updated patch. mailbox4.patch doesn't pass on Windows, Raymond is working on a patch. -- ___ Python tracker

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray added the comment: OK, I've added deprecation warnings for using StringIO or text mode files as input. I found one bug thereby, but it is a bug that pre-existed the patch (see issue 11062). I've completed my code review. To address Victor's question about the mh-sequences

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread R. David Murray
R. David Murray added the comment: Victor: yes, I was thinking that when I added that comment but forgot to come back to it. Thanks for spotting that. Another thing I forgot about yesterday is that I activated the commented out statements that do linesep transformations on the binary file da

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread STINNER Victor
STINNER Victor added the comment: +if isinstance(message, io.TextIOWrapper): +# Backward compatibility hack. +message = message.buffer Is it a good thing to parse a mailbox using a text file? If not, we should emit a warning and maybe remove this fea

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: RDM: it seems i was too tired to get your messages right last evening! Indeed it's now completely my fault, i should inspect the content further in respect to the str/bytes etc. stuff! Thus - i will now need three or four days to cleanup my hacky code

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Georg Brandl
Georg Brandl added the comment: I'd really like someone else to throw a pair of eyes at the code changes before it is committed. But yes, I will allow this into rc2, since a completely broken module isn't really what a minor release is about. -- _

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Georg Brandl
Changes by Georg Brandl : -- Removed message: http://bugs.python.org/msg127391 ___ Python tracker ___ ___ Python-bugs-list mailing list

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-29 Thread Georg Brandl
Georg Brandl added the comment: I'd really like someone else to throw a pair of eyes at the code changes before it is committed. But yes, I will allow this into rc2, and we'll buy some stabilizing time by adding an rc3 to the cycle. -- ___ Python

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray added the comment: If you are using the most recent mailbox3 patch (I should have renamed it, sorry...I've no done so to make it clear) you should be getting an error message that tells you to use binary or Message. So I don't understand how you are getting this message. I s

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: I missed your mailbox3.patch, but now i've merged it in. One error changed, it now happens when a re.search is applied to a header value and thus seems to match what you say. I'm not able to understand this error this evening, but i will review it onc

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
Changes by R. David Murray : Removed file: http://bugs.python.org/file20579/mailbox3.patch ___ Python tracker ___ ___ Python-bugs-list mailing

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
Changes by R. David Murray : Added file: http://bugs.python.org/file20585/mailbox3.patch ___ Python tracker ___ ___ Python-bugs-list mailing li

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray added the comment: Added two more tests of non-ASCII. I think the tests now cover the necessary cases. I still want to do a full code review tomorrow, but I think the patch is in final form if anyone else is available to do a review as well. Georg, are you OK with this going

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray added the comment: Well, that's a bunch of code, and I'm afraid I don't know what your answer to my question was. What error do you get now if you use the new version of mailbox3.patch? If you feed the new mailbox/email bytes, it will preserve the bytes as is, as long as you

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: > What is the data type returned by your get_msg? I bet it is string, > and email can't handle messages in string format that have non-ASCII > characters (Now i see that the local names 'box', 'mbox' and 'mailbox' have become somewhat messed up, which

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
Changes by R. David Murray : Removed file: http://bugs.python.org/file20565/mailbox3.patch ___ Python tracker ___ ___ Python-bugs-list mailing

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray added the comment: I'm updating the patch to contain a couple tests using non-ASCII. More are needed. Before this patch, one could process a file containing non-ASCII characters as text, and if your default encoding happened to be able to decode it, things would appear to mo

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray added the comment: What is the data type returned by your get_msg? I bet it is string, and email can't handle messages in string format that have non-ASCII characters (I'm adding an explicit error message for this). You either need to use a Message object, or, more likely in

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: You're indeed right, i've overseen a try..catch! I'll even be able to give you some fast code hints now (and i'll be offline the next few hours - the mails are simply mails with illegal charsets, say): Traceback (most recent call last): File "/Users/

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread A.M. Kuchling
Changes by A.M. Kuchling : -- nosy: -akuchling ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.pyth

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray added the comment: I don't see those error messages in the mailbox source. I'm guess your application isn trapping the errors in a try/except. In that case, just do a bare 'raise' in the except clause, and you should get the full traceback. I'm sure I'll discover problems ju

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: Indeed i tried to create tracebacks (even with "import traceback"), but these all end up in my code (and all the time). I have not yet figured out how to create tracebacks which leave my code and reach the source, which surely must be somewhere in em

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread R. David Murray
R. David Murray added the comment: Steffen: thanks for testing. Do those error messages have tracebacks? Can you post them? Can you post example messages and a short program that demonstrates the problem? I'm going to be creating some non-ascii test cases, but any additional info you can

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-28 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: After cloning branches/py3k (i now have three different detached repo snakes in my arena (2.7,3.1,py3k), by the way - not bad for a greenhorn, huh?). I've applied RDMs patch from msg127245. Note: the test mails are *malformed*! Stuff in brackets are my

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-27 Thread R. David Murray
R. David Murray added the comment: Attached is a patch that builds on Victor's patch, but takes the approach I discussed of maintaining backward compatibility (for the most part; see below). The test suite in this version is substantially unchanged. The major changes are adding tests for th

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread R. David Murray
R. David Murray added the comment: I haven't looked at the items Haypo has pointed to yet, but I have looked at the API issues (get_string, add, etc). It seems to me that we have to make a decision here: do we break API backward compatibility and convert to consuming and emitting bytes prett

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread STINNER Victor
STINNER Victor added the comment: pitrou> There's a missing conversion in mailbox.patch. pitrou> Running with -bb shows the issue. pitrou> Here is an updated patch. Good catch: test_mailbox now pass on Windows. -- Some remarks on mailbox2.patch. get_string() returns a bytes object: I propose

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread R. David Murray
R. David Murray added the comment: Haypo: yeah, in an ideal world Generator would use get_payload() and not _payload, but I had to make some compromises with purity of model separation in order to achieve the practical goal of handling bytes usefully. I'd like to fix that as I work on email

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread Antoine Pitrou
Antoine Pitrou added the comment: There's a missing conversion in mailbox.patch. Running with -bb shows the issue. Here is an updated patch. -- Added file: http://bugs.python.org/file20532/mailbox2.patch ___ Python tracker

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread STINNER Victor
STINNER Victor added the comment: All test_email and test_mailbox pass with mailbox.patch+BytesGenerator_handle_text.patch on Windows except one test: == ERROR: test_set_item (test.test_mailbox.TestBabyl) -

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: This message will not help anyone. And it's not a chat, but because it seems i really made the horses gone grazy (direct translation of a german proverb): - msg127002: RDM, i didn't know all of that and i am really sorry. Now I am looking forward to do

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-26 Thread STINNER Victor
STINNER Victor added the comment: > I reverted r88197 because it was incorrect and caused an email test > to fail. Once I come up with a test for it I'll fix it correctly. test_mailbox is a good (indirect) test suite for this change. The problem of r88197 is that it replaces msg._payload by ms

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread R. David Murray
R. David Murray added the comment: Thanks, Victor, you beat me to it :) I'll see if I can review this tomorrow, or if not I can probably do it Thursday. I reverted r88197 because it was incorrect and caused an email test to fail. Once I come up with a test for it I'll fix it correctly. I s

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: Nice. Thanks Victor. -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: h

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread STINNER Victor
STINNER Victor added the comment: While working on this issue, I found and fixed two bugs in the email binary parser: r88196 and r88197. -- ___ Python tracker ___ __

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread STINNER Victor
STINNER Victor added the comment: mailbox.patch: - open files in binary mode not as text - parse as bytes not as Unicode - replace email.generator.Generator() by email.generator.BytesGenerator() - use .message_from_bytes() instead of .message_from_str() - use .message_from_binary_file() ins

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Raymond Hettinger
Raymond Hettinger added the comment: ISTM an "API change" is okay if it fixes a critical usability bug. Also, if this is going to ship as-is, the docs should get a big warning right at the top. Perhaps the source code should also emit a notice that the module is hosed so that people like Ste

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread R. David Murray
R. David Murray added the comment: That should have been "too late to make API changes for 3.2". -- ___ Python tracker ___ ___ Python-

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread R. David Murray
R. David Murray added the comment: I'm afraid so. The python3 uptake process was expected to take five years overall, and we are only up to about the second year at this point. So while you may have been away from Python for 6 years, you came back right in the middle of an unprecedented tra

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- priority: high -> critical versions: +Python 3.3 ___ Python tracker ___ ___ Python-bugs-list mailing lis

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-25 Thread Steffen Daode Nurpmeso
Steffen Daode Nurpmeso added the comment: Re-New to Python - Re-Started with Py3K in 2011. 'Found myself in a dead-end after 10 days of work because a KOI8-R spam mail causes the file I/O decoding process to fail - and there is NO WAY TO HANDLE THIS with mailbox.py! (Went to python.org, searche

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-24 Thread STINNER Victor
Changes by STINNER Victor : -- nosy: +haypo ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.o

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread R. David Murray
R. David Murray added the comment: I've been intending to take a look at this issue at some point, but am not sure when I'd get to it. I took a quick look. It does seems to me that it is true that for data-validity purposes the message files need to be opened in binary and fed to the email p

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: With the module being so slow as to be unusable, this can be considered a bugfix, so it is okay if the fix goes into 3.2.1. -- versions: -Python 3.3 ___ Python tracker ___

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: RDM, you were suggested for this by Thomas Wouters (who wrote much of the existing code). Are you up for it? -- assignee: -> r.david.murray nosy: +r.david.murray ___ Python tracker

[issue9124] Mailbox module should use binary I/O, not text I/O

2011-01-12 Thread Antoine Pitrou
Antoine Pitrou added the comment: The aforementioned python-dev thread (available at http://mail.python.org/pipermail/python-dev/2010-June/101186.html ) explains things quite well. The mailbox module needs to be modified to use binary I/O, both for functionality and for speed. Right now, I do