[Python-Dev] socket.fromfd() documentation problem
Hello,

I recently had a very nasty FD leak: I leaked more than 200k FDs, which allocated more than 1 GB of RAM in kernel space. It was my fault, all right, but I thought I'd mention it here so that maybe you could add a short NOTE to the documentation saying that you have to os.close() the original FD to avoid a leak.

Also, I'm not completely clear on why Python does it this way rather than closing the original socket; it seems (at least to me) that this would be the right(er) way. Nevertheless, what are your thoughts on this? Should I file a bug report for it?

(The best part was that the allocated FD memory did not show up as slab, so this was a real pain in the butt to hunt down.)

Kalman Gergely
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
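For readers who hit the same leak, here is a minimal sketch of the two ownership patterns, assuming Python 3 on a POSIX system. socket.fromfd() duplicates the descriptor, so the caller still owns the original and must os.close() it. Passing fileno= to the socket.socket constructor instead hands the existing descriptor to the new object without duplicating it. The helper names adopt_fd_via_fromfd and adopt_fd_without_dup are hypothetical.

```python
import os
import socket


def adopt_fd_via_fromfd(fd, family, sock_type):
    """Wrap a raw socket descriptor with socket.fromfd() without leaking it.

    socket.fromfd() dup()s fd, so the returned socket owns a *new*
    descriptor and fd stays open until someone closes it explicitly.
    """
    sock = socket.fromfd(fd, family, sock_type)
    os.close(fd)  # release the original; skipping this is the leak
    return sock


def adopt_fd_without_dup(fd, family, sock_type):
    """Let the socket object take ownership of fd directly (Python 3).

    No duplicate is made: closing the returned socket closes fd itself.
    """
    return socket.socket(family, sock_type, fileno=fd)
```

A typical use is a descriptor obtained from socket.socketpair() followed by detach(), or one inherited from a parent process. Which helper fits depends on whether other code still expects to close the original descriptor itself.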
Re: [Python-Dev] socket.fromfd() documentation problem
On Wednesday 06 October 2010 09:34:05, Kálmán Gergely wrote:
> Nevertheless what are your thoughts on this? Should I file a bug report
> for it?

It will be fixed faster if you open an issue and attach a patch ;-)

-- 
Victor Stinner
http://www.haypocalc.com/
Re: [Python-Dev] Patch making the current email package (mostly) support bytes
R. David Murray writes:

> version of headers to the email5 API, but since any such data would
> be non-RFC compliant anyway, [access to non-conforming headers by
> reparsing the bytes] will just have to be good enough for now.

But that's potentially unpleasant for, say, Mailman. AFAICS, what you're saying is that Mailman will have to implement a full header parser and repair module, or shunt (and wait for administrator intervention on) any mail that happens to contain even one byte of non-RFC-conforming content in a header it cares about. (Note that we're not talking about moderator-level admins here; we're talking about the Big Cheese with access to the command line on the list host.) That's substantially worse than the current system, where (in theory, and in actual practice where it distributes its own version of email) it can trap the Unicode exception on a per-header basis.

I also worry about the implications for backwards compatibility. Eventually email-N needs to handle non-conforming mail in a sensible way, or anybody who gets spam (ie, everybody) and wants a reliable email system will need to implement their own. If you punt completely on handling non-conforming mail now, when is it going to be done? And when it is done, will the backward-compatible interface be able to access the robust implementation, or will people who want robust APIs have to use rather different ones? The way you're going right now, I have to worry about the answer to the second question, at least.

> [*] Why '?' and not the unicode invalid character character? Well, the
> email5 Generator.flatten can be used to generate data for transmission over
> the wire *if* the source is RFC compliant and 7bit-only, and this would
> be a normal email5 usage pattern (that is, smtplib.SMTP.sendmail expects
> ASCII-only strings as input!). So the data generated by Generator.flatten
> should not include unicode...

I don't understand this at all. Of course the byte stream generated by Generator.flatten won't contain Unicode (in the headers, anyway); it will contain only ASCII (that happens to conform to QP or Base64 encoding of Unicode in some appropriate UTF in many cases). Why is U+FFFD REPLACEMENT CHARACTER any different from any other non-ASCII character in this respect?

(Surely you are not saying that Generator.flatten can't DTRT with non-ASCII content *at all*?)

The only thing I can think of is that you might not want to introduce non-ASCII characters into a string that looks like it might simply be corrupted in transmission (eg, it contains only one non-ASCII byte). That's reasonable; there are a lot of people who don't have to deal with anything but ASCII and occasionally Latin-1, and they don't like having Unicode crammed down their throats.

> which raises a problem for CTE 8bit sections
> that the patch doesn't currently address.

AFAIK, there's no requirement, implied or otherwise, that a conforming implementation *produce* CTE 8bit. So just don't do that; that will keep smtplib happy, no?
Re: [Python-Dev] [Python-checkins] r85288 - in python/branches/py3k/Lib: concurrent/futures/_base.py test/test_concurrent_futures.py
2010/10/6 brian.quinlan :
> Author: brian.quinlan
> Date: Wed Oct 6 15:05:45 2010
> New Revision: 85288
>
> Log:
> Fixes 9903: test_concurrent_futures writes on stderr
>
> Modified:
> python/branches/py3k/Lib/concurrent/futures/_base.py
> python/branches/py3k/Lib/test/test_concurrent_futures.py
>
> Modified: python/branches/py3k/Lib/concurrent/futures/_base.py
> ==============================================================================
> --- python/branches/py3k/Lib/concurrent/futures/_base.py (original)
> +++ python/branches/py3k/Lib/concurrent/futures/_base.py Wed Oct 6 15:05:45 2010
> @@ -40,9 +40,8 @@
>
> # Logger for internal use by the futures package.
> LOGGER = logging.getLogger("concurrent.futures")
> -_handler = logging.StreamHandler()
> -LOGGER.addHandler(_handler)
> -del _handler
> +STDERR_HANDLER = logging.StreamHandler()
> +LOGGER.addHandler(STDERR_HANDLER)
>
> class Error(Exception):
>     """Base class for all future-related exceptions."""
>
> Modified: python/branches/py3k/Lib/test/test_concurrent_futures.py
> ==============================================================================
> --- python/branches/py3k/Lib/test/test_concurrent_futures.py (original)
> +++ python/branches/py3k/Lib/test/test_concurrent_futures.py Wed Oct 6 15:05:45 2010
> @@ -9,6 +9,8 @@
> # without thread support.
> test.support.import_module('threading')
>
> +import io
> +import logging
> import multiprocessing
> import sys
> import threading
> @@ -21,7 +23,8 @@
>
> from concurrent import futures
> from concurrent.futures._base import (
> -    PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, FINISHED, Future, wait)
> +    PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, FINISHED, Future,
> +    LOGGER, STDERR_HANDLER, wait)
> import concurrent.futures.process
>
> def create_future(state=PENDING, exception=None, result=None):
> @@ -617,24 +620,33 @@
>         self.assertTrue(was_cancelled)
>
>     def test_done_callback_raises(self):
> -        raising_was_called = False
> -        fn_was_called = False
> -
> -        def raising_fn(callback_future):
> -            nonlocal raising_was_called
> -            raising_was_called = True
> -            raise Exception('doh!')
> -
> -        def fn(callback_future):
> -            nonlocal fn_was_called
> -            fn_was_called = True
> -
> -        f = Future()
> -        f.add_done_callback(raising_fn)
> -        f.add_done_callback(fn)
> -        f.set_result(5)
> -        self.assertTrue(raising_was_called)
> -        self.assertTrue(fn_was_called)
> +        LOGGER.removeHandler(STDERR_HANDLER)
> +        logging_stream = io.StringIO()
> +        handler = logging.StreamHandler(logging_stream)
> +        LOGGER.addHandler(handler)
> +        try:
> +            raising_was_called = False
> +            fn_was_called = False
> +
> +            def raising_fn(callback_future):
> +                nonlocal raising_was_called
> +                raising_was_called = True
> +                raise Exception('doh!')
> +
> +            def fn(callback_future):
> +                nonlocal fn_was_called
> +                fn_was_called = True
> +
> +            f = Future()
> +            f.add_done_callback(raising_fn)
> +            f.add_done_callback(fn)
> +            f.set_result(5)
> +            self.assertTrue(raising_was_called)
> +            self.assertTrue(fn_was_called)
> +            self.assertIn('Exception: doh!', logging_stream.getvalue())
> +        finally:
> +            LOGGER.removeHandler(handler)
> +            LOGGER.addHandler(STDERR_HANDLER)
You could use TestCase.addCleanup() here.
--
Regards,
Benjamin
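Benjamin's suggestion can be sketched as a standalone test. This is a hedged rewrite, not the committed test. It assumes a Python 3 in which Future runs done callbacks from set_result() and logs a raising callback through the "concurrent.futures" logger, which is what the diff above relies on. The sketch only adds and removes its own handler, so it does not depend on STDERR_HANDLER existing. The names FUTURES_LOGGER, DoneCallbackTest and capture_futures_log are hypothetical.

```python
import io
import logging
import unittest
from concurrent.futures import Future

# Assumption: callback exceptions are logged via this logger.
FUTURES_LOGGER = logging.getLogger("concurrent.futures")


class DoneCallbackTest(unittest.TestCase):
    def capture_futures_log(self):
        """Redirect the futures logger into a StringIO for this test only."""
        stream = io.StringIO()
        handler = logging.StreamHandler(stream)
        FUTURES_LOGGER.addHandler(handler)
        # addCleanup() callbacks run after the test even if an assertion
        # fails, which replaces the explicit try/finally in the diff.
        self.addCleanup(FUTURES_LOGGER.removeHandler, handler)
        # Keep the record away from any root handlers while capturing.
        old_propagate = FUTURES_LOGGER.propagate
        FUTURES_LOGGER.propagate = False
        self.addCleanup(setattr, FUTURES_LOGGER, "propagate", old_propagate)
        return stream

    def test_done_callback_raises(self):
        stream = self.capture_futures_log()
        calls = []

        def raising_fn(callback_future):
            calls.append("raising")
            raise Exception("doh!")

        def fn(callback_future):
            calls.append("fn")

        f = Future()
        f.add_done_callback(raising_fn)
        f.add_done_callback(fn)
        f.set_result(5)
        # The second callback still runs after the first one raised.
        self.assertEqual(calls, ["raising", "fn"])
        self.assertIn("Exception: doh!", stream.getvalue())
```

Cleanups run in reverse order of registration, so the propagate flag is restored first and the handler is removed last.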
Re: [Python-Dev] Patch making the current email package (mostly) support bytes
On Wed, 06 Oct 2010 12:22:18 +0900, "Stephen J. Turnbull" wrote:
> Nick Coghlan writes:
>
> > - if you pass in bytes data and know what you are doing, then you can
> > access that raw bytes data and do your own decoding
>
> At what level, though?
>
> To take an interesting example I used to see frequently:
>
> From: [email protected]
> (Taro Yamada in 8-bit Shift JIS)
>
> So I guess you are suggesting that the email module can RFC 822 parse
> that, and
>
> 1. Refuse to return the unwrapped (ie, single line) form of the whole
> field, except as bytes.
> 2. Refuse to return the content of the From field, except as bytes.
> 3. Return the email address parsed from the From field.
> 4. Refuse to return the comment, except as bytes.

5. Return the content, with non-ASCII bytes replaced with ? characters.

In other words, my proposed patch only makes email5 1/8 to 1/4 broken, instead of half broken as it is now. But not un-broken enough for Mailman, it sounds like.

> That's fine. But suppose I have a private or newly defined header
> that is structured? Now I have two choices:
>
> 1. Write a version of my private parser for both str (the normal
> case) and bytes (if accessing the value as str raises)
>
> 2. Always get the bytes and convert them to str (probably using the
> same .decode('ascii','surrogateescape') call that email uses but
> won't let me have the value of!), then use a common str parser.

Yes, this is exactly the dilemma faced by the entire email package. The current email6 code attempts to do a variation on (1) by having a common parser that handles both strings and bytes using a dual subclass approach. This patch is trying out (2).

If you have a private header parser, you would ideally like to be able to use the same mechanism as the email package to solve the problem. For email6 you'd be able to register your header parser, get handed the input like the built-in parser, and use the tools provided by the built-in parser to do your work. In email5 there is no way that I know of for you to register a private parser, so you need access to the raw input for the header in one form or another.

If we go this route (as opposed to only handling headers with 8bit data by sanitizing them), then we need to think about the email5 header parsers as well (decode_header and parseaddr). They are of course going to have the same problems as the rest of the email package with parsing bytes, and you are suggesting that access to those header 8bit bytes is needed. One option would be to add a keyword to the get and get_all methods that instructs them to return the string with the surrogate-escaped bytes, which can then be passed onward to decode_header, parseaddr, or a custom decoder. Then I need to look at what needs to be added to those methods to handle the escaped bytes, and from what you say they too need a keyword telling them to preserve the escaped bytes on output (a "yes I know what I'm doing" flag... 'preserve_escaped_bytes=True'?).

> Note that this is more problematic than it looks, since the
> appropriate base codec may require information from higher-level
> structures (eg, qp codec tags or a Content-Type header's charset
> field).

You'll have to give me an example of where this is a problem but is not already a problem in email4.

> Why should I reproduce email's logic here? I don't care if the
> default or concise API raises on surrogates in the str value. But I'm
> pretty sure that I will want to use str values containing surrogates
> in these contexts (for the same reasons that email module does, for
> example), rather than work with bytes sometimes and strs sometimes.
>
> Please provide a way to return strs-with-surrogates if I ask for them.

Does my proposal make sense? But note, it raises exactly the backward compatibility concerns you mention in your next email (that I will reply to next). It is an open question whether it is worth opening that door in order to be able to do extended handling on non-RFC conforming email (as opposed to just sanitizing it and soldiering on).

-- 
R. David Murray www.bitdance.com
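Option 2 above relies on the surrogateescape error handler. When a header is decoded this way, each non-ASCII byte turns into a lone surrogate in the range U+DC80 to U+DCFF. The resulting str can go through an ordinary str parser, and the exact original bytes can still be recovered later and decoded with whatever charset the surrounding context supplies. Below is a minimal sketch using only the codec machinery. The helper names and the example address are hypothetical, and Shift JIS is assumed to be known from context. Shift JIS trail bytes can fall in the ASCII range, which is one reason the charset cannot be guessed from the escaped text alone.

```python
# Hypothetical raw header as it might arrive off the wire: a Shift JIS
# display name with no RFC 2047 encoding applied.
RAW_FROM = b"From: " + "山田太郎".encode("shift_jis") + b" <taro@example.com>"


def bytes_to_escaped_str(raw: bytes) -> str:
    """Decode as ASCII, carrying every non-ASCII byte through as a surrogate."""
    return raw.decode("ascii", "surrogateescape")


def escaped_str_to_bytes(text: str) -> bytes:
    """Exact inverse of bytes_to_escaped_str()."""
    return text.encode("ascii", "surrogateescape")


def redecode(text: str, charset: str) -> str:
    """Reinterpret the escaped bytes once the real charset is known."""
    return escaped_str_to_bytes(text).decode(charset)


def parse_addr(text: str) -> str:
    """Toy str-only parser: extract the <addr-spec>, which is pure ASCII."""
    return text[text.rindex("<") + 1 : text.rindex(">")]
```

The same str value serves the common parser (parse_addr) and, through redecode, the charset-aware path, so nothing has to be written twice for bytes and str.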
Re: [Python-Dev] Patch making the current email package (mostly) support bytes
On Wed, 06 Oct 2010 22:55:00 +0900, "Stephen J. Turnbull"
wrote:
> R. David Murray writes:
>
> > version of headers to the email5 API, but since any such data would
> > be non-RFC compliant anyway, [access to non-conforming headers by
> > reparsing the bytes] will just have to be good enough for now.
>
> But that's potentially unpleasant for, say, Mailman. AFAICS, what
> you're saying is that Mailman will have to implement a full header
> parser and repair module, or shunt (and wait for administrator
> intervention on) any mail that happens to contain even one byte of
> non-RFC-conforming content in a header it cares about. (Note that
No, it just means that such bytes would not be preserved for presentation
in the web UI. They'd show up as '?'s (or U+FFFDs, perhaps, if I change
DecodedGenerator to use U+FFFD instead of ?s for the unknown bytes).
As long as BytesGenerator is used on the output side to send the messages,
the bytes will be preserved and presented to the moderator in their email.
So the only parsing issue is if Mailman cares about *the non-ASCII
bytes* in the headers it cares about. If it has to modify headers that
contain non-ASCII bytes (for example, addresses and Subject) and cares
about preserving the non-ASCII bytes, then there is indeed an issue;
see previous email for a possible way around that.
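The output-side claim can be illustrated with the Python 3 email package as it later shipped (3.2 and later, default compat32 policy). In the sketch below, a message is parsed with message_from_bytes() and written back out with BytesGenerator, and the undecodable header byte comes through unchanged. This is the "verbatim copying" case. The sample message and the roundtrip_bytes name are hypothetical.

```python
import io
from email import message_from_bytes
from email.generator import BytesGenerator


def roundtrip_bytes(raw: bytes) -> bytes:
    """Parse raw message bytes and serialize the message back to bytes."""
    msg = message_from_bytes(raw)
    out = io.BytesIO()
    # BytesGenerator turns surrogate-escaped header text back into the
    # original bytes instead of replacing or re-encoding them.
    BytesGenerator(out).flatten(msg)
    return out.getvalue()


# Hypothetical message with a raw Latin-1 byte (0xF6) in the From header.
RAW = b"From: p\xf6stal <postal@example.com>\nSubject: hello\n\nbody\n"
```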
> we're not talking about moderator-level admins here; we're talking
> about the Big Cheese with access to the command line on the list
> host.) That's substantially worse than the current system, where (in
> theory, and in actual practice where it distributes its own version of
> email) it can trap the Unicode exception on a per-header basis.
I thought mailman no longer distributed its own version of email?
And the email API currently promises not to raise during parsing,
which is a contract my patch does not change.
> I also worry about the implications for backwards compatibility.
> Eventually email-N needs to handle non-conforming mail in a sensible
> way, or anybody who gets spam (ie, everybody) and wants a reliable
> email system will need to implement their own. If you punt completely
> on handling non-conforming mail now, when is it going to be done? And
We're (in the current patch) not punting on handling non-conforming
email, we're punting on handling non-conforming bytes *if the headers
that contain them need to be modified*. The headers can still be
modified, you just (currently) lose the non-ASCII bytes in the process.
> when it is done, will the backward-compatible interface be able to
> access the robust implementation, or will people who want robust APIs
> have to use rather different ones? The way you're going right now, I
> have to worry about the answer to the second question, at least.
Well, this is still theory given the current state of the email6
code, but I *think* that working email5 code, even after this patch,
will continue to work using email6's backward compatibility interface.
And robustness is not the issue, only extended-beyond-the-RFCs handling
of non-conforming bytes would be an issue.
*But*, as I implied in my previous email, if we allow the surrogates
out so that custom header parsers can use them, then making *that*
code continue to work may require an extra layer in the compatibility
interface to produce the surrogateescaped strings. Still, at the moment
I can't see any theoretical reason why that would not be possible,
so it may be worth the risk.
> > [*] Why '?' and not the unicode invalid character character? Well, the
> > email5 Generator.flatten can be used to generate data for transmission over
> > the wire *if* the source is RFC compliant and 7bit-only, and this would
> > be a normal email5 usage pattern (that is, smtplib.SMTP.sendmail expects
> > ASCII-only strings as input!). So the data generated by Generator.flatten
> > should not include unicode...
>
> I don't understand this at all. Of course the byte stream generated
> by Generator.flatten won't contain Unicode (in the headers, anyway);
> it will contain only ASCII (that happens to conform to QP or Base64
> encoding of Unicode in some appropriate UTF in many cases). Why is
> U+FFFD REPLACEMENT CHARACTER any different from any other non-ASCII
> character in this respect?
>
> (Surely you are not saying that Generator.flatten can't DTRT with
> non-ASCII content *at all*?)
Yes, that is *exactly* what I am saying:
>>> m = email.message_from_string("""\
... From: pöstal
...
... """)
>>> str(m)
Traceback (most recent call last):
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 1: ordinal not in range(128)
Remember, email5 is a direct translation of email4, and email4 only
handled ASCII and oh-by-the-way-if-there-are-bytes-along-for-the-ride-fine-we'll-pass-them-along.
So if you want to put non-ASCII
data into a message you have to encode it properly to ASCII in
exactly the same way that you did in email4:
>>> m = email
Re: [Python-Dev] Patch making the current email package (mostly) support bytes
R. David Murray writes:

> 5. Return the content, with non-ASCII bytes replaced with ?
> characters.

That hadn't occurred to me (and it makes me sick to contemplate it). That said, this is probably good enough for Mailman-like apps to limp along for "most" users. It's certainly good enough for the "might kick your wife and elope with your dog" alpha ports of Mailman to Python 3 (well, as certain as I can be; of course in the end Barry decides). Assuming reasonable backward compatibility of the API, of course!

> In other words, my proposed patch only makes email5 1/8 to 1/4
> broken, instead of half broken as it is now. But not un-broken
> enough for Mailman, it sounds like.

IMO, not in the long run. But realistically, in the applications I know of, most desired traffic is conformant, and since there aren't any Python 3 email apps yet, this isn't even a regression. :-/

I do think that it's important that the parsed object be able to tell you what fields are there (except if the field name itself is invalid) and return field bodies parsed as far as possible.

> If we go this route (as opposed to only handling headers with 8bit data by
> sanitizing them), then we need to think about the email5 header parsers
> as well (decode_header and parseaddr). They are of course going to have
> the same problems as the rest of the email package with parsing bytes,
> and you are suggesting that access to those header 8bit bytes is needed.

Yes, that would be preferable to replacing them with ASCII junk. But I don't see any problem with parsing them; they're syntactically insignificant by definition. The problem is purely on output: do I get verbatim escaped bytes, a sanitized str, or an exception?

> One option would be to add a keyword to the get and get_all methods
> that instructs it to return the string with the surrogate-escaped
> bytes, which can then be passed onward to decode_header, parseaddr,
> or a custom decoder. Then I need to look at what needs to be added
> to those methods to handle the escaped bytes, and from what you say
> they too need a keyword telling them to preserve the escaped bytes
> on output (a "yes I know what I'm doing" flag...
> 'preserve_escaped_bytes=True'?).

The need is not absolute, but I would have a strong preference for being able to get at those bytes.

> Does my proposal make sense? But note, it raises exactly the backward
> compatibility concerns you mention in your next email (that I will reply
> to next). It is an open question whether it is worth opening that door
> in order to be able to do extended handling on non-RFC conforming email
> (as opposed to just sanitizing it and soldiering on).

Well, maybe not. However, it is not obvious to me that you won't run into these issues again in Email6. Applications that think of email as textual objects are going to want to make their own choices about handling of non-conforming email, and it's likely to be massively inconvenient to say "OK, but you have to use bytes interfaces exclusively, because the str interfaces don't handle that."
Re: [Python-Dev] Patch making the current email package (mostly) support bytes
R. David Murray writes:
> So the only parsing issue is if Mailman cares about *the non-ASCII
> bytes* in the headers it cares about. If it has to modify headers that
> contain non-ASCII bytes (for example, addresses and Subject) and cares
> about preserving the non-ASCII bytes, then there is indeed an issue;
> see previous email for a possible way around that.
OK.
> I thought mailman no longer distributed its own version of email?
I believe so; the point is that it could do so again.
> And the email API currently promises not to raise during parsing,
> which is a contract my patch does not change.
Which is a contract that has historically been broken frequently.
Unhandled UnicodeErrors have been one of the most common causes of
queue stoppage in Mailman (exceeded only by configuration errors
AFAICS). I haven't seen any reports for a while, but with the email
package being reengineered from the ground up, the possibility of
regression can't be ignored.
Granted, there should be no regression problem in the current model
for Email5, AIUI.
> We're (in the current patch) not punting on handling non-conforming
> email, we're punting on handling non-conforming bytes *if the headers
> that contain them need to be modified*. The headers can still be
> modified, you just (currently) lose the non-ASCII bytes in the process.
Modified *or examined*. I can't think of any important applications
offhand that *need* to examine the non-ASCII bytes (in particular,
Mailman doesn't need to do that). Verbatim copying of the bytes
themselves is almost always the desired usage.
> And robustness is not the issue, only extended-beyond-the-RFCs handling
> of non-conforming bytes would be an issue.
And with that, I'm certain that Jon Postel is really dead. :-(
> > (Surely you are not saying that Generator.flatten can't DTRT with
> > non-ASCII content *at all*?)
>
> Yes, that is *exactly* what I am saying:
>
> >>> m = email.message_from_string("""\
> ... From: pöstal
> ...
> ... """)
> >>> str(m)
> Traceback (most recent call last):
>
> UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 1: ordinal not in range(128)
But that's not interesting; you did that with Python 3. We want to
know what people porting from Python 2 will expect. So, in 2.5.5 or
2.6.6 on Mac, with email v4.0.2, it *doesn't* raise; it returns:
wideload:~ 4:14$ python
Python 2.5.5 (r255:77872, Jul 13 2010, 03:03:57)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> m=email.message_from_string('From: pöstal\n\n')
>>> str(m)
'From nobody Thu Oct 7 04:18:25 2010\nFrom: p\xc3\xb6stal\n\n'
>>> m['From']
'p\xc3\xb6stal'
>>>
That's hardly helpful! Surely we can and should do better than that
now, especially since UTF-8 (with a proper CTE) is now almost
universally acceptable to MUAs. When would it be a problem for that
to return
'From nobody Thu Oct 7 04:18:25 2010\nFrom: =?UTF-8?Q?p=C3=B6stal?=\n\n'
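Python 3's existing email.header.Header class can already produce an RFC 2047 form like the one above. Here is a minimal sketch. Which encoding (Q or B) and which charset spelling appear in the output is up to the library, so nothing here depends on an exact string. The function names encode_display_text and build_message are hypothetical.

```python
from email.header import Header
from email.message import Message


def encode_display_text(text: str, charset: str = "utf-8") -> str:
    """Return an ASCII-only RFC 2047 encoded-word form of text."""
    return Header(text, charset).encode()


def build_message(from_text: str) -> str:
    """Build a header-only message whose From field is RFC 2047 encoded."""
    msg = Message()
    # Assigning a Header object makes the generator emit encoded-words
    # instead of raw non-ASCII characters.
    msg["From"] = Header(from_text, "utf-8")
    return msg.as_string()
```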
> Remember, email5 is a direct translation of email4, and email4 only
> handled ASCII and oh-by-the-way-if-there-are-bytes-along-for-the-ride-fine-we'll-pass-them-along.
> So if you want to put non-ASCII
> data into a message you have to encode it properly to ASCII in
> exactly the same way that you did in email4:
But if you do it right, then it will still work in a version that just
encodes non-ASCII characters in UTF-8 with the appropriate CTE. Since
you'll never be passing it non-ASCII characters, it's already ASCII
and UTF-8, and no CTE will be needed.
> Yes, exactly. I need to fix the patch to recode using, say,
> quoted-printable in that case.
It really should check for proportions of non-ASCII. QP would be
horrible for Japanese or Chinese.
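Here is a rough sketch of the proportion check. Quoted-printable spends about 3 output bytes per non-ASCII input byte and 1 per ASCII byte, so n bytes with non-ASCII fraction r cost about n(1 + 2r). Base64 costs about 4n/3 regardless of content. The two meet at 1 + 2r = 4/3, that is r = 1/6. The estimate ignores line-length overhead and QP escaping of '=' and trailing whitespace. The threshold and function names are assumptions, not what the patch does.

```python
import base64
import quopri

# Break-even fraction of non-ASCII bytes between QP (~n*(1 + 2r)) and
# base64 (~4n/3), ignoring line breaks and other QP escapes.
BREAK_EVEN_RATIO = 1 / 6


def non_ascii_ratio(data: bytes) -> float:
    """Fraction of bytes with the high bit set."""
    if not data:
        return 0.0
    return sum(1 for b in data if b >= 0x80) / len(data)


def choose_body_cte(data: bytes) -> str:
    """Pick the Content-Transfer-Encoding expected to give shorter output."""
    if non_ascii_ratio(data) <= BREAK_EVEN_RATIO:
        return "quoted-printable"
    return "base64"


def encoded_sizes(data: bytes) -> dict:
    """Actual encoded lengths, for checking the estimate."""
    return {
        "quoted-printable": len(quopri.encodestring(data)),
        "base64": len(base64.encodebytes(data)),
    }
```

UTF-8 Japanese or Chinese text is almost entirely non-ASCII (r close to 1), so QP would roughly triple its size, while base64 adds about a third.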
> DecodedGenerator could still produce the unicode, though, which is
> what I believe we want. (Although that raises the question of
> whether DecodedGenerator should also decode the RFC2047 encoded
> headers, but that raises a backward compatibility issue).
Can't really help you there. While I would want the RFC 2047 headers
decoded if I were writing new code (which is generally the case for
me), I haven't really wrapped my head around the issues of porting old
code using Python2 str to Python3 str here. My intuition says "no
problem" (there won't be any MIME-words so the app won't try to decode
them), but I'm not real sure of that. ;-)
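The intuition about ported code can be checked with the existing helpers in Python 3. decode_header() and make_header() turn MIME encoded-words back into text, and a header that contains no encoded-words comes back unchanged. This is a minimal sketch. The function name decoded_header_text is hypothetical.

```python
from email.header import decode_header, make_header


def decoded_header_text(value: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value to plain text."""
    return str(make_header(decode_header(value)))
```

Because plain ASCII headers pass through untouched, code that never saw MIME-words under Python 2 should see the same values if a generator decodes headers this way.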
