[Python-Dev] socket.fromfd() documentation problem

2010-10-06 Thread Kálmán Gergely

Hello

I was having a very nasty fd leak recently where I leaked more than 200k
FDs, allocating more than 1 GB of RAM in kernel space. It was my fault, all
right, but I thought I'd mention it here so maybe you'll put a little NOTE
section in the documentation mentioning that you have to os.close() the
original FD to avoid leakage. Also, I'm not completely clear why Python does
it this way instead of closing the original FD; it seems (at least to me)
that closing it would be the more correct way.
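A minimal sketch of the behaviour described above, in current Python 3: socket.fromfd() builds the socket from a dup() of the descriptor, so the caller still owns the original and must close it. The helper name socket_from_raw_fd is hypothetical, not part of the socket module.

```python
import os
import socket


def socket_from_raw_fd(fd, family, sock_type):
    """Wrap a raw descriptor in a socket object without leaking it.

    Hypothetical helper. socket.fromfd() builds the socket from a dup()
    of ``fd``, so the original descriptor stays open (and keeps its
    kernel resources) until someone calls os.close() on it.
    """
    sock = socket.fromfd(fd, family, sock_type)
    os.close(fd)  # the socket object now holds its own duplicate
    return sock
```

In later Python 3 releases, socket.socket(fileno=fd) adopts the descriptor itself rather than a duplicate, which avoids the extra os.close() call.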

Nevertheless, what are your thoughts on this? Should I file a bug report for it?


(The best part was that the allocated FD memory did not show up as slab, so
this was a real pain in the butt to hunt down)

Kalman Gergely
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] socket.fromfd() documentation problem

2010-10-06 Thread Victor Stinner
On Wednesday 06 October 2010 09:34:05, Kálmán Gergely wrote:
> Nevertheless what are your thoughts on this? Should I file a bug report
> for it?

It will be fixed faster if you open an issue and attach a patch ;-)

-- 
Victor Stinner
http://www.haypocalc.com/


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes:

 > version of headers to the email5 API, but since any such data would
 > be non-RFC compliant anyway, [access to non-conforming headers by
 > reparsing the bytes] will just have to be good enough for now.

But that's potentially unpleasant for, say, Mailman.  AFAICS, what
you're saying is that Mailman will have to implement a full header
parser and repair module, or shunt (and wait for administrator
intervention on) any mail that happens to contain even one byte of
non-RFC-conforming content in a header it cares about.  (Note that
we're not talking about moderator-level admins here; we're talking
about the Big Cheese with access to the command line on the list
host.)  That's substantially worse than the current system, where (in
theory, and in actual practice where it distributes its own version of
email) it can trap the Unicode exception on a per-header basis.

I also worry about the implications for backwards compatibility.
Eventually email-N needs to handle non-conforming mail in a sensible
way, or anybody who gets spam (ie, everybody) and wants a reliable
email system will need to implement their own.  If you punt completely
on handling non-conforming mail now, when is it going to be done?  And
when it is done, will the backward-compatible interface be able to
access the robust implementation, or will people who want robust APIs
have to use rather different ones?  The way you're going right now, I
have to worry about the answer to the second question, at least.

 > [*] Why '?' and not the unicode invalid character character?  Well, the
 > email5 Generator.flatten can be used to generate data for transmission over
 > the wire *if* the source is RFC compliant and 7bit-only, and this would
 > be a normal email5 usage pattern (that is, smtplib.SMTP.sendmail expects
 > ASCII-only strings as input!).  So the data generated by Generator.flatten
 > should not include unicode...

I don't understand this at all.  Of course the byte stream generated
by Generator.flatten won't contain Unicode (in the headers, anyway);
it will contain only ASCII (that happens to conform to QP or Base64
encoding of Unicode in some appropriate UTF in many cases).  Why is
U+FFFD REPLACEMENT CHARACTER any different from any other non-ASCII
character in this respect?

(Surely you are not saying that Generator.flatten can't DTRT with
non-ASCII content *at all*?)
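To illustrate the point with present-day Python 3's email.header module (not the email5 patch under discussion): a header value containing U+FFFD is turned into an ASCII-only RFC 2047 encoded word in exactly the same way as one containing ö. The helper name is hypothetical.

```python
from email.header import Header


def encode_header_value(value, charset="utf-8"):
    """Return an ASCII-only RFC 2047 form of ``value``.

    Hypothetical helper: Header encodes any non-ASCII character,
    U+FFFD included, as an encoded word made of plain ASCII.
    """
    return Header(value, charset).encode()


for text in ("p\u00f6stal", "p\ufffdstal"):
    print(encode_header_value(text))
```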

The only thing I can think of is that you might not want to introduce
non-ASCII characters into a string that looks like it might simply be
corrupted in transmission (eg, it contains only one non-ASCII byte).
That's reasonable; there are a lot of people who don't have to deal
with anything but ASCII and occasionally Latin-1, and they don't like
having Unicode crammed down their throats.

 > which raises a problem for CTE 8bit sections
 > that the patch doesn't currently address.

AFAIK, there's no requirement, implied or otherwise, that a conforming
implementation *produce* CTE 8bit.  So just don't do that; that will
keep smtplib happy, no?
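A sketch of that suggestion using present-day Python 3's email.mime.text.MIMEText (not the patched email5 being discussed): with a utf-8 charset the body gets a base64 Content-Transfer-Encoding by default, and quoted-printable can be requested through an email.charset.Charset instance, so the serialized message stays 7-bit clean without ever producing CTE 8bit. The helper name is hypothetical.

```python
from email import charset
from email.mime.text import MIMEText


def make_7bit_text(body, use_qp=False):
    """Build a text/plain part whose serialized form is pure ASCII."""
    if use_qp:
        cs = charset.Charset("utf-8")
        cs.body_encoding = charset.QP  # quoted-printable instead of base64
        return MIMEText(body, "plain", cs)
    # For a utf-8 charset MIMEText defaults to a base64 body encoding.
    return MIMEText(body, "plain", "utf-8")
```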


Re: [Python-Dev] [Python-checkins] r85288 - in python/branches/py3k/Lib: concurrent/futures/_base.py test/test_concurrent_futures.py

2010-10-06 Thread Benjamin Peterson
2010/10/6 brian.quinlan :
> Author: brian.quinlan
> Date: Wed Oct  6 15:05:45 2010
> New Revision: 85288
>
> Log:
> Fixes 9903: test_concurrent_futures writes on stderr
>
> Modified:
>   python/branches/py3k/Lib/concurrent/futures/_base.py
>   python/branches/py3k/Lib/test/test_concurrent_futures.py
>
> Modified: python/branches/py3k/Lib/concurrent/futures/_base.py
> ==
> --- python/branches/py3k/Lib/concurrent/futures/_base.py        (original)
> +++ python/branches/py3k/Lib/concurrent/futures/_base.py        Wed Oct  6 15:05:45 2010
> @@ -40,9 +40,8 @@
>
>  # Logger for internal use by the futures package.
>  LOGGER = logging.getLogger("concurrent.futures")
> -_handler = logging.StreamHandler()
> -LOGGER.addHandler(_handler)
> -del _handler
> +STDERR_HANDLER = logging.StreamHandler()
> +LOGGER.addHandler(STDERR_HANDLER)
>
>  class Error(Exception):
>     """Base class for all future-related exceptions."""
>
> Modified: python/branches/py3k/Lib/test/test_concurrent_futures.py
> ==
> --- python/branches/py3k/Lib/test/test_concurrent_futures.py    (original)
> +++ python/branches/py3k/Lib/test/test_concurrent_futures.py    Wed Oct  6 15:05:45 2010
> @@ -9,6 +9,8 @@
>  # without thread support.
>  test.support.import_module('threading')
>
> +import io
> +import logging
>  import multiprocessing
>  import sys
>  import threading
> @@ -21,7 +23,8 @@
>
>  from concurrent import futures
>  from concurrent.futures._base import (
> -    PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, FINISHED, Future, wait)
> +    PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, FINISHED, Future,
> +    LOGGER, STDERR_HANDLER, wait)
>  import concurrent.futures.process
>
>  def create_future(state=PENDING, exception=None, result=None):
> @@ -617,24 +620,33 @@
>         self.assertTrue(was_cancelled)
>
>     def test_done_callback_raises(self):
> -        raising_was_called = False
> -        fn_was_called = False
> -
> -        def raising_fn(callback_future):
> -            nonlocal raising_was_called
> -            raising_was_called = True
> -            raise Exception('doh!')
> -
> -        def fn(callback_future):
> -            nonlocal fn_was_called
> -            fn_was_called = True
> -
> -        f = Future()
> -        f.add_done_callback(raising_fn)
> -        f.add_done_callback(fn)
> -        f.set_result(5)
> -        self.assertTrue(raising_was_called)
> -        self.assertTrue(fn_was_called)
> +        LOGGER.removeHandler(STDERR_HANDLER)
> +        logging_stream = io.StringIO()
> +        handler = logging.StreamHandler(logging_stream)
> +        LOGGER.addHandler(handler)
> +        try:
> +            raising_was_called = False
> +            fn_was_called = False
> +
> +            def raising_fn(callback_future):
> +                nonlocal raising_was_called
> +                raising_was_called = True
> +                raise Exception('doh!')
> +
> +            def fn(callback_future):
> +                nonlocal fn_was_called
> +                fn_was_called = True
> +
> +            f = Future()
> +            f.add_done_callback(raising_fn)
> +            f.add_done_callback(fn)
> +            f.set_result(5)
> +            self.assertTrue(raising_was_called)
> +            self.assertTrue(fn_was_called)
> +            self.assertIn('Exception: doh!', logging_stream.getvalue())
> +        finally:
> +            LOGGER.removeHandler(handler)
> +            LOGGER.addHandler(STDERR_HANDLER)

You could use TestCase.addCleanup() here.
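A sketch of that suggestion, written against present-day concurrent.futures (whose logger no longer carries the STDERR_HANDLER added by this patch), so it only attaches a capturing handler. unittest calls functions registered with TestCase.addCleanup() after the test whether it passes or fails, which is what the try/finally block does by hand. The test class name is hypothetical.

```python
import io
import logging
import unittest
from concurrent.futures import Future

# The logger concurrent.futures reports callback exceptions to.
LOGGER = logging.getLogger("concurrent.futures")


class DoneCallbackTest(unittest.TestCase):
    def test_done_callback_raises(self):
        # Capture log output instead of letting it reach stderr.
        logging_stream = io.StringIO()
        handler = logging.StreamHandler(logging_stream)
        LOGGER.addHandler(handler)
        # Replaces the try/finally: runs after the test, pass or fail.
        self.addCleanup(LOGGER.removeHandler, handler)

        raising_was_called = False
        fn_was_called = False

        def raising_fn(callback_future):
            nonlocal raising_was_called
            raising_was_called = True
            raise Exception('doh!')

        def fn(callback_future):
            nonlocal fn_was_called
            fn_was_called = True

        f = Future()
        f.add_done_callback(raising_fn)
        f.add_done_callback(fn)
        f.set_result(5)
        self.assertTrue(raising_was_called)
        self.assertTrue(fn_was_called)
        self.assertIn('Exception: doh!', logging_stream.getvalue())
```

With the patch's module-level STDERR_HANDLER, LOGGER.removeHandler(STDERR_HANDLER) followed by self.addCleanup(LOGGER.addHandler, STDERR_HANDLER) would restore it the same way; cleanups run in last-in, first-out order.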



-- 
Regards,
Benjamin


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread R. David Murray
On Wed, 06 Oct 2010 12:22:18 +0900, "Stephen J. Turnbull" wrote:
> Nick Coghlan writes:
> 
>  > - if you pass in bytes data and know what you are doing, then you can
>  > access that raw bytes data and do your own decoding
> 
> At what level, though?
> 
> To take an interesting example I used to see frequently:
> 
> From: [email protected]
>   (Taro Yamada in 8-bit Shift JIS)
> 
> So I guess you are suggesting that the email module can RFC 822 parse
> that, and
> 
> 1.  Refuse to return the unwrapped (ie, single line) form of the whole
> field, except as bytes.
> 2.  Refuse to return the content of the From field, except as bytes.
> 3.  Return the email address parsed from the From field.
> 4.  Refuse to return the comment, except as bytes.

  5.  Return the content, with non-ASCII bytes replaced with ?
  characters.

In other words, my proposed patch only makes email5 1/8 to 1/4
broken, instead of half broken as it is now.  But not un-broken
enough for Mailman, it sounds like.

> That's fine.  But suppose I have a private or newly defined header
> that is structured?  Now I have two choices:
> 
> 1.  Write a version of my private parser for both str (the normal
> case) and bytes (if accessing the value as str raises)
>
> 2.  Always get the bytes and convert them to str (probably using the
> same .decode('ascii', 'surrogateescape') call that email uses but
> won't let me have the value of!), then use a common str parser.

Yes, this is exactly the dilemma faced by the entire email package.
The current email6 code attempts to do a variation on (1) by having a
common parser that handles both strings and bytes using a dual subclass
approach.  This patch is trying out (2).  If you have a private header
parser, you would ideally like to be able to use the same mechanism as the
email package to solve the problem.  For email6 you'd be able to register
your header parser and get handed the input like the built in parser and
be able to use the tools provided by the built in parser to do your work.
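Option (2) can be sketched in Python 3 with the surrogateescape error handler: each non-ASCII byte becomes a lone surrogate in the str, one str parser handles both clean and 8-bit input, and encoding with the same handler restores the original bytes. The header name X-Private, its key=value syntax, and both helper functions are hypothetical, made up for this illustration.

```python
def parse_private_header(raw):
    """Parse a hypothetical 'Name: key=value; key=value' header line.

    The bytes are decoded as ASCII with surrogateescape, so every
    non-ASCII byte survives as a lone surrogate (U+DC80..U+DCFF).
    """
    text = raw.decode("ascii", "surrogateescape")
    name, _, body = text.partition(":")
    fields = {}
    for item in body.split(";"):
        key, _, value = item.strip().partition("=")
        fields[key] = value
    return name, fields


def has_escaped_bytes(s):
    """True if ``s`` carries bytes smuggled in by surrogateescape."""
    return any("\udc80" <= ch <= "\udcff" for ch in s)


# The surname Yamada as raw Shift JIS bytes in a hypothetical header.
raw = b"X-Private: lang=ja; name=\x8eR\x93c"
name, fields = parse_private_header(raw)
# Re-encoding with the same handler gives back the original bytes.
original = fields["name"].encode("ascii", "surrogateescape")
```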

In email5 there is no way that I know of for you to register a private
parser, so you need access to the raw input for the header in one form
or another.

If we go this route (as opposed to only handling headers with 8bit data by
sanitizing them), then we need to think about the email5 header parsers
as well (decode_header and parseaddr).  They are of course going to have
the same problems as the rest of the email package with parsing bytes,
and you are suggesting that access to those header 8bit bytes is needed.

One option would be to add a keyword to the get and get_all methods
that instructs it to return the string with the surrogate-escaped
bytes, which can then be passed onward to decode_header, parseaddr,
or a custom decoder.  Then I need to look at what needs to be added to
those methods to handle the escaped bytes, and from what you say they
too need a keyword telling them to preserve the escaped bytes on output
(a "yes I know what I'm doing" flag...'preserve_escaped_bytes=True'?).

> Note that this is more problematic than it looks, since the
> appropriate base codec may require information from higher-level
> structures (eg, qp codec tags or a Content-Type header's charset
> field).

You'll have to give me an example of where this is a problem but is
not already a problem in email4.

> Why should I reproduce email's logic here?  I don't care if the
> default or concise API raises on surrogates in the str value.  But I'm
> pretty sure that I will want to use str values containing surrogates
> in these contexts (for the same reasons that email module does, for
> example), rather than work with bytes sometimes and strs sometimes.
> 
> Please provide a way to return strs-with-surrogates if I ask for them.

Does my proposal make sense?  But note, it raises exactly the backward
compatibility concerns you mention in your next email (that I will reply
to next).  It is an open question whether it is worth opening that door
in order to be able to do extended handling on non-RFC conforming email
(as opposed to just sanitizing it and soldiering on).

--
R. David Murray  www.bitdance.com


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread R. David Murray
On Wed, 06 Oct 2010 22:55:00 +0900, "Stephen J. Turnbull" wrote:
> R. David Murray writes:
> 
>  > version of headers to the email5 API, but since any such data would
>  > be non-RFC compliant anyway, [access to non-conforming headers by
>  > reparsing the bytes] will just have to be good enough for now.
> 
> But that's potentially unpleasant for, say, Mailman.  AFAICS, what
> you're saying is that Mailman will have to implement a full header
> parser and repair module, or shunt (and wait for administrator
> intervention on) any mail that happens to contain even one byte of
> non-RFC-conforming content in a header it cares about.  (Note that

No, it just means that such bytes would not be preserved for presentation
in the web UI.  They'd show up as '?'s (or U+FFFDs, perhaps, if I change
DecodedGenerator to use U+FFFD instead of ?s for the unknown bytes).
As long as BytesGenerator is used on the output side to send the messages,
the bytes will be preserved and presented to the moderator in their email.
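A small sketch of that output path using the email package in present-day Python 3 with its default compat32 policy (a later version than the patch under discussion): a message parsed with message_from_bytes() and written back out with BytesGenerator keeps unmodified 8-bit header bytes exactly as received. The helper name is hypothetical.

```python
import email
import io
from email.generator import BytesGenerator


def roundtrip_bytes(raw):
    """Parse raw message bytes and serialize them again via BytesGenerator."""
    msg = email.message_from_bytes(raw)
    out = io.BytesIO()
    # With the default cte_type of '8bit', unmodified headers are copied
    # through with their high-bit bytes intact.
    BytesGenerator(out).flatten(msg)
    return msg, out.getvalue()


raw = b"From: p\xc3\xb6stal\nSubject: test\n\nbody\n"
msg, flattened = roundtrip_bytes(raw)
```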

So the only parsing issue is if Mailman cares about *the non-ASCII
bytes* in the headers it cares about.  If it has to modify headers that
contain non-ASCII bytes (for example, addresses and Subject) and cares
about preserving the non-ASCII bytes, then there is indeed an issue;
see previous email for a possible way around that.

> we're not talking about moderator-level admins here; we're talking
> about the Big Cheese with access to the command line on the list
> host.)  That's substantially worse than the current system, where (in
> theory, and in actual practice where it distributes its own version of
> email) it can trap the Unicode exception on a per-header basis.

I thought Mailman no longer distributed its own version of email?
And the email API currently promises not to raise during parsing,
which is a contract my patch does not change.

> I also worry about the implications for backwards compatibility.
> Eventually email-N needs to handle non-conforming mail in a sensible
> way, or anybody who gets spam (ie, everybody) and wants a reliable
> email system will need to implement their own.  If you punt completely
> on handling non-conforming mail now, when is it going to be done?  And

We're (in the current patch) not punting on handling non-conforming
email, we're punting on handling non-conforming bytes *if the headers
that contain them need to be modified*.  The headers can still be
modified, you just (currently) lose the non-ASCII bytes in the process.

> when it is done, will the backward-compatible interface be able to
> access the robust implementation, or will people who want robust APIs
> have to use rather different ones?  The way you're going right now, I
> have to worry about the answer to the second question, at least.

Well, this is still theory given the current state of the email6
code, but I *think* that working email5 code, even after this patch,
will continue to work using email6's backward compatibility interface.
And robustness is not the issue, only extended-beyond-the-RFCs handling
of non-conforming bytes would be an issue.

*But*, as I implied in my previous email, if we allow the surrogates
out so that custom header parsers can use them, then making *that*
code continue to work may require an extra layer in the compatibility
interface to produce the surrogateescaped strings.  Still, at the moment
I can't see any theoretical reason why that would not be possible,
so it may be worth the risk.

>  > [*] Why '?' and not the unicode invalid character character?  Well, the
>  > email5 Generator.flatten can be used to generate data for transmission over
>  > the wire *if* the source is RFC compliant and 7bit-only, and this would
>  > be a normal email5 usage pattern (that is, smtplib.SMTP.sendmail expects
>  > ASCII-only strings as input!).  So the data generated by Generator.flatten
>  > should not include unicode...
> 
> I don't understand this at all.  Of course the byte stream generated
> by Generator.flatten won't contain Unicode (in the headers, anyway);
> it will contain only ASCII (that happens to conform to QP or Base64
> encoding of Unicode in some appropriate UTF in many cases).  Why is
> U+FFFD REPLACEMENT CHARACTER any different from any other non-ASCII
> character in this respect?
>
> (Surely you are not saying that Generator.flatten can't DTRT with
> non-ASCII content *at all*?)

Yes, that is *exactly* what I am saying:

>>> m = email.message_from_string("""\
... From: pöstal
...   
... """)
>>> str(m)
Traceback (most recent call last):
  
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 1: ordinal not in range(128)

Remember, email5 is a direct translation of email4, and email4 only
handled ASCII and oh-by-the-way-if-there-are-bytes-along-for-the-ride-fine-we'll-pass-them-along.
So if you want to put non-ASCII
data into a message you have to encode it properly to ASCII in
exactly the same way that you did in email4:

>>> m = email

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes:

 >   5.  Return the content, with non-ASCII bytes replaced with ?
 >   characters.

That hadn't occurred to me (and it makes me sick to contemplate it).

That said, this is probably good enough for Mailman-like apps to limp
along for "most" users.  It's certainly good enough for the "might
kick your wife and elope with your dog" alpha ports of Mailman to
Python 3 (well, as certain as I can be; of course in the end Barry
decides).  Assuming reasonable backward compatibility of the API, of
course!

 > In other words, my proposed patch only makes email5 1/8 to 1/4
 > broken, instead of half broken as it is now.  But not un-broken
 > enough for Mailman, it sounds like.

IMO, not in the long run.  But realistically, in the applications I
know of, most desired traffic is conformant, and since there aren't
any Python 3 email apps yet, this isn't even a regression. :-/

I do think that it's important that the parsed object be able to tell
you what fields are there (except if the field name itself is invalid)
and return field bodies parsed as far as possible.

 > If we go this route (as opposed to only handling headers with 8bit data by
 > sanitizing them), then we need to think about the email5 header parsers
 > as well (decode_header and parseaddr).  They are of course going to have
 > the same problems as the rest of the email package with parsing bytes,
 > and you are suggesting that access to those header 8bit bytes is needed.

Yes, that would be preferable to replacing them with ASCII junk.

But I don't see any problem with parsing them; they're syntactically
insignificant by definition.  The problem is purely on output: do I
get verbatim escaped bytes, a sanitized str, or an exception?
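Those three output choices can be sketched in Python 3 for a field body that was decoded with surrogateescape. The function and mode names are hypothetical; the '?' and U+FFFD variants correspond to the two sanitizing replacements discussed in this thread.

```python
def render_field(value, mode):
    """Render a surrogate-escaped field body in one of several ways.

    Hypothetical helper; ``value`` is assumed to come from
    bytes.decode('ascii', 'surrogateescape').
    """
    if mode == "verbatim":
        # The original bytes, exactly as received.
        return value.encode("ascii", "surrogateescape")
    if mode == "question":
        # Sanitized str: each escaped byte becomes '?'.
        return value.encode("ascii", "replace").decode("ascii")
    if mode == "fffd":
        # Sanitized str: each escaped byte becomes U+FFFD.
        raw = value.encode("ascii", "surrogateescape")
        return raw.decode("ascii", "replace")
    if mode == "strict":
        # Refuse: raises UnicodeEncodeError if any escaped byte is present.
        return value.encode("ascii").decode("ascii")
    raise ValueError("unknown mode: %r" % (mode,))


value = b"p\xc3\xb6stal".decode("ascii", "surrogateescape")
```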

 > One option would be to add a keyword to the get and get_all methods
 > that instructs it to return the string with the surrogate-escaped
 > bytes, which can then be passed onward to decode_header, parseaddr,
 > or a custom decoder.  Then I need to look at what needs to be added
 > to those methods to handle the escaped bytes, and from what you say
 > they too need a keyword telling them to preserve the escaped bytes
 > on output (a "yes I know what I'm doing" flag...
 > 'preserve_escaped_bytes=True'?).

The need is not absolute, but I would have a strong preference for
being able to get at those bytes.

 > Does my proposal make sense?  But note, it raises exactly the backward
 > compatibility concerns you mention in your next email (that I will reply
 > to next).  It is an open question whether it is worth opening that door
 > in order to be able to do extended handling on non-RFC conforming email
 > (as opposed to just sanitizing it and soldiering on).

Well, maybe not.  However, it is not obvious to me that you won't run
into these issues again in Email6.  Applications that think of email
as textual objects are going to want to make their own choices about
handling of non-conforming email, and it's likely to be massively
inconvenient to say "OK, but you have to use bytes interfaces
exclusively, because the str interfaces don't handle that."


Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes:

 > So the only parsing issue is if Mailman cares about *the non-ASCII
 > bytes* in the headers it cares about.  If it has to modify headers that
 > contain non-ASCII bytes (for example, addresses and Subject) and cares
 > about preserving the non-ASCII bytes, then there is indeed an issue;
 > see previous email for a possible way around that.

OK.

 > I thought Mailman no longer distributed its own version of email?

I believe so; the point is that it could do so again.

 > And the email API currently promises not to raise during parsing,
 > which is a contract my patch does not change.

Which is a contract that has historically been broken frequently.
Unhandled UnicodeErrors have been one of the most common causes of
queue stoppage in Mailman (exceeded only by configuration errors
AFAICS).  I haven't seen any reports for a while, but with the email
package being reengineered from the ground up, the possibility of
regression can't be ignored.

Granted, there should be no regression problem in the current model
for Email5, AIUI.

 > We're (in the current patch) not punting on handling non-conforming
 > email, we're punting on handling non-conforming bytes *if the headers
 > that contain them need to be modified*.  The headers can still be
 > modified, you just (currently) lose the non-ASCII bytes in the process.

Modified *or examined*.  I can't think of any important applications
offhand that *need* to examine the non-ASCII bytes (in particular,
Mailman doesn't need to do that).  Verbatim copying of the bytes
themselves is almost always the desired usage.

 > And robustness is not the issue, only extended-beyond-the-RFCs handling
 > of non-conforming bytes would be an issue.

And with that, I'm certain that Jon Postel is really dead. :-(

 > > (Surely you are not saying that Generator.flatten can't DTRT with
 > > non-ASCII content *at all*?)
 > 
 > Yes, that is *exactly* what I am saying:
 > 
 > >>> m = email.message_from_string("""\
 > ... From: pöstal
 > ...   
 > ... """)
 > >>> str(m)
 > Traceback (most recent call last):
 >   
 > UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 1: ordinal not in range(128)

But that's not interesting; you did that with Python 3.  We want to
know what people porting from Python 2 will expect.  So, in 2.5.5 or
2.6.6 on Mac, with email v4.0.2, it *doesn't* raise; it returns:

wideload:~ 4:14$ python
Python 2.5.5 (r255:77872, Jul 13 2010, 03:03:57) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> m=email.message_from_string('From: pöstal\n\n')
>>> str(m)
'From nobody Thu Oct  7 04:18:25 2010\nFrom: p\xc3\xb6stal\n\n'
>>> m['From']
'p\xc3\xb6stal'
>>> 

That's hardly helpful!  Surely we can and should do better than that
now, especially since UTF-8 (with a proper CTE) is now almost
universally acceptable to MUAs.  When would it be a problem for that
to return

'From nobody Thu Oct  7 04:18:25 2010\nFrom: =?UTF-8?Q?p=C3=B6stal?=\n\n'
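For comparison, a sketch of how present-day Python 3 produces that kind of header when the application supplies the charset explicitly through email.header.Header. The output spells the charset and encoding letter in lowercase (utf-8, q); both are case-insensitive in RFC 2047 encoded words, so it is equivalent to the form above. The helper name is hypothetical.

```python
from email.header import Header
from email.message import Message


def message_with_from(display):
    """Build a message whose From field is an RFC 2047 encoded word."""
    msg = Message()
    # Header() with an explicit charset encodes the non-ASCII text
    # so the serialized header is plain ASCII.
    msg["From"] = Header(display, "utf-8")
    return msg


m = message_with_from("p\u00f6stal")
print(m.as_string())
```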

 > Remember, email5 is a direct translation of email4, and email4 only
 > handled ASCII and oh-by-the-way-if-there-are-bytes-along-for-the-ride-fine-we'll-pass-them-along.
 > So if you want to put non-ASCII
 > data into a message you have to encode it properly to ASCII in
 > exactly the same way that you did in email4:

But if you do it right, then it will still work in a version that just
encodes non-ASCII characters in UTF-8 with the appropriate CTE.  Since
you'll never be passing it non-ASCII characters, it's already ASCII
and UTF-8, and no CTE will be needed.

 > Yes, exactly.  I need to fix the patch to recode using, say,
 > quoted-printable in that case.

It really should check for proportions of non-ASCII.  QP would be
horrible for Japanese or Chinese.
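Present-day Python 3's email.charset already does something along these lines for header text: the built-in utf-8 Charset uses the SHORTEST header encoding, which compares the encoded lengths of Q and B and keeps the shorter. A minimal sketch (the helper name is hypothetical):

```python
from email.charset import Charset


def header_encoding_letter(text):
    """Return 'q' or 'b': the encoded-word type utf-8 headers get for text."""
    encoded = Charset("utf-8").header_encode(text)
    # Encoded words look like =?utf-8?q?...?= or =?utf-8?b?...?=
    return encoded.split("?")[2].lower()


print(header_encoding_letter("p\u00f6stal"))  # prints q (mostly ASCII)
# Four CJK characters (a Japanese name), all non-ASCII.
print(header_encoding_letter("\u5c71\u7530\u592a\u90ce"))  # prints b
```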

 > DecodedGenerator could still produce the unicode, though, which is
 > what I believe we want.  (Although that raises the question of
 > whether DecodedGenerator should also decode the RFC2047 encoded
 > headers... but that raises a backward compatibility issue).

Can't really help you there.  While I would want the RFC 2047 headers
decoded if I were writing new code (which is generally the case for
me), I haven't really wrapped my head around the issues of porting old
code using Python2 str to Python3 str here.  My intuition says "no
problem" (there won't be any MIME-words so the app won't try to decode
them), but I'm not real sure of that. ;-)
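The decoding side of that intuition can be checked with present-day Python 3's email.header.decode_header() and make_header(): a field body containing MIME words is decoded, and one without any passes through unchanged. The helper name is hypothetical.

```python
from email.header import decode_header, make_header


def decode_field(value):
    """Decode any RFC 2047 encoded words in a header value to a str."""
    # decode_header() splits the value into (chunk, charset) pairs;
    # make_header() reassembles them so str() yields plain Unicode.
    return str(make_header(decode_header(value)))


print(decode_field("=?utf-8?q?p=C3=B6stal?="))
print(decode_field("plain ASCII subject"))
```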