Re: [Python-Dev] Edits to Metadata 1.2 to add extras (optional dependencies)

2012-09-01 Thread R. David Murray
On Sat, 01 Sep 2012 13:55:11 +0900, "Stephen J. Turnbull" wrote:
> "Martin v. Löwis" writes:
> 
>  > Unfortunately, this conflicts with the desire to use UTF-8 in attribute
>  > values - RFC 822 (and also 2822) don't support this, but require the
>  > use of MIME instead (Q or B encoding).
> 
> This can be achieved simply by extending the set of characters
> permitted, as MIME did for message bodies.  I'd be cautious about RFC
> 5335, not just because it's experimental, but because there may be
> other requirements we don't want to mess with.  (If RDM says
> otherwise, listen to him.  I just know the RFC exists.)

That is essentially what that RFC does.  I haven't gone through it with
a fine-tooth comb yet, but that's why I say the parsing side mostly works
already: we allow unicode characters anywhere non-special-characters are
allowed during parsing.  The only issue is that we encode non-ASCII using
the normal rules during serialization, so we need a new policy control to
disable that.  I'm thinking it will be an easy addition...the hard part
for RFC5335 is doing that fine-tooth read and adding appropriate tests.
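
[Editor's sketch] On a current Python, the serialization control described
above appears to correspond to the `utf8` attribute of
`email.policy.EmailPolicy` (added in Python 3.5, following RFC 6532, the
successor to the experimental RFC 5335).  A minimal sketch under that
assumption; the `Author` field and its value are made up for illustration:

```python
from email import message_from_string
from email.message import EmailMessage
from email.policy import default

# Hypothetical metadata field with a non-ASCII value.
msg = EmailMessage()
msg["Author"] = "Martin v. Löwis"

# Default rules: non-ASCII is serialized as RFC 2047 encoded words.
encoded = msg.as_string()

# Assumed policy control: utf8=True writes the characters through unchanged.
raw = msg.as_string(policy=default.clone(utf8=True))

# The parsing side accepts the raw UTF-8 form directly.
reparsed = message_from_string(raw, policy=default)
author = str(reparsed["Author"])
```

Here `encoded` carries an `=?utf-8?...?=` encoded word, while `raw` keeps
the name readable, which is the form a metadata file would want.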

Alternatively, as Donald pointed out, you can use the Binary mode, where
the utf-8 bytes just go along for the ride.  In the context of the
metadata, I think that should produce the desired results, since there
should be no need to re-wrap metadata lines.  It will also preserve the
line endings *if* you don't use the new policies.  But that is why I
would prefer to use explicit RFC5335 support...I'd like the email
backward compatibility policy to go away some day :)  (On the gripping
hand, it will always be possible to recreate it as a custom policy.)
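
[Editor's sketch] The binary-mode route can be tried with the bytes parser
and the default (backward compatibility) policy.  This is only an
illustration with a made-up `Author` field, and `as_bytes()` needs
Python 3.4 or later:

```python
from email import message_from_bytes

# Hypothetical metadata with UTF-8 bytes in a header value.
raw = "Author: Martin v. Löwis\n\n".encode("utf-8")

# message_from_bytes() uses the compat32 policy unless told otherwise.
msg = message_from_bytes(raw)

# Serializing back to bytes carries the non-ASCII bytes along unchanged.
out = msg.as_bytes()
```

Note that this only shows the bytes surviving a parse/serialize cycle; what
`msg["Author"]` returns for such a header is a separate question.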

>  > RFC 2822 also has a continuation line semantics which traditionally 
>  > conflicts with the metadata; in particular, line breaks cannot be 
>  > represented (but are interpreted as continuation lines instead).
> 
> Of course line breaks can be represented, without any further change
> to RFC 2822.  Just use Unicode LINE SEPARATOR.  You could even do it
> within ASCII by adhering strictly to RFC 2822 syntax which interprets
> continuation lines by removing exactly the CRLF pair.  Just use ASCII
> TAB as the field separator.

Yes, that is what I was talking to Tarek about.  And since ReST source
shouldn't contain tabs, a tab would probably work as the separator,
if for some reason you didn't want to use LINE SEPARATOR.
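
[Editor's sketch] The separator idea can be illustrated without the email
module at all.  The helper names below are hypothetical; the point is only
that a multi-line value becomes a single logical header value and can be
recovered exactly, provided the separator never occurs in the text:

```python
LINE_SEPARATOR = "\u2028"  # Unicode LINE SEPARATOR
TAB = "\t"                 # ASCII alternative, assuming tab-free ReST


def encode_field(text, sep=LINE_SEPARATOR):
    """Replace line breaks with `sep` so the value fits one logical line."""
    if sep in text:
        raise ValueError("separator already present in the text")
    return text.replace("\n", sep)


def decode_field(value, sep=LINE_SEPARATOR):
    """Inverse of encode_field(): restore the original line breaks."""
    return value.replace(sep, "\n")


description = "Title\n=====\n\nSome ReST text."
as_unicode = encode_field(description)
as_ascii = encode_field(description, sep=TAB)
```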

> There's a final dodge that occurs to me: the semantics you're talking
> about are *lexical* semantics in the RFC 2822 context (line unfolding
> and RFC 2047 decoding).  We could possibly in the context of the email
> module treat Metadata as an intermediate post-lexical-decoding
> pre-syntactic-analysis representation.  I don't know if that makes
> sense in the context of using email module facilities to parse
> Metadata.

The policy has hooks that support this.  A policy gets handed the source
line complete with the line breaks, determines what gets stored in the
model, and also gets to control what gets handed back to the application
when a header is retrieved from the model.  The policy can also control
the header folding during serialization.  So preserving line separators
using a custom policy is not only possible, but should be fairly easy.
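
[Editor's sketch] A rough idea of such a policy, written against the
`email.policy.EmailPolicy` hooks as they exist in current Python.  The
class name and the rule of dropping exactly one leading whitespace
character per continuation line are assumptions made for this example,
not part of any metadata specification:

```python
from email import message_from_string
from email.policy import EmailPolicy, default


class LineKeepingPolicy(EmailPolicy):
    """Hypothetical policy: return header values with line breaks intact."""

    def header_fetch_parse(self, name, value):
        # Values set by the application are already header objects.
        if hasattr(value, "name"):
            return value
        # Parsed values are stored with their source line breaks; strip
        # only the single fold character from each continuation line.
        first, *rest = value.splitlines() or [""]
        return "\n".join([first] + [line[1:] for line in rest])


source = "Description: First line\n second line\n   indented line\n\n"

kept = message_from_string(source, policy=LineKeepingPolicy())["Description"]
unfolded = str(message_from_string(source, policy=default)["Description"])
```

With the stock policy the value comes back as one unfolded line; with the
sketch above the three source lines are still separate.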

--David
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Unicode support of the curses module in Python 3.3

2012-09-01 Thread Victor Stinner
Hi,

I changed many functions of the curses module in Python 3.3 to improve
its Unicode support:
 - new functions: curses.unget_wch() and window.get_wch()
 - new attribute: window.encoding
 - the default encoding is now the locale encoding instead of UTF-8
 - use the C functions *_wch() and *wstr() when available instead of
*ch() and *str() functions. For example, the Python function addstr()
calls waddwstr() and addch(str) calls wadd_wch() (addch(int) and
addch(bytes) are still calling waddch())

Most of the new Unicode features now depend on whether the Python curses
module is linked to the C libncursesw library... and the Python
module is not linked to libncursesw if the libreadline library is
linked to libncurses. How the readline library is linked to
libncurses/libncursesw is not a new problem, but it may become more
annoying than before. I hope that most Linux distros link, or will link,
readline to libncursesw.

For example, if the Python curses module is not linked to libncursesw,
get_wch() is not available and addch("é") raises an OverflowError if
the locale encoding is UTF-8 (because "é".encode("utf-8") is longer
than 1 byte).
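
[Editor's sketch] The one-byte limit can be checked without a terminal.
Both helper names are hypothetical, and using the presence of `get_wch`
as a sign of libncursesw linkage is an assumption based on the paragraph
above:

```python
def fits_narrow_addch(ch, encoding):
    """True if `ch` encodes to a single byte in the given encoding."""
    return len(ch.encode(encoding)) == 1


def has_wide_support(window):
    """Guess whether a curses window object offers the wide-char API."""
    return hasattr(window, "get_wch")


# "é" needs two bytes in UTF-8 but only one in Latin-1.
utf8_ok = fits_narrow_addch("é", "utf-8")
latin1_ok = fits_narrow_addch("é", "latin-1")
```

In a real program the window would come from `curses.initscr()` or
`curses.wrapper()`, and `window.encoding` would supply the encoding.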

I introduced two bugs: get_wch() didn't support keycodes (like
curses.KEY_UP) and addch() didn't work anymore with special characters
like curses.ACS_HLINE. These issues are referenced as #15785 and
#14223 in the bug tracker, and I pushed fixes: c58789634d22 and
27b5bd5f0e4c. I hope that Georg will accept them in Python 3.3 final!

I didn't find these bugs myself because I only used dummy scripts to
test my changes. Does anyone know "real world" applications using the
curses module and supporting Python 3? Can you please test them with
non-ASCII characters and the latest development version of Python 3.3?

So please try to test the curses module before Python 3.3 final with
your favorite application!

Victor


Re: [Python-Dev] [Python-ideas] itertools.chunks(iterable, size, fill=None)

2012-09-01 Thread Miki Tebeka
See the "grouper" example in http://docs.python.org/library/itertools.html
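
[Editor's sketch] For reference, a Python 3 version in the spirit of that
recipe, next to an iterator-based `chunks()` that leaves the last chunk
short instead of padding it.  The argument order and the `chunks()` helper
are choices made for this example, not a proposed itertools API:

```python
from itertools import islice, zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def chunks(iterable, size):
    """Yield lists of up to `size` items; works on any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


padded = list(grouper([1, 2, 3, 4, 5, 6, 7], 3))
ragged = list(chunks([1, 2, 3, 4, 5, 6, 7], 3))
```

Unlike the version quoted below, `chunks()` here does not need `len()` or
indexing, so it also accepts generators.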

On Friday, August 31, 2012 11:28:33 PM UTC-7, anatoly techtonik wrote:
>
> I've run into the necessity of implementing chunks() again. Here is 
> the code I've made from scratch. 
>
> def chunks(seq, size):
>     '''Cut sequence into chunks of given size. If `seq` length is
>     not divisible by `size` without remainder, last chunk will
>     have length less than size.
>
>     >>> list( chunks([1,2,3,4,5,6,7], 3) )
>     [[1, 2, 3], [4, 5, 6], [7]]
>     '''
>     endlen = len(seq)//size
>     for i in range(endlen):
>         yield [seq[i*size+n] for n in range(size)]
>     if len(seq) % size:
>         yield seq[endlen*size:]
>
> -- 
> anatoly t. 
>
>
> On Fri, Jun 29, 2012 at 11:32 PM, Georg Brandl wrote:
> > On 26.06.2012 10:03, anatoly techtonik wrote: 
> >> 
> >> Now that Python 3 is all about iterators (which is a user killer 
> >> feature for Python according to StackOverflow - 
> >> http://stackoverflow.com/questions/tagged/python) would it be nice to 
> >> introduce more first class functions to work with them? One function 
> >> to be exact to split string into chunks. 
> >> 
> >>  itertools.chunks(iterable, size, fill=None) 
> >> 
> >> Which is the 33rd most voted Python question on SO -
> >>
> >> http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python/312464
> >> 
> >> P.S. CC'ing to python-dev@ to notify about the thread in python-ideas. 
> >> 
> > 
> > Anatoly, so far there were no negative votes -- would you care to go 
> > another step and propose a patch? 
> > 
> > 
> > Georg 
> > 
> > ___ 
> > Python-ideas mailing list 
> > [email protected]  
> > http://mail.python.org/mailman/listinfo/python-ideas 


Re: [Python-Dev] Unicode support of the curses module in Python 3.3

2012-09-01 Thread Steven D'Aprano

On 01/09/12 23:44, Victor Stinner wrote:

> Hi,
>
> I changed many functions of the curses module in Python 3.3 to improve
> its Unicode support:

[...]

Thank you.



> For example, if the Python curses module is not linked to libncursesw,
> get_wch() is not available and addch("é") raises an OverflowError if
> the locale encoding is UTF-8 (because "é".encode("utf-8") is longer
> than 1 byte).


OverflowError? That is very surprising. I wouldn't guess that calling
addch could raise OverflowError.

Could you use a less surprising exception, or at least make sure that
it is clearly and obviously documented?


--
Steven