New submission from Guillaume Sanchez:
"a⃑".center(5, ".")
produces
'..a⃑.' instead of '..a⃑..'
The reason is that "a⃑" is composed of two code points (2 UCS4 chars), one 'a'
and one combining code point "above
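A minimal sketch of the behaviour reported above. It assumes the combining mark is U+20D1 (COMBINING RIGHT HARPOON ABOVE); the exact code point in the report may differ, but any combining mark gives the same result. It shows that str.center() pads according to the number of code points, not the number of characters a reader sees.

```python
import unicodedata

# Assumption: the mark in the report is U+20D1; any combining mark behaves alike.
s = "a\u20d1"

# str.center() works on this count: two code points for one visible character.
code_points = len(s)

# A non-zero canonical combining class identifies a combining mark.
marks = sum(1 for c in s if unicodedata.combining(c) != 0)

# Width 5 minus 2 code points leaves only 3 fill characters.
centered = s.center(5, ".")

print(code_points, marks, repr(centered))
```

The padded string has five code points but is displayed as four characters, which is the off-by-one described in the report.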
Guillaume Sanchez added the comment:
Obviously, I'm talking about str.center() but all functions needing a count of
graphemes are then not totally correct.
I can fix that and add the corresponding function, or an iterator over
graphemes, or whatever seems
Guillaume Sanchez added the comment:
Thanks for all those interesting cases you brought here! I didn't think of that
at all!
I'm using the word "grapheme" as per the definition given in UAX #29 (TR29), which is
*not* language/locale dependent [1].
This annex is very specific and
Guillaume Sanchez added the comment:
Hello to all of you, sorry for the delay. Been busy.
I added the base code needed to build the grapheme cluster break algorithm. We
now have the GraphemeBreakProperty available via
unicodedata.grapheme_cluster_break()
Can you check that the implementation
Guillaume Sanchez added the comment:
Hello,
I implemented unicodedata.break_graphemes(), which returns an iterator that
yields consecutive graphemes.
This is a "test" implementation meant to see what doesn't fit Python's style
and design, to discuss naming and implementa
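To illustrate the general idea of an iterator over graphemes, here is a rough sketch. It is not the implementation from the patch: the name break_graphemes_sketch is hypothetical, and the logic only attaches combining marks to the preceding character. The full UAX #29 rules (Hangul syllables, emoji sequences, ZWJ, CR LF, and so on) are not covered.

```python
import unicodedata
from typing import Iterator


def break_graphemes_sketch(text: str) -> Iterator[str]:
    """Yield each character together with the combining marks that follow it.

    Hypothetical helper: a crude approximation of UAX #29, not the real
    grapheme cluster break algorithm.
    """
    cluster = ""
    for ch in text:
        # A character with combining class 0 starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster


# "a" + U+20D1, "b", "c" + U+0301 (combining acute accent)
clusters = list(break_graphemes_sketch("a\u20d1bc\u0301"))
print(len(clusters))
```

An iterator like this can keep state between steps, so callers never need to re-scan the string from an index.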
Guillaume Sanchez added the comment:
Hello,
I come from bugs.python.org/issue30717 . I have a pending PR that needs review
( https://github.com/python/cpython/pull/2673 ) adding a function that breaks
Unicode strings into grapheme clusters (aka what one would intuitively call "a
char
Guillaume Sanchez added the comment:
Hello Steven!
Thanks for the quick reply!
unicodedata.grapheme_cluster_break() takes a Unicode code point as an argument
and returns its GraphemeBreakProperty as a string. Possible values are listed
here: http://www.unicode.org/reports/tr29/#CR
help
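As an illustration of what a "code point in, property name out" lookup looks like, here is a hedged sketch. The function grapheme_cluster_break_sketch is hypothetical and is not the function from the patch. It guesses the value from the general category, which only approximates the real property table in UAX #29, and it never returns Prepend or the Hangul values (L, V, T, LV, LVT).

```python
import unicodedata


def grapheme_cluster_break_sketch(ch: str) -> str:
    """Roughly guess the Grapheme_Cluster_Break value of one code point.

    Hypothetical helper: derived from the general category, so it is an
    approximation of the real property data, not a replacement for it.
    """
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch == "\u200d":
        return "ZWJ"
    if "\U0001F1E6" <= ch <= "\U0001F1FF":
        return "Regional_Indicator"
    category = unicodedata.category(ch)
    if category in ("Mn", "Me"):
        return "Extend"       # non-spacing and enclosing marks
    if category == "Mc":
        return "SpacingMark"  # spacing combining marks
    if category in ("Cc", "Zl", "Zp"):
        return "Control"
    return "Other"


print(grapheme_cluster_break_sketch("a"), grapheme_cluster_break_sketch("\u20d1"))
```

A real implementation would read these values from the Unicode data file for the property instead of inferring them.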
Guillaume Sanchez added the comment:
Hi,
Are you guys still interested? I haven't heard from you in a while.
Guillaume Sanchez added the comment:
Thanks for your consideration. I'm currently fixing what's been asked in the
reviews.
> But it would be useful to provide also word and sentence iterators.
I'll gladly do that as well!
> I think emitting a pair (pos, substring) would
Guillaume Sanchez added the comment:
I have a few criticisms of that proto-PEP
http://mail.python.org/pipermail/python-dev/2001-July/015938.html
In particular, the fact that all those functions return an index prevents any
state keeping.
That's a problem because:
> next_(
Guillaume Sanchez added the comment:
> I don't think unicodedata is the right place
I do agree with that. A new module sounds good. Would it be a problem if that
module contained very few functions at first?
> Can we mark this as having a Provisional API to give us time to de