[issue18814] Add utilities to "clean" surrogate code points from strings

2018-03-29 Thread Nick Coghlan
Nick Coghlan added the comment: With PEPs 538 and 540 implemented for 3.7, my thinking on this has evolved a bit. A recent discussion on python-ideas [1] also introduced me to the third-party library, "ftfy", which offers a wide range of tools for cleaning up improperly decoded data: https:/

2015-09-29 Thread R. David Murray
R. David Murray added the comment: Done: issue 25269.

2015-09-29 Thread STINNER Victor
STINNER Victor added the comment: > I also want "detect if there are any surrogates". Could you please open a separate issue for this function/method? I believe that it's very different from the other proposed functions/methods. Methods like "is_ascii()" have been proposed before, but the requ

2015-09-27 Thread Steven D'Aprano
Steven D'Aprano added the comment: On Sun, Sep 27, 2015 at 04:17:45PM +, R. David Murray wrote: > > I also want "detect if there are any surrogates". I think that's useful enough that it should be a str method. Here's a pure-Python implementation: def is_surrogate(s): return any(0xD800 <=
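The pure-Python check above is cut off in this digest. Below is a minimal sketch of the same idea, not the original code; the name `has_surrogates` is hypothetical. It reports whether any code point falls in the surrogate range U+D800..U+DFFF:

```python
def has_surrogates(s: str) -> bool:
    """Return True if s contains any surrogate code point (U+D800..U+DFFF).

    Hypothetical helper for illustration; not an existing str method.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


print(has_surrogates("abc\udcff"))       # lone surrogate from surrogateescape
print(has_surrogates("abc\U0001F600"))   # astral char is one code point in Python
```

A strict `s.encode("utf-8")` gives a similar test, since UTF-8 encoding in strict mode fails only on surrogate code points, but it raises `UnicodeEncodeError` instead of returning a boolean.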

2015-09-27 Thread R. David Murray
R. David Murray added the comment: I also want "detect if there are any surrogates".

2015-09-27 Thread Nick Coghlan
Nick Coghlan added the comment: As far as the rationale for adding the functions at all goes, my main interest is still in having somewhere in the codecs module documentation to *define the problem*, and to my mind that entails also offering a simple way to do the relevant pre-/post-processing

2015-09-27 Thread Nick Coghlan
Nick Coghlan added the comment: I think moving this forward mainly needs someone with the time and energy to wrangle a python-ideas/dev discussion to get some additional feedback on the API design. As I see it, there are 2 main questions to be resolved: 1. Where to expose these functions The def

2015-09-27 Thread STINNER Victor
STINNER Victor added the comment: Hum, I suggest putting these functions in a package on PyPI, or as recipes on a website like Stack Overflow, and closing the issue. I'm still not convinced that these functions are useful. Usually we take a function from an existing project used in applications to pu

2015-09-26 Thread Martin Panter
Martin Panter added the comment: I think my suggested colours for the bikeshed would be handle_surrogates() and handle_surrogateescape(). “Rehandle” seems awkward and too assuming to me. And I agree with Serhiy that surrogates are a Unicode thing, not just related to the “surrogatep
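As a rough illustration of what the two proposed names could do, here is a sketch built only on existing codec error handlers. The signatures (a string plus an `errors` argument defaulting to "replace") are assumptions rather than an agreed API, and `handle_surrogateescape` assumes the string was originally decoded from UTF-8 with `errors="surrogateescape"`:

```python
def handle_surrogateescape(s: str, errors: str = "replace") -> str:
    """Hypothetical: re-decode surrogateescape'd text with another handler.

    Assumes s came from bytes.decode("utf-8", "surrogateescape").
    """
    # surrogateescape turns U+DC80..U+DCFF back into the raw bytes 0x80..0xFF;
    # the bytes are then decoded again, applying `errors` to invalid data.
    return s.encode("utf-8", "surrogateescape").decode("utf-8", errors)


def handle_surrogates(s: str, errors: str = "replace") -> str:
    """Hypothetical: clean arbitrary surrogate code points out of s."""
    # surrogatepass lets surrogate code points through the UTF-16 encoder;
    # decoding then joins valid high/low pairs into one astral character
    # and applies `errors` to any lone surrogates.
    return s.encode("utf-16-le", "surrogatepass").decode("utf-16-le", errors)


print(ascii(handle_surrogateescape("abc\udcff")))
print(ascii(handle_surrogates("\ud83d\ude00")))
```

With `errors="strict"`, the decode step raises `UnicodeDecodeError` instead of cleaning. `handle_surrogateescape` also raises `UnicodeEncodeError` for surrogates outside U+DC80..U+DCFF, since `surrogateescape` only maps that range back to bytes.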

2015-06-07 Thread Steven D'Aprano
Changes by Steven D'Aprano: nosy: +steven.daprano

2015-05-11 Thread Nick Coghlan
Nick Coghlan added the comment: I suggest we defer this one to 3.6. I still think it's worth doing, but I don't think it's a major barrier to migration, and it would be good to get some real world experience with the new sys.stdin behaviour of defaulting to using surrogateescape in the POSIX
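For context, here is a short sketch of how surrogateescape round-trips undecodable bytes, which is where the lone surrogates discussed in this issue come from. It decodes as UTF-8 purely for illustration; in the POSIX locale the stream encoding is usually ASCII, which escapes non-ASCII bytes the same way:

```python
raw = b"caf\xe9"  # Latin-1 encoded "café": not valid UTF-8

# Decoding with surrogateescape never fails: each undecodable byte 0xNN
# is replaced by the lone surrogate code point U+DCNN.
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'caf\udce9'

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw

# A strict encode, e.g. when writing to a UTF-8 stream, rejects the surrogate.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```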