On Thu, May 5, 2016, at 11:03 AM, Steven D'Aprano wrote:
> - Nobody could possibly want to support non-ASCII text. (Apart from the
> approximately 6.5 billion people in the world that don't speak English of
> course, an utterly insignificant majority.)

Oh, I'd absolutely want to support non-ASCII text. If I have Unicode input,
though, I unfortunately have to rely on https://pypi.python.org/pypi/regex,
as 're' doesn't support matching on character properties. I keep hoping
it'll replace "re"; then we could do:

    pattern = regex.compile(r"^[\p{Lu}\s&]+$")

where \p{property} matches against character properties in the Unicode
database.

> - Data validity doesn't matter, because there's no possible way that you
> might accidentally scrape data from the wrong part of a HTML file and end
> up with junk input.

Um, no one said that. I was arguing that the *regular expression* doesn't
need to be responsible for validation.

> - Even if you do somehow end up with junk, there couldn't possibly be any
> real consequences to that.

No one said that either...

> - It doesn't matter if you match too much, or too little, that just means
> the specs are too pedantic.

Or that...

-- 
Stephen Hansen
m e @ i x o k a i . i o
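
P.S. For anyone without the third-party regex module installed, here is a
minimal sketch of what a character class built from \p{Lu}, \s and & is
meant to accept, using only the standard library's unicodedata. The helper
name is_upper_space_amp is hypothetical, and str.isspace() is only an
approximation of \s, so treat this as an illustration rather than a drop-in
replacement.

```python
import unicodedata


def is_upper_space_amp(text):
    # Hypothetical helper: accept a non-empty string made up only of
    # uppercase letters (Unicode general category "Lu"), whitespace,
    # and ampersands.
    if not text:
        return False
    return all(
        unicodedata.category(ch) == "Lu" or ch.isspace() or ch == "&"
        for ch in text
    )


if __name__ == "__main__":
    for sample in ["ÉCOLE & ÇA", "ΑΘΗΝΑ", "École", "ABC1"]:
        print(repr(sample), is_upper_space_amp(sample))
```

The category lookup is what makes this work for non-ASCII input: "É" and
"Α" are classified as Lu just like "E" and "A", with no hand-written
character ranges.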