On 25-09-17 09:53, Richard Sargent wrote:
Rather than off-the-cuffing anything, please honour the Unicode Character Properties. Refer to https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace, among others.
That is a good idea, but it won't help you if you scrape data from the web, where you'll find plenty of badly encoded text. It is also often unclear which version of which standard was used (see the Mongolian vowel separator, U+180E, whose classification changed between Unicode versions).
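
To illustrate the version problem, here is a minimal Python sketch (Python only because its standard library exposes the Unicode database; the helper names `is_space_separator` and `describe` are made up for this example). U+180E was in general category Zs (space separator) in older Unicode versions and was moved to Cf (format) in Unicode 6.3.0, so the answer to "is this whitespace?" depends on which database version your runtime ships.

```python
import unicodedata

MVS = "\u180e"  # MONGOLIAN VOWEL SEPARATOR


def is_space_separator(ch: str) -> bool:
    """True if this runtime's Unicode database puts ch in category Zs."""
    return unicodedata.category(ch) == "Zs"


def describe(ch: str) -> dict:
    """Report how this runtime classifies a single character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "isspace": ch.isspace(),
        # Version of the Unicode database bundled with this Python build.
        "unicode_version": unicodedata.unidata_version,
    }


if __name__ == "__main__":
    # Compare an uncontroversial space, a no-break space, and the MVS.
    for ch in (" ", "\u00a0", MVS):
        print(describe(ch))
```

The printed category for U+180E will differ depending on the bundled Unicode version, which is exactly the ambiguity you run into with scraped data of unknown origin.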
Stephan