On 12/03/2016 19:26, Thomas 'PointedEars' Lahn wrote:
> BartC wrote:
>> On 12/03/2016 12:13, Marko Rauhamaa wrote:
>>> Why, look at the *English* page on Hillary Clinton:
>>> Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born
>>> October 26, 1947) is an American politician.
>>> <URL: https://en.wikipedia.org/wiki/Hillary_Clinton>
>>> You couldn't get past the first sentence in ASCII.
>> I saved that page locally as a .htm file in UTF-8 encoding. I ran a
>> modified version of my benchmark, and it appeared that 99.7% of the
>> bytes had ASCII codes.
> That is a contradiction in terms. Obviously you do not know what ASCII is.
What does your own analysis show of that page?
If you had it in memory as fully expanded 32-bit Unicode values, what
proportion of those would have values below 128?
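
For anyone who wants to try both measurements, here is a minimal sketch in Python. It is not the benchmark mentioned above; the function name ascii_proportions and the sample string are assumptions made up for illustration. It counts UTF-8 bytes below 128 and, separately, code points below 128, which is what you would see with the text held as one value per character.

```python
def ascii_proportions(text: str) -> tuple[float, float]:
    """Return (byte_fraction, code_point_fraction) of values below 128.

    byte_fraction looks at the UTF-8 encoding of the text;
    code_point_fraction looks at the text as a sequence of code points.
    """
    if not text:
        raise ValueError("text must not be empty")
    data = text.encode("utf-8")
    # In UTF-8, a byte below 128 always encodes exactly one ASCII character.
    byte_fraction = sum(b < 128 for b in data) / len(data)
    code_point_fraction = sum(ord(c) < 128 for c in text) / len(text)
    return byte_fraction, code_point_fraction


if __name__ == "__main__":
    # Hypothetical sample: the opening of the sentence quoted earlier.
    sample = "Hillary Diane Rodham Clinton /ˈhɪləri daɪˈæn ˈrɒdəm ˈklɪntən/ (born"
    by_bytes, by_code_points = ascii_proportions(sample)
    print(f"bytes below 128:       {by_bytes:.1%}")
    print(f"code points below 128: {by_code_points:.1%}")
```

Since every non-ASCII character takes two to four bytes in UTF-8, all of them 128 or above, the code point figure can never be lower than the byte figure. For a saved page, read the file with open(path, encoding="utf-8") and pass the contents in.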
--
Bartc