On 2016-03-19 12:24, BartC wrote:
> So a string that looks like:
>
> "ññññññññññññññññññññññññññññññññññññññññññññññññññ"
>
> can have 2**50 different representations? And occupy somewhere
> between 50 and 200 bytes? Or is that 400?
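
To put rough numbers on that question, here is a minimal sketch. It assumes the only two spellings in play are the precomposed U+00F1 and the decomposed "n" + U+0303; the helper names `spelling_count` and `byte_range` are made up for illustration.

```python
import unicodedata

# Two canonically equivalent spellings of the same visible character.
composed = "\N{LATIN SMALL LETTER N WITH TILDE}"     # U+00F1, one code point
decomposed = unicodedata.normalize("NFD", composed)  # "n" + U+0303, two code points


def spelling_count(n_chars):
    """Each position independently picks one of the two spellings."""
    return 2 ** n_chars


def byte_range(n_chars, encoding):
    """(shortest, longest) encoded size over all mixed spellings."""
    shortest = len((composed * n_chars).encode(encoding))
    longest = len((decomposed * n_chars).encode(encoding))
    return shortest, longest


if __name__ == "__main__":
    print(spelling_count(50))
    # The "-le" codecs are used so that no byte-order mark is counted.
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(enc, byte_range(50, enc))
```

Under those assumptions the 50-character string takes 100 to 150 bytes in UTF-8, 100 to 200 in UTF-16 and 200 to 400 in UTF-32, so the answer depends on the encoding as much as on the spelling.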

And moreover, they're all distinct if you don't normalize them,
which certain environments such as CSS & HTML don't do. So you
can have

    css = """
    <style>
    .r\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE} {color: red;}
    .r\N{LATIN SMALL LETTER E WITH ACUTE}sume\N{COMBINING ACUTE ACCENT} {color: blue;}
    .re\N{COMBINING ACUTE ACCENT}sum\N{LATIN SMALL LETTER E WITH ACUTE} {color: purple;}
    .re\N{COMBINING ACUTE ACCENT}sume\N{COMBINING ACUTE ACCENT} {color: purple;}
    </style>
    """

    html_fragment = """
    <ul>
     <li class="r\N{LATIN SMALL LETTER E WITH ACUTE}sum\N{LATIN SMALL LETTER E WITH ACUTE}">One
     <li class="r\N{LATIN SMALL LETTER E WITH ACUTE}sume\N{COMBINING ACUTE ACCENT}">Two
     <li class="re\N{COMBINING ACUTE ACCENT}sum\N{LATIN SMALL LETTER E WITH ACUTE}">Three
     <li class="re\N{COMBINING ACUTE ACCENT}sume\N{COMBINING ACUTE ACCENT}">Four
    </ul>
    """

which will all appear visually identical in the source code, but
each is unique according to the DOM/CSS parser.

-tkc

--
https://mail.python.org/mailman/listinfo/python-list
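
P.S. A small sketch of the same point from the Python side, assuming NFC as the normalization form (NFD would work equally well for the comparison). The names `E_ACUTE`, `E_PLUS_ACCENT` and `spellings` are only illustrative.

```python
import unicodedata

E_ACUTE = "\N{LATIN SMALL LETTER E WITH ACUTE}"  # precomposed, one code point
E_PLUS_ACCENT = "e\N{COMBINING ACUTE ACCENT}"    # decomposed, two code points

# The four class names from the CSS/HTML example, built programmatically.
spellings = [
    f"r{first}sum{second}"
    for first in (E_ACUTE, E_PLUS_ACCENT)
    for second in (E_ACUTE, E_PLUS_ACCENT)
]

# Compared code point by code point, all four are different strings.
distinct_raw = len(set(spellings))

# After normalizing to one form they collapse to a single string.
distinct_nfc = len({unicodedata.normalize("NFC", s) for s in spellings})

if __name__ == "__main__":
    print(distinct_raw, distinct_nfc)
    print([len(s) for s in spellings])
```

Anything that compares identifiers code point by code point sees four names here; normalizing first is what makes them one.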