In article <aa3b500f-bebf-4d77-9855-3d90b07ea...@y7g2000pbu.googlegroups.com>, rusi <rustompm...@gmail.com> wrote:
> On Apr 3, 6:43 pm, Roy Smith <r...@panix.com> wrote: > > This has to inspect the entire string, no? I posted (essentially) this > > a few days ago: > > > > if all(ord(c) <= 0xffff for c in s): > > return "it's all bmp" > > else: > > return "it's got astral crap in it" > > Astral crap? CRAP? > Verily sir I am offended! > [...] > You are American! This is true. But, to be fair, in the (I don't have the exact number here) roughly 200 million records in our recent big data import job, I found exactly FOUR strings with astral characters. Which boiled down to two versions of each of two different song titles. One had a Unicode Character 'BALLOON' (U+1F388). The other had some heart symbol (sorry, I don't remember the exact code point). These hardly seem a matter of national pride. And, if you don't believe there is astral crap, how do you explain U+1F4A9?
-- http://mail.python.org/mailman/listinfo/python-list