On Wed, 5 Jun 2024 at 02:49, Edward Teach via Python-list <python-list@python.org> wrote:
>
> On Mon, 03 Jun 2024 14:58:26 -0400 (EDT)
> Grant Edwards <grant.b.edwa...@gmail.com> wrote:
>
> > On 2024-06-03, Edward Teach via Python-list <python-list@python.org>
> > wrote:
> >
> > > The Gutenburg Project publishes "plain text". That's another
> > > problem, because "plain text" means UTF-8....and that means
> > > unicode...and that means running some sort of unicode-to-ascii
> > > conversion in order to get something like "words". A couple of
> > > hours....a couple of hundred lines of C....problem solved!
> >
> > I'm curious. Why does it need to be converted from Unicode to ASCII?
> >
> > When you read it into Python, it gets converted right back to
> > Unicode...
> >
>
> Well.....when using the file linux.words as a useful master list of
> "words".....linux.words is strict ASCII........
>

Whatever gave you that idea? I have a large number of dictionaries in
/usr/share/dict, all of them encoded UTF-8 except one (and I don't
know why that is). Even the English ones aren't entirely ASCII.

There is no need to "convert from Unicode to ASCII", which makes no sense.

ChrisA
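
To make the point concrete, here is a minimal sketch of reading a word list as UTF-8 and listing the entries that are not pure ASCII. The function names load_words and non_ascii_words are made up for illustration, and the code assumes a one-word-per-line file that really is UTF-8; a list in some other encoding would need a different encoding argument.

```python
def load_words(path, encoding="utf-8"):
    """Read a one-word-per-line file into a set of str (Unicode) words."""
    # Assumption: the file is UTF-8. Python decodes it to str on read,
    # so no separate "Unicode to ASCII" step is involved.
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def non_ascii_words(words):
    """Return, sorted, the words containing at least one non-ASCII character."""
    # str.isascii() is available from Python 3.7 onwards.
    return sorted(w for w in words if not w.isascii())
```

Something like non_ascii_words(load_words("/usr/share/dict/words")) would then show which entries fall outside ASCII, assuming that path exists on your system. Words read from any other UTF-8 text can be compared against the same set directly, since both sides are ordinary str objects.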