Re: From JoyceUlysses.txt -- words occurring exactly once

Chris Angelico via Python-list Tue, 04 Jun 2024 15:05:47 -0700

On Wed, 5 Jun 2024 at 02:49, Edward Teach via Python-list
<[email protected]> wrote:
>
> On Mon, 03 Jun 2024 14:58:26 -0400 (EDT)
> Grant Edwards <[email protected]> wrote:
>
> > On 2024-06-03, Edward Teach via Python-list <[email protected]>
> > wrote:
> >
> > > The Gutenberg Project publishes "plain text".  That's another
> > > problem, because "plain text" means UTF-8....and that means
> > > unicode...and that means running some sort of unicode-to-ascii
> > > conversion in order to get something like "words".  A couple of
> > > hours....a couple of hundred lines of C....problem solved!
> >
> > I'm curious.  Why does it need to be converted from Unicode to ASCII?
> >
> > When you read it into Python, it gets converted right back to
> > Unicode...
> >
>
> Well.....when using the file linux.words as a useful master list of
> "words".....linux.words is strict ASCII........
>


Whatever gave you that idea? I have a large number of dictionaries in
/usr/share/dict, all of them encoded UTF-8 except one (and I don't
know why that is). Even the English ones aren't entirely ASCII.

There is no need to "convert from Unicode to ASCII", which makes no sense.
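
As a rough illustration of the point, here is a minimal sketch of finding
the words that occur exactly once while keeping everything as Unicode.
The function names (words_occurring_once, load_word_list), the NFC
normalisation step, and the "letters only" tokenising rule are my own
assumptions for the example, not anything taken from the thread; adjust
them to whatever definition of "word" you actually want.

```python
import re
import unicodedata
from collections import Counter


def load_word_list(path, encoding="utf-8"):
    """Read a one-word-per-line dictionary file into a set of str.

    The encoding default is an assumption; most files under
    /usr/share/dict are UTF-8, but check your own system.
    """
    with open(path, encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}


def words_occurring_once(text, known_words=None):
    """Return the sorted words that appear exactly once in text.

    Comparison is case-insensitive (via casefold). If known_words is
    given, only words also present in that collection are returned.
    """
    # Normalise so that accented letters are single code points.
    text = unicodedata.normalize("NFC", text).casefold()
    # In Python 3, \W is Unicode-aware for str patterns, so this
    # matches runs of letters in any script (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text)
    counts = Counter(tokens)
    once = {word for word, n in counts.items() if n == 1}
    if known_words is not None:
        once &= {
            unicodedata.normalize("NFC", w).casefold() for w in known_words
        }
    return sorted(once)
```

Usage would be something along the lines of
words_occurring_once(open("JoyceUlysses.txt", encoding="utf-8").read(),
load_word_list("/usr/share/dict/words")), where the dictionary path is
only a guess at a typical location. At no point does anything need to be
squeezed into ASCII.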

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
