Re: Dictionary changes

Bzzzz Wed, 02 Jul 2014 09:41:05 -0700

On Wed, 2 Jul 2014 12:22:02 -0400
Steve Litt <sl...@troubleshooters.com> wrote:


> Another thing to remember is that the wordlist is no longer ASCII,

An excellent thing in the age of UTF-N.

> cat /usr/share/dict/words | grep -i "$1"

Simplify it: grep -i "$1" /usr/share/dict/words

> If you look up ^smor.*rd$, you get nothing. But if you look up
> ^sm.*rd$ you get smörgåsbord. What I'd like to do is get grep to
> think "å" is a hit for "a" and report it, but report it as "å".
> I'll let you know when I figure out how to do that, or do some
> other thing that produces the same result. Prepending LC_ALL=
> either C, C.UTF-8, en_US.utf8, or POSIX, to the grep command,
> didn't do it either.

You can't, because these letters do not have the same code
in either encoding.
(But your case is interesting; maybe a rewritten grep,
including such conversions, would be of interest.)
 
> If worst comes to worst and I can't find a way to get grep to do
> this, I'll just put together a substitution table,
> convert /usr/share/dict/words to words.ascii, line for line, search
> words.ascii, get the line number, and pull that line out of words.
> Crude, but effective.

AFAIK, this is the only way to do what you want.
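
For what it's worth, here is a minimal sketch of that "fold, search,
report the original" idea in Python rather than shell. It is not a
substitution table as such: it assumes that Unicode decomposition
(NFKD) plus dropping the combining marks is a good enough stand-in
for one. The names fold_to_ascii, search_words and search_file are
made up for this example.

```python
import re
import unicodedata


def fold_to_ascii(word):
    """Drop combining marks so that e.g. 'å' is compared as 'a'."""
    # NFKD splits 'å' into 'a' followed by a combining ring above.
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_words(pattern, words):
    """Match pattern against the folded form, return the original spelling."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if regex.search(fold_to_ascii(w))]


def search_file(pattern, path="/usr/share/dict/words"):
    """Same search over a word list file, assumed to be UTF-8 encoded."""
    with open(path, encoding="utf-8") as fh:
        return search_words(pattern, [line.rstrip("\n") for line in fh])
```

With this, search_words(r"^smor.*rd$", ["smörgåsbord", "sword"]) gives
back only "smörgåsbord", spelled as in the list. Letters that have no
Unicode decomposition (such as "ø") pass through unchanged, so those
would still need an explicit table.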

-- 
To be is to do. -- I. Kant
To do is to be. -- A. Sartre
Do be a Do Bee! -- Miss Connie, Romper Room
Do be do be do! -- F. Sinatra
Yabba-Dabba-Doo! -- F. Flintstone
