Re: Categorising strings into random versus non-random

Steven D'Aprano Mon, 21 Dec 2015 02:42:07 -0800

On Mon, 21 Dec 2015 08:56 pm, Christian Gollwitzer wrote:

> Apfelkiste:Tests chris$ python score_my.py
> -8.74  baby lions at play
> -7.63  saturday_morning12
> -6.38  Fukushima
> -5.72  ImpossibleFork
> -10.6  xy39mGWbosjY
> -12.9  9sjz7s8198ghwt
> -12.1  rz4sdko-28dbRW00u
> Apfelkiste:Tests chris$ python score_my.py 'bnsip atl ayba loy'
> -9.43  bnsip atl ayba loy


Thanks, Christian and Peter, for the suggestion. I'll certainly investigate
this further.

But the scoring doesn't seem very good. "baby lions at play" is 100% English
words, and ought to have a radically different score from (say)
xy39mGWbosjY, which is extremely un-English. (How many English words
do you know of with W, X, two Ys, and J?) And yet they are less than two
units apart. "baby lions at play" has a score almost as negative as the
authentic gibberish, while Fukushima (a Japanese word) has a much less
negative score. Using trigraphs doesn't change that:

> -11.5  baby lions at play
> -9.85  Fukushima
> -13.4  xy39mGWbosjY

So this test appears to find that English-like words are nearly as "random"
as actual random strings.

But it's certainly worth looking into.
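
For anyone who wants to experiment with this kind of scoring, here is a
minimal sketch of a character n-gram scorer. It is NOT Christian's
score_my.py (its source isn't shown in this message). The names train(),
score() and TOY_CORPUS, the add-one smoothing, and the averaging per n-gram
are all assumptions made for illustration. A real test would need a large
English corpus rather than one sentence.

```python
import math
from collections import Counter

# Hypothetical toy corpus; far too small for real use.
TOY_CORPUS = ("the quick brown fox jumps over the lazy dog while baby "
              "lions play at the edge of the river on a saturday morning")


def train(corpus, n=2):
    """Count character n-grams and their (n-1)-character contexts."""
    text = corpus.lower()
    last = len(text) - n + 1
    ngrams = Counter(text[i:i + n] for i in range(last))
    contexts = Counter(text[i:i + n - 1] for i in range(last))
    # +1 reserves one slot for characters never seen in training.
    vocab = len(set(text)) + 1
    return {"n": n, "ngrams": ngrams, "contexts": contexts, "vocab": vocab}


def score(s, model):
    """Mean log-probability per n-gram, using add-one smoothing.

    Averaging (rather than summing) keeps long strings from being
    penalised just for being long. Less negative means more corpus-like.
    """
    n = model["n"]
    s = s.lower()
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    if not grams:
        return float("-inf")  # string too short to contain any n-gram
    total = 0.0
    for g in grams:
        seen = model["ngrams"][g]
        ctx = model["contexts"][g[:-1]]
        total += math.log((seen + 1) / (ctx + model["vocab"]))
    return total / len(grams)


if __name__ == "__main__":
    model = train(TOY_CORPUS, n=2)
    for s in ("baby lions at play", "Fukushima", "xy39mGWbosjY"):
        print("%7.2f  %s" % (score(s, model), s))
```

Passing n=3 to train() gives the trigraph variant. Whether a sum or a
per-n-gram average is used matters a lot when comparing strings of
different lengths, so that is worth checking in any scorer before
concluding the approach doesn't separate English from gibberish.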


-- 
Steven
