On Monday 21 December 2015 14:45, Ben Finney wrote:

> Steven D'Aprano <st...@pearwood.info> writes:
>
>> Let's call the second group "random" and the first "non-random",
>> without getting bogged down into arguments about whether they are
>> really random or not.
>
> I think we should discuss it, even at risk of getting bogged down. As
> you know better than I, “random” is not an observable property of the
> value, but of the process that produced it.
>
> So, I don't think “random” is at all helpful as a descriptor of the
> criteria you need for discriminating these values.
>
> Can you give a better definition of what criteria distinguish the
> values, based only on their observable properties?

No, not really. This *literally* is a case of "I'll know it when I see it",
which suggests that some sort of machine-learning solution (neural network?)
may be useful. I can train it on a bunch of strings which I can
hand-classify, and let the machine pick out the correlations, then apply it
to the rest of the strings.

The best I can say is that the "non-random" strings either are, or consist
of, mostly English words, names, or things which look like they might be
English words, containing no more than a few non-ASCII characters,
punctuation, or digits.

> You used “meaningless”; that seems at least more hopeful as a criterion
> we can use by examining text values. So, what counts as meaningless?

Strings made up of random-looking sequences of characters, like you often
see on sites like imgur or tumblr. Characters from non-Latin character sets
that I can't read (e.g. Japanese, Korean, Arabic, etc). Jumbled up words,
e.g. "python" is non-random, "nyohtp" would be random.

[...]

> Perhaps you could measure Shannon entropy (“expected information value”)
> <URL:https://en.wikipedia.org/wiki/Entropy_%28information_theory%29> as
> a proxy? Or maybe I don't quite understand the criteria.

That's a possibility. At least, it might be able to distinguish some
strings, although if I understand correctly, the two strings "python" and
"nhoypt" have identical entropy, so this alone won't be sufficient.

--
Steve
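
P.S. A minimal sketch of the entropy point above, assuming "entropy" means
per-character Shannon entropy computed from each string's own character
frequencies. The function name shannon_entropy is only illustrative and not
taken from any library.

```python
from collections import Counter
from math import log2


def shannon_entropy(s):
    """Per-character Shannon entropy of s, in bits.

    Assumption: probabilities are estimated from the character counts
    within s itself, so the order of the characters plays no part.
    """
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    # sum of p * log2(1/p) over the distinct characters
    return sum((c / n) * log2(n / c) for c in counts.values())


for word in ("python", "nhoypt", "aaaaaa"):
    print(word, shannon_entropy(word))
```

Both "python" and "nhoypt" have six distinct characters, each occurring
once, so they score the same (about 2.585 bits per character), while
"aaaaaa" scores zero. Since this measure only looks at character counts,
any anagram gets the same score, so something order-sensitive would be
needed on top of it to separate jumbled words from real ones.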