Steve Reid <[EMAIL PROTECTED]> writes:
> Below is some sample output. The amount of entropy per passphrase should
> be more than 89 bits, or almost the same as seven Diceware words.
> However, if you generate N passphrases and pick the one that is easiest
> to remember then you should subtract log2(N) bits from your entropy
> estimate (assume an adversary knows how to try passphrases in order of
> easiest-to-remember to hardest-to-remember).
>
> 1- the optative furore dankly bedevil the sixty-six creamware
> 2- the mouthless clepsydras sweatily abdicated the unfelt Commons
> 3- the talkative admirer cracking endure the declivous Andizhan
> 4- the unrested Atabrine corruptly graving the stateside flatness
> 5- the unvibrant kataplasia valorously reissuing the calcareous Portage
>
> This is not nearly as good as I had hoped. Does anyone have any
> suggestions for producing output that is more correct English? I'm
> wondering if maybe the lexicon I'm using isn't so good. Or maybe my
> knowledge of sentence structure hmm, with Yoda on par it is.
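A quick sketch of that log2(N) adjustment, assuming you generate
N = 16 candidate phrases and keep the most memorable one:

    import math

    def effective_bits(bits_per_phrase, n_candidates):
        # An adversary who tries phrases in memorability order gains
        # log2(N) bits when you keep the best of N random candidates.
        return bits_per_phrase - math.log2(n_candidates)

    print(effective_bits(89, 16))   # 89 - 4 = 85.0 bits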
The most obvious reason why your phrases are so opaque is that
you're probably giving equal weight to all the words you
pseudo-randomly select. In real English the words are not so evenly
distributed: a small subset is selected frequently and a large subset
is almost never used. To produce more readable text, you would want
to weight your selection logic toward the common words, which of
course means those words would contribute fewer bits to your entropy
and you'd need a longer passphrase. For extra entropy, you could vary
your sentence structure. One quick hack for identifying the "more
common" words would be to count syllables: the longer words are
generally less common, and certainly harder to remember and spell
correctly. Another rough rule is that the common words tend to come
from Germanic roots, while the uncommon ones come from Latin or
French.
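A minimal sketch of that trade-off, using a toy lexicon with made-up
frequencies (real weights would come from a corpus); the Shannon
entropy of the weighted choice, not log2 of the lexicon size, is what
each word slot actually contributes:

    import math, random

    # Toy lexicon: (word, relative frequency) pairs; the frequencies
    # here are invented for illustration.
    ADJECTIVES = [("big", 50), ("red", 30), ("old", 15), ("declivous", 1)]

    def shannon_bits(weighted):
        # Entropy of one weighted pick; always <= log2(len(weighted)),
        # so weighting costs bits and forces a longer passphrase.
        total = sum(f for _, f in weighted)
        return -sum((f / total) * math.log2(f / total)
                    for _, f in weighted)

    def pick(weighted):
        # Frequency-weighted random selection.
        return random.choices([w for w, _ in weighted],
                              weights=[f for _, f in weighted])[0]

    print(shannon_bits(ADJECTIVES))   # ~1.5 bits vs. 2.0 unweighted
    print(pick(ADJECTIVES))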
The second problem is that English words can be looked at on a
number of layers:
spelling
pronunciation
grammatical "parts of speech"
lower level semantic content
higher level "meaning"
To produce "well-formed" text, you have to get *all* of these
right. Badly botching the first few levels will give you more or less
garbage. Leaving out the last few levels will give you various
kinds of nonsense, but if you can get the rhyme & rhythm right,
you might please the shade of Charles Dodgeson. The first few
levels are pretty straight-forward, but there are some non-obvious
relationships between the last few levels. For instance, for
adjectives, there is a "preferred" order based in part on semantics,
which is why we say
the big expensive red house
instead of
the expensive red big house
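A toy illustration of enforcing that preference, with a hypothetical
class ranking chosen just to reproduce the example above (a real
table would cover the whole adjective lexicon):

    # Hypothetical semantic-class ranks for a few adjectives.
    ADJ_RANK = {"big": 0, "expensive": 1, "red": 2}

    def order_adjectives(adjs):
        # Sort by class rank; unknown adjectives go last.
        return sorted(adjs, key=lambda a: ADJ_RANK.get(a, len(ADJ_RANK)))

    words = ["the"] + order_adjectives(["expensive", "red", "big"]) + ["house"]
    print(" ".join(words))   # the big expensive red house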
There are also more obvious semantic constraints, violated in the following:
The sandy word darkly ran of the radio metaphors.
Presumably, with some work, you could classify the attributes of
words: keep track of mass nouns, concrete vs. abstract nouns, which
prepositions are acceptable with which verbs, and so forth.
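A sketch of what such attribute tagging might look like; the tags
and the tiny lexicon are invented for illustration:

    # Invented attribute tags for a toy lexicon.
    NOUNS = {"admirer": {"concrete", "animate"},
             "metaphor": {"abstract"}}
    VERBS = {"ran": {"subject": "animate", "preps": {"to", "past", "by"}}}

    def plausible(subject, verb, prep):
        # Accept only subject types and prepositions the verb allows.
        v = VERBS[verb]
        return v["subject"] in NOUNS[subject] and prep in v["preps"]

    print(plausible("admirer", "ran", "past"))   # True
    print(plausible("metaphor", "ran", "of"))    # False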
Higher-level meaning is probably the hardest to come by, since it
really requires human comprehension.
A runty god ran by my couch this morning, on my way to work.
He said "bad boy, no cookie", and I gave him a small rhino.
As the sun set, I made my way past the dying policeman
and sat down on my computer. After all the weights
fell on the ceiling, I gave the boss my pocketbook
and swam onto the plane. Several bubbles later, I
arrived home, drank two steaks, and fell asleep on a knife.
All the words here are used in a roughly sensible fashion, but the
story they tell is wildly unlikely. We all carry a huge database of
knowledge, learned as kids, that won't show up in any dictionary:
that people don't swim onto planes, that weights fall down rather
than up, that people don't ride couches to work, that the sun doesn't
set in the morning, and that small rhinos are no longer acceptable
religious sacrifices either before or after work.
I've heard it claimed that in ordinary English each word carries
about 4 bits of entropy. If so, then to get 90 bits of entropy you'd
need about 23 words (90 / 4 = 22.5, rounded up). That's probably more
than one line of text.
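The same arithmetic as a sketch, for any target (the 4 bits per word
is the claimed figure, not a measured one):

    import math

    def words_needed(target_bits, bits_per_word=4):
        # Round up: 90 / 4 = 22.5 -> 23 words.
        return math.ceil(target_bits / bits_per_word)

    print(words_needed(90))   # 23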
-Marcus Watts
UM ITD PD&D Umich Systems Group