I confess I've kind of lost the plot on the performance requirements at this point. Instead of measuring and evaluating potential solutions, can we try to approach this from the opposite direction and ask what the requirements are?
What's the maximum number of CPU cycles that we are allowed to burn such that we can still meet the 1-2% overhead target? And how many bits of uncertainty are we trying to present to the attacker? What's the minimum beyond which we shouldn't bother? (Perhaps because rdtsc will give us that many bits?) And does that change if we vary the reseed window, in terms of the number of system calls between reseedings? And what are the ideal parameters, past which we're just gilding the lily?

- Ted