On 2012-09-23 17:17, Pawel Jakub Dawidek wrote:
On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote:
On 2012-09-22 21:53, Pawel Jakub Dawidek wrote:
Mariusz, can you confirm my findings?
Pawel,
Your conclusions can be easily confirmed by shape analysis of the EDF
(empirical distribution function). Usually the maximum quantile
difference (called the D-statistic) gives you an overview, the shape of
the function gives you a strong intuition, and the p-value gives you
formal confirmation.
D-statistic values (your data):
6bit: 0.33%
7bit: 0.29%
8bit: 0.27%
9bit: 0.21%
10bit: 6.34%
11bit: 19.07%
12bit: 54.80%
What I would say: increasing the number of bits from 6 to 9 does not
affect distribution "uniformity", but reaching the tenth bit results in
a sudden increase in the difference measure: the more bits, the larger
the observed difference. Shape analysis of the distribution for the
10th bit shows a non-linear function. There is also no "randomness" in
the quantile difference curve: the chart shows a complete lack of noise
(a purely functional relation). These are very strong indicators that,
starting from the 10th bit, the distribution has changed and is no
longer uniform.
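The D-statistic described above can be sketched in a few lines of R. This
is only an illustration on simulated data, not Pawel's measurements; the
function name d_statistic and both samples are made up for the example,
and the "skewed" sample is just one arbitrary way to break uniformity.

```r
# Maximum distance between the empirical distribution function of a
# sample and the CDF of the discrete uniform distribution on
# 0 .. 2^bits - 1 (hypothetical helper, for illustration only).
d_statistic <- function(x, bits) {
  support <- 0:(2^bits - 1)
  empirical <- ecdf(x)(support)            # EDF evaluated at each value
  theoretical <- (support + 1) / 2^bits    # ideal uniform CDF
  max(abs(empirical - theoretical))
}

set.seed(1)
# Simulated 6-bit data: one uniform sample and one deliberately skewed
# sample (the minimum of two uniform draws favours small values).
uniform_sample <- sample(0:63, 10000, replace = TRUE)
skewed_sample <- pmin(sample(0:63, 10000, replace = TRUE),
                      sample(0:63, 10000, replace = TRUE))

d_uniform <- d_statistic(uniform_sample, 6)
d_skewed <- d_statistic(skewed_sample, 6)
print(c(uniform = d_uniform, skewed = d_skewed))
```

For the uniform sample the value should stay close to zero, while for the
skewed one it should be clearly larger, which is the same pattern as the
jump between 9 and 10 bits in the table above.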
To formally confirm the above conclusion at, e.g., the 5% significance
level (which means a 95% confidence level), I need some extra data
regarding sample sizes. Please send me the number of observations
collected in each of the 6-12 bit experiments.
Total number of observations was 162833.
OK, finally I have some formal results. To be completely honest, I need
to point out that we in fact have discrete data (for example the
integers 0, 1, ..., 63, not continuous numbers spread between 0 and
63). That is why I am going to use the two-sample Kolmogorov-Smirnov
test.
Methodology is simple:
- Pawel's data will be called the empirical data.
- Theoretical data will be generated as a sequence of unique integers
from 0 to 2**n - 1, where n is the number of bits. Assumption: each
number appears in the theoretical data exactly once, representing an
ideal uniform distribution.
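For anyone without the data files, the methodology can be tried on
simulated data first. This is a minimal sketch, not the real experiment:
the sample size, the seed and the "defect" (top bit stuck at 0) are
assumptions chosen only to show one well-behaved and one broken case.

```r
set.seed(42)
bits <- 8
n_obs <- 20000                       # assumed sample size, illustration only

# Theoretical data: every value from 0 to 2^bits - 1 exactly once.
theoretical <- 0:(2^bits - 1)

# Simulated empirical data: a uniform source and a hypothetical broken
# source whose top bit is always 0 (only the lower half of the range).
good <- sample(theoretical, n_obs, replace = TRUE)
bad <- sample(0:(2^(bits - 1) - 1), n_obs, replace = TRUE)

# ks.test warns about ties on discrete data; the warning is suppressed
# here because ties are expected by construction.
res_good <- suppressWarnings(ks.test(good, theoretical))
res_bad <- suppressWarnings(ks.test(bad, theoretical))

print(res_good)
print(res_bad)
```

The uniform source should give a small D and a large p-value, while the
broken source should give D close to 0.5 and a p-value near zero.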
The calculations on Pawel's data are done in R (from CRAN).
Loading the empirical data from files:
> e6 = read.table("E:\\pawel\\dhr2_6bit_sorted.txt")
> e7 = read.table("E:\\pawel\\dhr2_7bit_sorted.txt")
> e8 = read.table("E:\\pawel\\dhr2_8bit_sorted.txt")
> e9 = read.table("E:\\pawel\\dhr2_9bit_sorted.txt")
> e10 = read.table("E:\\pawel\\dhr2_10bit_sorted.txt")
> e11 = read.table("E:\\pawel\\dhr2_11bit_sorted.txt")
> e12 = read.table("E:\\pawel\\dhr2_12bit_sorted.txt")
Generating ideal theoretical data:
> t6 = c(0:(2**6-1))
> t7 = c(0:(2**7-1))
> t8 = c(0:(2**8-1))
> t9 = c(0:(2**9-1))
> t10 = c(0:(2**10-1))
> t11 = c(0:(2**11-1))
> t12 = c(0:(2**12-1))
Performing KS tests:
> ks.test(e6, t6)
D = 0.0032, p-value = 1
> ks.test(e7, t7)
D = 0.0029, p-value = 1
> ks.test(e8, t8)
D = 0.0027, p-value = 1
> ks.test(e9, t9)
D = 0.0022, p-value = 1
> ks.test(e10, t10)
D = 0.0634, p-value = 0.0005562
> ks.test(e11, t11)
D = 0.1907, p-value < 2.2e-16
> ks.test(e12, t12)
D = 0.5479, p-value < 2.2e-16
As you can see, the D-statistics are almost the same as those calculated
by Pawel (allowing for rounding). The p-values are very interesting
given the very high number of observations generated by Pawel. Between
6 and 9 bits the estimated p-values are equal to 1, which means that the
null hypothesis that the compared distributions are equal cannot be
rejected at any significance level. Final conclusion: the data show no
detectable deviation from uniformity, so it really does look random!
Additionally, starting from 10 bits we can observe a dramatic decrease
in the p-value (from 100% to ca. 0.06%, and much less for 11-12 bits).
Such a low p-value means that the null hypothesis that the compared
distributions are equal has to be rejected. Final conclusion: it
cannot be random, and for sure it is not random.
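The reported p-values can be roughly reproduced from D and the two sample
sizes alone. The sketch below assumes that ks.test used its asymptotic
two-sample approximation (the session above does not show which method
was applied) and that the theoretical sample has 2**n elements. The
function name ks_pvalue_asymptotic is made up for this illustration.

```r
# Asymptotic two-sided p-value of the two-sample Kolmogorov-Smirnov test,
# using the Kolmogorov series 2 * sum((-1)^(k-1) * exp(-2 k^2 lambda^2)).
ks_pvalue_asymptotic <- function(D, n, m, terms = 1000) {
  lambda <- D * sqrt(n * m / (n + m))   # D scaled by the effective sample size
  k <- 1:terms
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  min(1, max(0, p))                     # clamp truncation error into [0, 1]
}

n_emp <- 162833                         # observations per experiment (from the thread)
p6 <- ks_pvalue_asymptotic(0.0032, n_emp, 2^6)     # 6-bit D from the output above
p10 <- ks_pvalue_asymptotic(0.0634, n_emp, 2^10)   # 10-bit D from the output above
print(c(p6 = p6, p10 = p10))
```

Because D is rounded to four decimal places here, the 10-bit result is
only close to the reported value, not identical. Note also that the
effective sample size n*m/(n+m) is dominated by the smaller sample, i.e.
by the 2**n theoretical values.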
I did the same comparison for the earlier data from the real device
attach (2081 obs.). The R code and the results are below:
> e16 = read.table("E:\\pawel\\device_attach_16bit.log")
> t16 = c(0:(2**16-1))
> ks.test(e16, t16)
D = 0.0178, p-value = 0.5422
Again, the D-statistic and p-value are almost the same as previously
calculated "manually". The p-value is high (not as high as in the
6-9 bit tests, but consider the much lower number of observations: 2081
vs 162833), which strongly suggests that you have captured 16 bits of
real entropy!
Regards,
Mariusz