On 2012-09-23 17:17, Pawel Jakub Dawidek wrote:
On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote:
On 2012-09-22 21:53, Pawel Jakub Dawidek wrote:
Mariusz, can you confirm my findings?

Pawel,

Your conclusions can be easily confirmed by shape analysis of the EDF.
Usually the maximum quantile difference (called the D-statistic) gives
you an overview, the function shape gives you a strong feeling, and the
p-value gives you a formal confirmation.
D-statistic values (your data):

   6bit:   0.33%
   7bit:   0.29%
   8bit:   0.27%
   9bit:   0.21%
  10bit:   6.34%
  11bit:  19.07%
  12bit:  54.80%

What I would say: increasing the number of bits from 6 to 9 does not
affect distribution "uniformity"; reaching the tenth bit results in a
sudden increase in the difference measure, and the more bits, the larger
the observed difference. Distribution shape analysis for the 10th bit
shows a non-linear function. There is also no "randomness" in the
quantile difference curve: the chart shows a complete lack of noise (a
pure functional relation). These are very strong indicators that starting
from the 10th bit the distribution changes and is no longer uniform.

To formally confirm the above conclusion at, e.g., the 5% significance
level (that is, a 95% confidence level), I need some extra data
regarding sample sizes. Please send me the number of collected
observations in each of the 6-12 bit experiments.

Total number of observations was 162833.


OK, finally I have some formal results. To be completely honest, I need to point out that we in fact have discrete data (for example the integers 0, 1, ..., 63, not continuous numbers spread between 0 and 63). That is why I am going to use the two-sample Kolmogorov-Smirnov test. The methodology is simple:

- Pawel's data will be called the empirical data.
- The theoretical data will be generated as a sequence of unique integers from 0 to 2**n - 1, where n is the number of bits. Assumption: each number appears in the theoretical data exactly once, representing an ideal uniform distribution.
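
To make the D-statistic concrete, here is a minimal sketch of how it can be computed by hand for two samples. It does not use Pawel's files: the `empirical` vector is a hypothetical stand-in drawn with `sample()`, and the sample size of 10000 is an arbitrary assumption. It only illustrates that D is the largest gap between the two empirical distribution functions.

```r
# Hypothetical stand-in for one experiment: 6-bit values drawn uniformly.
# (Assumption for illustration only; this is not Pawel's data.)
set.seed(1)
n_bits <- 6
empirical <- sample(0:(2^n_bits - 1), size = 10000, replace = TRUE)
theoretical <- 0:(2^n_bits - 1)   # each value exactly once

# Empirical distribution functions (EDFs) of both samples.
F_emp <- ecdf(empirical)
F_theo <- ecdf(theoretical)

# Both EDFs are step functions that jump only at observed values,
# so the largest gap is attained at one of the pooled sample points.
grid <- sort(unique(c(empirical, theoretical)))
D_manual <- max(abs(F_emp(grid) - F_theo(grid)))

# Cross-check against ks.test(); warnings about ties are expected
# for discrete data and are suppressed here.
D_ks <- unname(suppressWarnings(ks.test(empirical, theoretical))$statistic)

print(c(manual = D_manual, ks.test = D_ks))
```

The two printed values should agree, which is a quick sanity check that ks.test() is measuring the same "maximum difference" discussed above.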

The calculations are done in R (from CRAN).

Loading the empirical data from files:

> e6 = read.table("E:\\pawel\\dhr2_6bit_sorted.txt")
> e7 = read.table("E:\\pawel\\dhr2_7bit_sorted.txt")
> e8 = read.table("E:\\pawel\\dhr2_8bit_sorted.txt")
> e9 = read.table("E:\\pawel\\dhr2_9bit_sorted.txt")
> e10 = read.table("E:\\pawel\\dhr2_10bit_sorted.txt")
> e11 = read.table("E:\\pawel\\dhr2_11bit_sorted.txt")
> e12 = read.table("E:\\pawel\\dhr2_12bit_sorted.txt")

Generating ideal theoretical data:

> t6 = c(0:(2**6-1))
> t7 = c(0:(2**7-1))
> t8 = c(0:(2**8-1))
> t9 = c(0:(2**9-1))
> t10 = c(0:(2**10-1))
> t11 = c(0:(2**11-1))
> t12 = c(0:(2**12-1))

Performing KS tests:

> ks.test(e6, t6)
D = 0.0032, p-value = 1

> ks.test(e7, t7)
D = 0.0029, p-value = 1

> ks.test(e8, t8)
D = 0.0027, p-value = 1

> ks.test(e9, t9)
D = 0.0022, p-value = 1

> ks.test(e10, t10)
D = 0.0634, p-value = 0.0005562

> ks.test(e11, t11)
D = 0.1907, p-value < 2.2e-16

> ks.test(e12, t12)
D = 0.5479, p-value < 2.2e-16

As you can see, the D-statistics are almost the same as those calculated by Pawel (allowing for rounding). The p-values are very interesting because of the very high number of observations collected by Pawel. Between 6 and 9 bits the estimated p-values are equal to 1, which means that the null hypothesis stating that the compared distributions are equal cannot be rejected at any significance level. Final conclusion: the data are fully consistent with a uniform distribution, so as far as this test can tell, it is random!

Additionally, starting from 10 bits we can observe a dramatic decrease of the p-value (from 100% to ca. 0.06%, and much less for 11-12 bits). Such a low p-value means that the null hypothesis stating that the compared distributions are equal has to be rejected at any commonly used significance level. Final conclusion: it cannot be random, and for sure it is not random.
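
The same accept/reject decision can be cross-checked without p-values by comparing each D with the usual large-sample critical value for the two-sample test, D_crit = c(alpha) * sqrt((n + m) / (n * m)) with c(alpha) = sqrt(-log(alpha / 2) / 2). The sketch below is only an approximation: the helper name `ks_critical` is made up for this example, I assume that every 6-12 bit experiment has 162833 observations, and the formula ignores ties (which make the test conservative for discrete data).

```r
# Asymptotic critical value of the two-sample KS statistic.
# ks_critical is a hypothetical helper, not part of base R.
ks_critical <- function(n, m, alpha = 0.05) {
  sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))
}

n_emp <- 162833          # assumed number of observations per experiment
bits <- 6:12
m_theo <- 2^bits         # size of each theoretical sample

# D values copied from the ks.test() output above.
D_obs <- c(0.0032, 0.0029, 0.0027, 0.0022, 0.0634, 0.1907, 0.5479)

D_crit <- ks_critical(n_emp, m_theo)
result <- data.frame(bits, D_obs,
                     D_crit = round(D_crit, 4),
                     reject = D_obs > D_crit)
print(result)
```

Under these assumptions the 5% threshold is crossed for the first time at 10 bits, in line with the p-values reported above.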

I did the same comparison for the previous real device attach data (2081 obs.). The R code and the results are below:

> e16 = read.table("E:\\pawel\\device_attach_16bit.log")
> t16 = c(0:(2**16-1))
> ks.test(e16, t16)
D = 0.0178, p-value = 0.5422

Again, the D-statistic and p-value are almost the same as previously calculated "manually". The p-value is high (not as high as in the 6-9 bit tests, but consider the much lower number of observations: 2081 vs 162833), which strongly suggests that you have captured real 16-bit entropy!
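
To show how much the number of observations matters here, below is a rough sketch of the large-sample p-value approximation based on the Kolmogorov distribution. The function name `ks_pvalue_asymptotic` is hypothetical, the series is truncated at 100 terms (an assumption that is adequate when lambda is not very close to 0), and D = 0.0178 is the rounded value from the output above, so the result will only be close to the p-value printed by ks.test(), not identical.

```r
# Asymptotic two-sample KS p-value:
#   p = 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * lambda^2),
#   lambda = D * sqrt(n * m / (n + m)).
# ks_pvalue_asymptotic is a hypothetical helper, not part of base R.
ks_pvalue_asymptotic <- function(D, n, m, terms = 100) {
  lambda <- D * sqrt(n * m / (n + m))
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  min(max(p, 0), 1)   # clamp numerical noise into [0, 1]
}

# Device attach data: 2081 observations against 2^16 theoretical values.
p16 <- ks_pvalue_asymptotic(0.0178, n = 2081, m = 2^16)

# Same D, but with a hypothetical sample as large as the 6-12 bit runs.
p_big <- ks_pvalue_asymptotic(0.0178, n = 162833, m = 2^16)

print(c(p_2081_obs = p16, p_162833_obs = p_big))
```

The same D that is unremarkable with 2081 observations would be overwhelming evidence against uniformity with 162833, which is why the p-values of the two sets of experiments should not be compared directly.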

Regards,
Mariusz

_______________________________________________
freebsd-security@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-security
To unsubscribe, send any mail to "freebsd-security-unsubscr...@freebsd.org"
