Re: [R] P-values Kolmogorov–Smirnov test

Rui Barradas Thu, 05 Sep 2019 12:34:32 -0700

Hello,

Inline.


Às 20:09 de 05/09/19, Boo G. escreveu:

Hello again.

I have tied this before but I see two problems:
1) According to the documentation I could read (including the ks.testcode), the ks statistic would be max(abs(x - y)) and if you plot thisfor very low sample sizes you can actually see that this make sense. Theresults of ks.test(x, y) yields very different values.

The problem is that the distribution of Dn is very difficult to compute.From the reference [1] in the help page ?ks.test:

Kolmogorov's goodness-of-fit measure, Dn , for a sample CDF hasconsistently been set aside for methods such as the D+n or D-n ofSmirnov, primarily, it seems, because of the difficulty of computing thedistribution of Dn . As far as we know, no easy way to compute thatdistribution has ever been provided in the 70+ years since Kolmogorov'sfundamental paper. We provide one here, a C procedure that providesPr(Dn < d) with 13-15 digit accuracy for n ranging from 2 to at least 16000.

That is why I used ks.test and its Dn and p-values. Note that n >= 2,size = 1 is not covered (p-value == 1).

Also, the p-values distribution seem to become closer to a uniform withincreasing sizes. Try


hist(d_stat[801:1000, 3])


[1]https://www.jstatsoft.org/article/view/v008i18


Hope this helps,

Rui Barradas

2) Also in this case the p-values don’t make much sense, according to myprevious interpretation.
Again, I could be wrong in my interpretation.
On 5 Sep 2019, at 20:46, Rui Barradas <ruipbarra...@sapo.pt<mailto:ruipbarra...@sapo.pt>> wrote:
Hello,

I'm sorry, but apparently I missed the point of your problem.
Please do not take my previous answer seriously.
But you can use ks.test, just in a different way than what I wrotepreviously.
Corrected code:


#simulation
for (i in 1:1000) {
 #sample from the reference distribution
 m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
 m_2 <-m_2[order(m_2$d_1),]
 d_2 <- m_2$d_1
 p_2 <- m_2$p_1

 #weighted ecdf for the reference distribution and the sample
 f_d_1 <- ewcdf(d_1, normalise=F)
 f_d_2 <- ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))

 #kolmogorov-smirnov distance
 x <- f_d_1(d_2)
 y <- f_d_2(d_2)
 ht <- ks.test(x, y)
 d_stat[i, 2] <- ht$statistic
 d_stat[i, 3] <- ht$p.value
}


Hope this helps,

Rui Barradas

Às 19:29 de 05/09/19, Rui Barradas escreveu:
Hello,
I don't have the algorithms at hand but the KS statistic calculationis more complicated than your max/abs difference.
Anyway, why not use ks.test? it's not that difficult:
set.seed(1234)
#reference distribution
d_1 <- sort(rpois(1000, 500))
p_1 <- d_1/sum(d_1)
m_1 <- data.frame(d_1, p_1)
#data frame to store the values of the simulation
d_stat <- data.frame(1:1000, NA, NA)
names(d_stat) <- c("sample_size", "ks_distance", "p_value")
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
d_2 <- m_2$d_1
ht <- ks.test(d_1, d_2)
#kolmogorov-smirnov distance
d_stat[i, 2] <- ht$statistic
d_stat[i, 3] <- ht$p.value
}
hist(d_stat[, 2])
hist(d_stat[, 3])
Note that d_2 is not sorted, but the results are equal in the senseof function identical(), meaning they are *exactly* the same. Whyshouldn't they?
Hope this helps,
Rui Barradas
Às 17:06 de 05/09/19, Boo G. escreveu:
Hello,
I am trying to perform a Kolmogorov–Smirnov test to assess thedifference between a distribution and samples drawn proportionallyto size of different sizes. I managed to compute theKolmogorov–Smirnov distance but I am lost with the p-value. I havelooked into the ks.test function unsuccessfully. Can anyone help mewith computing p-values for a two-tailed test?
Below a simplified version of my code.

Thanks in advance.
Gianluca


library(spatstat)

#reference distribution
d_1 <- sort(rpois(1000, 500))
p_1 <- d_1/sum(d_1)
m_1 <- data.frame(d_1, p_1)

#data frame to store the values of the siumation
d_stat <- data.frame(1:1000, NA, NA)
names(d_stat) <- c("sample_size", "ks_distance", "p_value")

#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <-m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
m_2 <-m_2[order(m_2$d_1),]
d_2 <- m_2$d_1
p_2 <- m_2$p_1

#weighted ecdf for the reference distribution and the sample
f_d_1 <- ewcdf(d_1, normalise=F)
f_d_2 <- ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))

#kolmogorov-smirnov distance
d_stat[i,2] <- max(abs(f_d_1(d_2) - f_d_2(d_2)))
}


    [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org <mailto:R-help@r-project.org> mailing list --To UNSUBSCRIBE and more, see
https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&amp;sdata=ZntDJlPtp%2Bu%2FeO7xNZLbUQgLwpvS1M%2FUVNwovp%2FZPmA%3D&amp;reserved=0
PLEASE do read the postingguidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org <mailto:R-help@r-project.org>mailing list -- ToUNSUBSCRIBE and more, see
https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&amp;data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&amp;sdata=ZntDJlPtp%2Bu%2FeO7xNZLbUQgLwpvS1M%2FUVNwovp%2FZPmA%3D&amp;reserved=0
PLEASE do read the postingguidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] P-values Kolmogorov–Smirnov test

Reply via email to