I don't think cov.wt treats its weights as frequency weights, although as far 
as I can tell its help page does not say so explicitly.

Here is some information about the difference:
http://stats.stackexchange.com/questions/61225/correct-equation-for-weighted-unbiased-sample-covariance
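
As a rough sketch of that difference, here is a one-variable example with made-up numbers (x, f and the v_* names are only for illustration). It assumes cov.wt rescales the weights to sum to 1 and, with its default method = "unbiased", divides by 1 - sum(w^2), whereas frequency weights divide by n - 1:

```r
# Toy data (made up): the value 1 occurs 3 times, 2 once, 4 twice.
x <- c(1, 2, 4)
f <- c(3, 1, 2)

n  <- sum(f)               # number of underlying observations
w  <- f / n                # weights rescaled to sum to 1
m  <- sum(w * x)           # weighted mean, the same under both readings
ss <- sum(w * (x - m)^2)   # weighted mean squared deviation

v_freq <- ss * n / (n - 1)       # frequency weights: divisor n - 1
v_rel  <- ss / (1 - sum(w^2))    # divisor assumed for cov.wt(method = "unbiased")

# Compare against the expanded data and against cov.wt itself
all.equal(v_freq, var(rep(x, f)))
all.equal(v_rel, cov.wt(cbind(x), wt = f)$cov[1, 1])
```

The two divisors coincide only when all weights are equal, which would explain why cov.wt(mydata) agrees with cov(mydata) but the weighted call does not.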

The frequency version isn't hard to program (below), and is probably somewhere 
in R already.

Chris

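mywtcov <- function(x, frqwt=rep(1,nrow(x)), unbiased=TRUE) {
        if (is.data.frame(x)) x <- as.matrix(x)
        # total number of observations represented by the frequency weights
        n <- sum(frqwt)
        # frequency-weighted column means
        center <- colSums(frqwt * x) / n
        # center each column, then scale each row by sqrt(weight)
        xcw <- sqrt(frqwt) * sweep(x, 2, center, check.margin = TRUE)
        # weighted sums of squares and cross-products
        cov <- crossprod(xcw)
        # divide by n - 1 (unbiased) or by n
        cov <- if (unbiased) {
                cov/(n - 1)
        } else {
                cov/n
        }
        list(cov=cov, center=center, unbiased=unbiased)
}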

# mydata, mytable and xcov are the objects from the original message below
mywtcov(mydata)

mywtcov(mytable[,1:2], mytable[,3])

all.equal(mywtcov(mytable[,1:2], mytable[,3])$cov, xcov)
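
Another possible route, sketched here under the assumption that cov.wt with method = "ML" divides the weighted cross-products by the sum of the weights, is to rescale that result by n/(n - 1). The frequency table is rebuilt with base R's aggregate() rather than plyr::count() so the snippet stands alone; the names ml and freq_cov are only for illustration:

```r
mydata <- iris[, 1:2]

# Frequency table of the distinct rows (base R stand-in for plyr::count)
mytable <- aggregate(freq ~ ., data = cbind(mydata, freq = 1), FUN = sum)

n  <- sum(mytable$freq)    # number of original observations
ml <- cov.wt(mytable[, 1:2], wt = mytable$freq, method = "ML")$cov

# Rescale the ML (divisor n) estimate to the usual divisor n - 1
freq_cov <- ml * n / (n - 1)

all.equal(freq_cov, cov(mydata))
```

If that assumption holds, the last line should agree with cov(mydata) in the same way the mywtcov() result does.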



-----Original Message-----
From: Emilio Torres Manzanera [mailto:tor...@uniovi.es] 
Sent: Sunday, March 30, 2014 6:31 PM
To: r-help@r-project.org
Subject: [R] cov.wt gives different results from other (co)variance functions 
(cov, wtd.var)

Dear Sir,
I am not sure about the precision of the cov.wt function. It seems that it 
provides different results when using frequency weights. This discrepancy only 
occurs with the covariance matrix, not with the correlation matrix.
Do you know how to solve this issue? Thank you.
Best regards,
Emilio

rm(list=ls())
library(plyr)
library(Hmisc)

mydata <- iris[,1:2]
xcor <- cor(mydata)
xcov <- cov(mydata)
all.equal(cov.wt(mydata)$cov,xcov) # OK

## Now, we use frequency weights
mytable <- count(mydata) # Compute frequency table
all.equal(wtd.var(mytable[,1],weights=mytable$freq),  xcov[1,1]) # OK (Hmisc::wtd.var and cov)

# But with cov.wt
result <- cov.wt(mytable[,1:2],wt=mytable$freq,cor=TRUE)
all.equal(result$cov, xcov) # Wrong!
# "Mean relative difference: 0.003579418"
all.equal(wtd.var(mytable[,1],weights=mytable$freq),  result$cov[1,1]) # Wrong!
# "Mean relative difference: 0.003592277"
all.equal(result$cov[1,1], xcov[1,1]) # Wrong!
# "Mean relative difference: 0.003579418"

# The correlations are equal
all.equal(result$cor, xcor) # OK


sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Hmisc_3.14-0    Formula_1.1-1   survival_2.37-7 lattice_0.20-27
[5] plyr_1.8       

loaded via a namespace (and not attached):
[1] cluster_1.15.1      latticeExtra_0.6-26 RColorBrewer_1.0-5 

-- 
=================================================
Emilio Torres Manzanera
Fac. de Comercio - Universidad de Oviedo
c/ Luis Moya 261, E-33203 Gijón (Spain)
Tel. 985 182 197 email: tor...@uniovi.es


