Please remember to use 'reply all' so that the r-help list is included.

First question: How can I use Pearson correlation with dichotomous data? I want a correlation between dichotomous variables, analogous to the Spearman correlation for ordered categorical variables.
cor(variable1, variable2, method = "pearson")

For two dichotomous (0/1) variables, this is equal to the phi coefficient.

Second question: You would like two separate populations (1000 samples, 10 variables each), with variables highly correlated *within* each dataset and minimal correlation *between* datasets.

As I stated in a previous response, the code you have is sufficient. You can go through as many variables as you like *for each dataset* and induce correlations; do this for every variable you require to be correlated. Because the code induces these correlations randomly, there should be *minimal* correlation between datasets, though still some if the datasets have the same structure (the same variables correlated within each). If different variables are correlated within each, then the correlation between datasets would likely be lower. It is extremely unrealistic to expect absolutely no correlation between datasets, so you must decide at which point you consider it sufficiently low.

One final point: in the code section "# subset variable to have a stronger correlation", you can only do one dataset at a time, or you must change the name of the second object; otherwise you are just overwriting the previous 'v1'.

You have described what you want and you have the code to do it. The major hurdle here would be implementing some 'for' loops, which is not terribly complex if you are working on your programming. However, they are not necessary if you just want to write several lines with new object names for each variable in each dataset.

Give it a try; you know how to induce correlations now. Just choose which variables to correlate, do it for all of those in each dataset, and compare.

Regards,
Dr. Charles Determan

On Thu, Jul 31, 2014 at 9:10 AM, thanoon younis <thanoon.youni...@gmail.com> wrote:

> Many thanks to you.
>
> Firstly: how can I use Pearson correlation with dichotomous data? I want
> to use a correlation between dichotomous variables, like the Spearman
> correlation for ordered categorical variables.
>
> Secondly: I have two different populations, and each population has 1000
> samples and 10 variables. I want a high correlation coefficient between
> variables in the first population, a high correlation coefficient between
> variables in the second population, and no correlation between the two
> populations, because I want to use multiple-group structural equation
> models.
>
> Many thanks again
>
> Thanoon
>
> On 31 July 2014 16:45, Charles Determan Jr <deter...@umn.edu> wrote:
>
>> Thanoon,
>>
>> You should still send the question to the R help list even when I helped
>> you with the code you are currently using. I will not always know the best
>> way or even how to proceed with some questions. As for your question
>> about the code below:
>>
>> Firstly, there is no 'phi' method for cor in base R. If you are using
>> it, you must have neglected to include a package you are using. However,
>> given that the phi coefficient is equal to the Pearson coefficient for
>> dichotomous data, you can use the 'pearson' method.
>>
>> Secondly, with respect to your primary concern: in this case, we have
>> randomly chosen variables to correlate between two INDEPENDENT DATASETS
>> (i.e. different groups of samples). The idea with this code is that R1 and
>> R2 are datasets of 1000 samples and 10 variables. It would be miraculous
>> if they correlated when each had variables randomly assigned as
>> correlated. The code works correctly; the question now becomes whether you
>> want to see correlations across variables for all samples (which this does
>> for each DATASET) or whether you want the two DATASETS to be correlated.
>>
>> ords <- seq(0,1)
>> p <- 10
>> N <- 1000
>> percent_change <- 0.9
>>
>> R1 <- as.data.frame(replicate(p, sample(ords, N, replace = T)))
>> R2 <- as.data.frame(replicate(p, sample(ords, N, replace = T)))
>>
>> # phi is more appropriate for dichotomous data
>> cor(R1, method = "phi")
>> cor(R2, method = "phi")
>>
>> # subset variable to have a stronger correlation
>> v1 <- R1[,1, drop = FALSE]
>> v1 <- R2[,1, drop = FALSE]
>>
>> # randomly choose which rows to retain
>> keep <- sample(as.numeric(rownames(v1)), size = percent_change*nrow(v1))
>> change <- as.numeric(rownames(v1)[-keep])
>>
>> # randomly choose new values for changing
>> new.change <- sample(ords, ((1-percent_change)*N)+1, replace = T)
>>
>> # replace values in copy of original column
>> v1.samp <- v1
>> v1.samp[change,] <- new.change
>>
>> # closer correlation
>> cor(v1, v1.samp, method = "phi")
>>
>> # set correlated column as one of your other columns
>> R1[,2] <- v1.samp
>> R2[,2] <- v1.samp
>> R1
>> R2
>>
>>
>> On Thu, Jul 31, 2014 at 7:29 AM, thanoon younis <
>> thanoon.youni...@gmail.com> wrote:
>>
>>> Dear Dr. Charles,
>>> I have a problem with the following R program, which simulates data for
>>> 2 different samples with high correlation between the variables in each
>>> sample. When I ran the program I got results, but without
>>> correlation between the samples.
>>> I appreciate your help and your time.
>>> I did not send this code to R-help because you helped me before to
>>> write it.
>>>
>>> Many thanks to you
>>>
>>> Thanoon
>>>
>>
>> --
>> Dr. Charles Determan, PhD
>> Integrated Biosciences
>

--
Dr. Charles Determan, PhD
Integrated Biosciences

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
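
A rough sketch of the advice at the top of this thread, for anyone who wants a starting point: use method = "pearson" in place of "phi", and induce correlations variable by variable in each dataset with a 'for' loop so that no object is overwritten. The helper names induce_copy and make_dataset, the seed, and the choice of columns 2 to 5 as the linked variables are assumptions made for illustration; they do not come from the code posted above, and this is only one possible way to do it.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

ords <- 0:1
p <- 10
N <- 1000
percent_keep <- 0.9  # share of rows copied unchanged from the source column

# Hypothetical helper: return a copy of x in which a random
# (1 - keep) share of the entries is redrawn from `values`.
induce_copy <- function(x, keep, values) {
  n_change <- length(x) - round(keep * length(x))
  change <- sample(seq_along(x), n_change)
  x[change] <- sample(values, n_change, replace = TRUE)
  x
}

# Hypothetical helper: build one dataset of random 0/1 columns, then make
# every column listed in `linked` a noisy copy of column 1.
make_dataset <- function(linked) {
  d <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
  for (j in linked) {
    d[[j]] <- induce_copy(d[[1]], percent_keep, ords)
  }
  d
}

# Two independently generated datasets with the same internal structure
R1 <- make_dataset(linked = 2:5)
R2 <- make_dataset(linked = 2:5)

# Pearson on 0/1 data is the phi coefficient; check against the 2x2 table
tab <- table(R1[[1]], R1[[2]])
phi <- (as.numeric(tab[1, 1]) * tab[2, 2] - as.numeric(tab[1, 2]) * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
all.equal(phi, cor(R1[[1]], R1[[2]], method = "pearson"))

# Correlations within one dataset, and between the two datasets
within_R1 <- cor(R1, method = "pearson")
between <- cor(R1, R2, method = "pearson")

round(within_R1[1:5, 1:5], 2)  # linked columns should be strongly correlated
max(abs(between))              # should be small, but not exactly zero
```

Here the between-dataset correlations are small only because R1 and R2 are drawn independently; as noted in the reply, they will not be exactly zero, so a threshold for "sufficiently low" still has to be chosen.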