Re: [R] simulation dichotomous data

Charles Determan Jr Fri, 01 Aug 2014 06:53:30 -0700

Please remember to use 'reply all' so that the r-help list stays copied.

First Question: How can I use Pearson correlation with dichotomous data? I
want to use a correlation between dichotomous variables, like Spearman
correlation for ordered categorical variables.

cor(variable1, variable2, method = "pearson")
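
Pearson's r computed on two 0/1 variables is the phi coefficient, so no
special method is needed. Here is a minimal sketch to check this, assuming
made-up variables x and y and an arbitrary seed (neither comes from your
data); phi is computed by hand from the 2x2 table and compared with cor():

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# two hypothetical dichotomous variables; y agrees with x about 80% of the time
x <- sample(0:1, 1000, replace = TRUE)
y <- ifelse(runif(1000) < 0.8, x, 1 - x)

# phi coefficient from the 2x2 contingency table
tab <- table(x, y)
n11 <- as.numeric(tab["1", "1"])
n00 <- as.numeric(tab["0", "0"])
n10 <- as.numeric(tab["1", "0"])
n01 <- as.numeric(tab["0", "1"])
phi <- (n11 * n00 - n10 * n01) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Pearson correlation on the same 0/1 vectors
pearson <- cor(x, y, method = "pearson")

# TRUE when the two agree up to floating-point tolerance
all.equal(phi, pearson)
```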

Second Question: You would like two separate populations (1000 samples, 10
variables each), with variables highly correlated *within* each dataset and
minimal correlation *between* datasets.

As I have stated in a previous response, the code you have is sufficient.
You can go through as many variables as you like *for each dataset* and
induce correlations.  You should do this for as many variables as you
require to be correlated.  As the code induces these correlations randomly,
there should be *minimal* correlation between datasets but still some if
the datasets have the same structure (same variables correlated within).
If different variables are correlated within each, then the correlation
between datasets would likely be lower.  It is extremely unrealistic to
believe that there will be absolutely no correlation between datasets so
you must decide at which point you consider it sufficiently low.
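
One way to decide what counts as "sufficiently low" is to look at the whole
matrix of between-dataset correlations. The following is only a sketch: it
uses two freshly simulated, independent datasets, and the 0.2 cutoff is a
hypothetical value, not a recommendation.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
N <- 1000
p <- 10

# two independent datasets of 0/1 variables
R1 <- as.data.frame(replicate(p, sample(0:1, N, replace = TRUE)))
R2 <- as.data.frame(replicate(p, sample(0:1, N, replace = TRUE)))

# p x p matrix: each column of R1 against each column of R2
between <- cor(R1, R2, method = "pearson")

# largest absolute between-dataset correlation
max(abs(between))

# hypothetical cutoff; pick whatever your analysis can tolerate
threshold <- 0.2
all(abs(between) < threshold)
```

Even for fully independent columns the sample correlations are not exactly
zero, which is why a cutoff is needed at all.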

One final point: in the code section "# subset variable to have a stronger
correlation", you can only do one dataset at a time, or you must change the
name of the second object. Otherwise you are just overwriting the previous 'v1'.

You have described what you want to me and you have the code to do it.  The
major hurdle here would be an implementation of some 'for loops', which is
not terribly complex if you are working on your programming.  However, they
are not necessary if you just want to write several lines with new object
names for each variable in each dataset.  Give it a try; you know how to
induce correlations now.  Just choose which variables to correlate, do it
for all of those in each dataset, and compare.
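
If you do want to try the loop version, here is a rough sketch of how it
could look. The helper induce_copy() and the choice of columns 2 to 5 are
hypothetical, made up for this illustration; the helper just wraps the
"copy a column and re-draw a random 10% of its rows" idea from the code
quoted below, and each dataset gets its own result so nothing is overwritten.

```r
set.seed(123)  # arbitrary seed, for reproducibility only
ords <- 0:1
p <- 10
N <- 1000
percent_change <- 0.9  # fraction of rows left identical to the source column

R1 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
R2 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))

# hypothetical helper: copy a column, then re-draw a random subset of rows
induce_copy <- function(v, keep_frac, values) {
  n_change <- round((1 - keep_frac) * length(v))
  change <- sample(seq_along(v), n_change)           # rows to re-draw
  v[change] <- sample(values, n_change, replace = TRUE)
  v
}

# hypothetical choice: make columns 2 to 5 correlate with column 1
targets <- 2:5
for (j in targets) {
  R1[[j]] <- induce_copy(R1[[1]], percent_change, ords)  # based on R1 only
  R2[[j]] <- induce_copy(R2[[1]], percent_change, ords)  # based on R2 only
}

# within-dataset correlations among the first five variables
round(cor(R1)[1:5, 1:5], 2)
round(cor(R2)[1:5, 1:5], 2)

# largest absolute correlation between the two datasets
max(abs(cor(R1, R2)))
```

Because each dataset is modified only from its own first column, the
within-dataset correlations are high while the between-dataset ones stay
near zero.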

Regards,
Dr. Charles Determan

On Thu, Jul 31, 2014 at 9:10 AM, thanoon younis <thanoon.youni...@gmail.com>
wrote:

> Many thanks to you
>
> Firstly: how can I use Pearson correlation with dichotomous data? I want
> to use a correlation between dichotomous variables, like Spearman
> correlation for ordered categorical variables.
>
> Secondly: I have two different populations, and each population has 1000
> samples and 10 variables. I want a high correlation coefficient between
> variables in the first population, a high correlation coefficient
> between variables in the second population, and no correlation
> between the two populations, because I want to use multiple-group structural
> equation models.
>
>
> many thanks again
>
> Thanoon
>
>
>
>
> On 31 July 2014 16:45, Charles Determan Jr <deter...@umn.edu> wrote:
>
>> Thanoon,
>>
>> You should still send the question to the R help list even though I helped
>> you with the code you are currently using.  I will not always know the best
>> way or even how to proceed with some questions.  As for your question
>> about the code below:
>>
>> Firstly, there is no 'phi' method for cor in base R.  If you are using
>> it, you must have neglected to include a package you are using.  However,
>> given that the phi coefficient is equal to the pearson coefficient for
>> dichotomous data, you can use the 'pearson' method.
>>
>> Secondly, with respect to your primary concern.  In this case, we have
>> randomly chosen variables to correlate between two INDEPENDENT DATASETS
>> (i.e. different groups of samples).  The idea with this code is that R1 and
>> R2 are datasets of 1000 samples and 10 variables.  It would be miraculous
>> if they correlated when each had variables randomly assigned as
>> correlated.  The code works correctly; the question now is whether you want
>> to see correlations across variables for all samples (which this does for
>> each DATASET) or whether you want the two DATASETS to be correlated.
>>
>> ords <- seq(0,1)
>> p <- 10
>> N <- 1000
>> percent_change <- 0.9
>>
>> R1 <- as.data.frame(replicate(p, sample(ords, N, replace = T)))
>> R2 <- as.data.frame(replicate(p, sample(ords, N, replace = T)))
>>
>> # phi is more appropriate for dichotomous data
>> cor(R1, method = "phi")
>> cor(R2, method = "phi")
>>
>> # subset variable to have a stronger correlation
>> v1 <- R1[,1, drop = FALSE]
>> v1 <- R2[,1, drop = FALSE]
>>
>> # randomly choose which rows to retain
>> keep <- sample(as.numeric(rownames(v1)), size = percent_change*nrow(v1))
>> change <- as.numeric(rownames(v1)[-keep])
>>
>> # randomly choose new values for changing
>> new.change <- sample(ords, ((1-percent_change)*N)+1, replace = T)
>>
>> # replace values in copy of original column
>> v1.samp <- v1
>> v1.samp[change,] <- new.change
>>
>> # closer correlation
>> cor(v1, v1.samp, method = "phi")
>>
>> # set correlated column as one of your other columns
>> R1[,2] <- v1.samp
>> R2[,2] <- v1.samp
>> R1
>> R2
>>
>>
>> On Thu, Jul 31, 2014 at 7:29 AM, thanoon younis <
>> thanoon.youni...@gmail.com> wrote:
>>
>>> Dear Dr. Charles,
>>> I have a problem with the following R program, which simulates data for
>>> 2 different samples with high correlation between variables in each
>>> sample. When I ran the program I got results, but without
>>> correlation between each sample.
>>> I appreciate your help and your time.
>>> I did not send this code to R-help because you helped me write it
>>> before.
>>>
>>> many thanks to you
>>>
>>> Thanoon
>>>
>>
>>
>>
>> --
>> Dr. Charles Determan, PhD
>> Integrated Biosciences
>>
>
>

-- 
Dr. Charles Determan, PhD
Integrated Biosciences

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
