Re: [R] Generating an autocorrelated binary variable

Simon Zehnder Tue, 02 Oct 2012 06:18:50 -0700

Thank you very much for your answer Rolf. It helped. 

I try to simulate a trade indicator model from market microstructure, where the 
1 or -1 indicate a buyer or seller initiated trade respectively. 
I use a Gaussian copula for simulation, so I can put in some correlation if I 
want to. So I generate my multivariate Gaussian sample and transform it to a 
uniform sample so I can use whatever marginal distribution I want to. 
Then I generate the autocorrelated binary by beginning with a transformation on 
the first observation and then by using the uniform sample to compare it to 
0.6. Afterwards I have a 0.6 autocorrelated binary variable:


"sampleCop" <- function(n = 1000, rho = 0.6) {
        
        require(splus2R)
        mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3))
        pmvrs <- pnorm(mvrs, 0, 1)
        var1 <- matrix(0, nrow = n + 1, ncol = 1)
        var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5)
        if(var1[1] == 0) var1[1] <- -1
        for(i in  1:(nrow(pmvrs) - 1)) {
                if(pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i]
                else var1[i + 1] <- var1[i] * (-1)
        }
        sample <- matrix(0, nrow = n, ncol = 4)
        sample[, 2] <- var1[1:nrow(var1) - 1]
        sample[, 3] <- var1[2:nrow(var1)]
        sample[, 4] <- qnorm(pmvrs[1:nrow(var1) - 1, 2], 0, 1, 1, 0) + 
qnorm(pmvrs[1:nrow(var1) - 1, 3], 0, 1, 1, 0) - qnorm(pmvrs[2:nrow(var1), 3], 
0, 1, 1, 0)
        sample[, 1] <- sample[,2] * (0.015 + 0.03) - sample[,3] * (0.015 + 
0.2*0.03) + sample[,4]
        
        sample
        
}

It is interesting though, that if I run my GMM-model on it it needs very high 
numbers of observations to converge to the real parameters. Using the 
gmm-function

"gmmfun" <- function(theta, data) {
        
        res = data[,1] - (theta[1] + theta[2]) * data[,2] + (theta[1] + 
theta[3] * theta[2]) * data[,3]
    
        moments = matrix(0, nrow=nrow(data), ncol=3)
        moments[,1] = data[,2] * data[,3] - data[,2]^2 * theta[3]
        moments[,2] = res * data[,2]
        moments[,3] = res * data[,3]

        return(moments)
}

And the objective function 

"obj_fun" <- function(theta, data) {
        moments <- gmmfun(theta, data)
        moments_sum <- colSums(moments)
        val <- moments_sum %*% diag(3) %*% moments_sum * 1.0 / nrow(data)
        val
}

Running then the command 

optim(fn = obj_fun, par = c(0.05, 0.05, .1), method = "BFGS")

gives a result with the first parameter fairly high away from its true value 
(true values for the three parameters were (0.015, 0.03, 0.2)). I wonder if 
this is due to the autocorrelation in the binary variable (btw. I also tried 
here lm, as the third parameter can be estimated via the correlation, but it 
remains highly dependent on the number of observations).

Any suggestions here?

Best regards

Simon 

On Sep 28, 2012, at 5:02 AM, Rolf Turner <rolf.tur...@xtra.co.nz> wrote:

> 
> I have no idea what your code is doing, nor why you want correlated binary
> variables.  Correlation makes little or no sense in the context of binary 
> random
> variables --- or more generally in the context of discrete random variables.
> 
> Be that as it may, it is an easy calculation to show that if X and Y are 
> binary
> random variables both with success probability of 0.5 then cor(X,Y) = 0.2 if
> and only if Pr(X=1 | Y = 1) = 0.6.  So just generate X and Y using that fact:
> 
> set.seed(42)
> X <- numeric(1000)
> Y <- numeric(1000)
> for(i in 1:1000) {
>   Y[i] <- rbinom(1,1,0.5)
>   X[i] <- if(Y[i]==1) rbinom(1,1,0.6) else rbinom(1,1,0.4)
> }
> 
> # Check:
> cor(X,Y) # Get 0.2012336
> 
> Looks about right.  Note that the sample proportions are 0.484 and
> 0.485 for X and Y respectively.  These values do not differ significantly
> from 0.5.
> 
>    cheers,
> 
>            Rolf Turner
> 
> On 28/09/12 08:26, Simon Zehnder wrote:
>> Hi R-fellows,
>> 
>> I am trying to simulate a multivariate correlated sample via the Gaussian 
>> copula method. One variable is a binary variable, that should be 
>> autocorrelated. The autocorrelation should be rho = 0.2. Furthermore, the 
>> overall probability to get either outcome of the binary variable should be 
>> 0.5.
>> Below you can see the R code (I use for simplicity a diagonal matrix in 
>> rmvnorm even if it produces no correlated sample):
>> 
>> "sampleCop" <- function(n = 1000, rho = 0.2) {
>>      
>>      require(splus2R)
>>      mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3))
>>      pmvrs <- pnorm(mvrs, 0, 1)
>>      var1 <- matrix(0, nrow = n + 1, ncol = 1)
>>      var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5)
>>      if(var1[1] == 0) var1[nrow(mvrs)] <- -1
>>      for(i in  1:(nrow(pmvrs) - 1)) {
>>              if(pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i]
>>              else var1[i + 1] <- var1[i] * (-1)
>>      }
>>      sample <- matrix(0, nrow = n, ncol = 4)
>>      sample[, 1] <- var1[1:nrow(var1) - 1]
>>      sample[, 2] <- var1[2:nrow(var1)]
>>      sample[, 3] <- qnorm(pmvrs[1:nrow(var1) - 1, 2], 0, 1, 1, 0)
>>      sample[, 4] <- qnorm(pmvrs[1:nrow(var1) - 1, 3], 0, 1, 1, 0)
>>      
>>      sample
>>      
>> }
>> 
>> Now, the code is fine, everything compiles. But when I compute the 
>> autocorrelation of the binary variable, it is not 0.2, but 0.6. Does anyone 
>> know why this happens?


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Generating an autocorrelated binary variable

Reply via email to