Hi, I am using the sim1000G R package to simulate data for case/control study. I can not figure out how to manipulate this code to be able to generate 10% or 50% causal SNPs in R.
This is whole code provided as example on GitHub: library(sim1000G) vcf_file = "region-chr4-357-ANK2.vcf.gz" #nvariants = 442, ss=1000 vcf = readVCF( vcf_file, maxNumberOfVariants = 442 ,min_maf = 0.0005,max_maf = 0.01) #lowest MAF dim( vcf$gt1 ) #rows represent number of variants, columns represent number of individuals ## Download and use full chromosome genetic map downloadGeneticMap(4) readGeneticMap(4) sample.size=3000 startSimulation(vcf, totalNumberOfIndividuals = sample.size) data_sim = function(seed.num){ SIM$reset() id = generateUnrelatedIndividuals(sample.size) gt = retrieveGenotypes(id) freq = apply(gt,2,sum)/(2*nrow(gt)) causal = sample(setdiff(1:ncol(gt),which(freq==0)),45) beta.sign = rep(1,45) c.value = 0.402 beta.abs = c.value*abs(log10(freq[causal])) beta.val = beta.sign*beta.abs x.bar = apply(gt[,causal],2,mean) x.bar = as.matrix(x.bar) beta.val = t(as.matrix(beta.val)) #disease prvalance = 1% #beta0 = -log(99)-beta.val %*% x.bar #disease prvalance = 1.5% beta0 = 0-beta.val %*% x.bar eta = beta.val %*% t(gt[,causal]) eta = as.vector(eta) + rep(beta0,nrow(gt)) prob = exp(eta)/(1+exp(eta)) genocase = rep(NA, sample.size) set.seed(seed.num) for(i in 1:sample.size){ genocase[i] = rbinom(1, 1, prob[i]) } case.idx = sample(which(genocase==1),1000) control.idx = sample(which(genocase==0),1000) return(rbind(gt[case.idx,],gt[control.idx,])) } How I can modify code in a way that it will simulate: 50 % of causal SNPs** ( exmp. 24 causal variants and 24 non causal SNPs) 10 % of causal SNP (exmpl. 5 causal and 43 non causal SNPs) Thanks a lot for any suggestion. [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.