Re: [R] Convergence in Monte Carlo Simulation
Hello, I put together the following code and am curious about its correctness. My first question relates to the Monte Carlo simulations – the goal is to continue to iterate until I get n = 1000 simulations where the model successfully converges. I am wondering if I coded it correctly below with the while loop. Is the idea that the counter increments by one only if “model” does not return a string? I would also like to know how I can create n = 1000 independent data sets. I think to do this, I would have to set a random number seed via set.seed() before the creation of each dataset. Where would I enter set.seed in the syntax below? Would it be in the function (as indicated in red)? powercrosssw <- function(nclus, clsize) { set.seed() cohortsw <- genData(nclus, id = "cluster") cohortsw <- addColumns(clusterDef, cohortsw) cohortswTm <- addPeriods(cohortsw, nPeriods = 8, idvars = "cluster", perName = "period") cohortstep <- trtStepWedge(cohortswTm, "cluster", nWaves = 4, lenWaves = 1, startPer = 1, grpName = "Ijt") pat <- genCluster(cohortswTm, cLevelVar = "timeID", numIndsVar = clsize, level1ID = "id") dx <- merge(pat[, .(cluster, period, id)], cohortstep, by = c("cluster", "period")) dx <- addColumns(patError, dx) setkey(dx, id, cluster, period) dx <- addColumns(outDef, dx) return(dx) } i=1 while (i < 1000) { dx <- powercrosssw() #Fit multi-level model to simulated dataset model5 <- tryCatch(lme(y ~ factor(period) + factor(Ijt), data = dx, random = ~1|cluster, method = "REML"), warning = function(w) { "warning" } ) if (! is.character(model5)) { coeff <- coef(summary(model5))["factor(Ijt)1", "Value"] pvalue <- coef(summary(model5))["factor(Ijt)1", "p-value"] error <- coef(summary(model5))["factor(Ijt)1", "Std.Error"] bresult <- c(bresult, coeff) presult <- c(presult, pvalue) eresult <- c(eresult, error) i <- i + 1 } } Thank you so much. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with a (g)lmer code
Hi Saudi, Apologies for the delay. (also returning to the list) In your initial code: model1<- lmer (better ~ gender + age + education + WF + (1 | part), > data=sub_data) you have age as a fixed effect and there are also 36 levels. This probably causing the error you describe above and I have changed it to a random factor. Your response variable is "better", which has the same levels as WF and is not numeric. This looks like a mistake. I have written four models with the "hum" and "cul" variables as response variables. This looks more sensible to me. The levels of the "education" variable are not ordered correctly. The following code runs okay, but there is a singular fit for EA_cul. The effects seem to be of education, except for the EA_cul model. The following may get you started: sub_data<-read.csv("sub_data.csv",stringsAsFactors=FALSE) # get the education factor into the correct order sub_data$education<-factor(sub_data$education, levels=c("seconadry or below","university","postgrad")) library(lme4) modelSA_hum<-lmer(SA_hum~gender+education+WF+(1|age),data=sub_data) modelSA_cul<-lmer(SA_cul~gender+education+WF+(1|age),data=sub_data) modelEA_hum<-lmer(EA_hum~gender+education+WF+(1|age),data=sub_data) modelEA_cul<-lmer(EA_cul~gender+education+WF+(1|age),data=sub_data) summary(modelSA_hum) summary(modelSA_cul) summary(modelEA_hum) summary(modelEA_cul) # look at the distribution of responses table(sub_data$SA_hum) table(sub_data$SA_cul) table(sub_data$EA_hum) table(sub_data$EA_cul) Jim On Sun, Jun 14, 2020 at 9:42 AM Saudi Sadiq wrote: > > Hi Jim, > Hope you are safe and sound. > So sorry to bother you again. I am still waiting for your reply after I have > attached the dataset. > I know you are very busy, but I will appreciate it a lot if you can guide me > in how to make the g(lmer) mkdel work, or guide me to something different. > All the best > > -- Forwarded message - > From: Saudi Sadiq > Date: Fri, 12 Jun 2020, 4:18 pm > Subject: Re: [R] Help with a (g)lmer code > To: Jim Lemon > Cc: r-help mailing list > > > Hi Jim, > > So many thanks for your reply. I actually made a mistake in presenting the > problem; I should have clarified that the 1-10 linear scale questions went > as: 10 most humorous/closest to Egyptian culture and 1 the least. Also, I > should have attached some examples so the participant issue could be clear. > Here is attached the dataset (if there is no problem or I am not going > against the rules of the R-help group). > > Actually, I wanted better to be the only dependent factor and asking > participants 'which subtitle is better?' could be enough, but I wanted to > have detailed information of why a subtitle is better by asking participants > specific questions (regarding which subtitle is more humorous and closer to > Egyptian culture). Most of the time, the total of the hum + cul = better, but > sometimes it is not (e.g. the sum for subtitle EA could be bigger than for > SA, but the participant prefers SA in the better column). > > The WF (watched first) is the mode via which participants watched the two > subtitles; some participants watched the SA subtitle first and other watched > the EA first. > > Does this make sense? > > All the best > > > On Thu, 11 Jun 2020 at 05:24, Jim Lemon wrote: >> >> Hi Saudi, >> I can only make a guess, but that is that a variable having a unique >> value for each participant has been read in as a factor. I assume that >> "better" is some combination of "hum" and "cul" and exactly what is >> WF? >> >> Jim >> >> On Thu, Jun 11, 2020 at 5:27 AM Saudi Sadiq wrote: >> > >> > Dear Sir/Madam, >> > >> > Hope everyone is safe and sound. I appreciate your help a lot. >> > >> > I am evaluating two Arabic subtitles of a humorous English scene and asked >> > 263 participants (part) to evaluate the two subtitles (named Standard >> > Arabic, SA, and Egyptian Arabic, EA) via a questionnaire that asked them to >> > rank the two subtitles in terms of how much each subtitle is >> > >> > 2) more humorous (hum), >> > >> > 5) closer to Egyptian culture (cul) >> > >> > >> > >> > The questionnaire contained two 1-10 linear scale questions regarding the 2 >> > points clarified, with 1 meaning the most humorous and closest to Egyptian >> > culture, and 1 meaning the least humorous and furthest from Egyptian >> > culture. Also, the questionnaire had a general multiple-choice question >> > regarding which subtitle is better in general (better). General information >> > about the participants were also collected concerning gender (categorical >> > factor), age (numeric factor) and education (categorical factor). >> > >> > Two versions of the questionnaire were relied on: one showing the ‘SA >> > subtitle first’ and another showing the ‘EA subtitle first’. Nearly half >> > the participants answered the first and nearly half answered the latter. >> > >> > I am focusing on which social factor/s lead/s the participants to evaluate >> > on
Re: [R] Convergence in Monte Carlo Simulation
I am not 100% clear what your code is doing as it gets a bit wangled as you posted in HTML but here are a couple of thoughts. You need to set the seed outside any loops so it happens once and for all. I would test after trycatch and keep a separate count of failures and successes as the failure to converge must be meaningful about the scientific question whatever that is. At the moment your count appears to be in the correct place to count successes. Michael On 14/06/2020 02:50, Phat Chau wrote: Hello, I put together the following code and am curious about its correctness. My first question relates to the Monte Carlo simulations – the goal is to continue to iterate until I get n = 1000 simulations where the model successfully converges. I am wondering if I coded it correctly below with the while loop. Is the idea that the counter increments by one only if “model” does not return a string? I would also like to know how I can create n = 1000 independent data sets. I think to do this, I would have to set a random number seed via set.seed() before the creation of each dataset. Where would I enter set.seed in the syntax below? Would it be in the function (as indicated in red)? powercrosssw <- function(nclus, clsize) { set.seed() cohortsw <- genData(nclus, id = "cluster") cohortsw <- addColumns(clusterDef, cohortsw) cohortswTm <- addPeriods(cohortsw, nPeriods = 8, idvars = "cluster", perName = "period") cohortstep <- trtStepWedge(cohortswTm, "cluster", nWaves = 4, lenWaves = 1, startPer = 1, grpName = "Ijt") pat <- genCluster(cohortswTm, cLevelVar = "timeID", numIndsVar = clsize, level1ID = "id") dx <- merge(pat[, .(cluster, period, id)], cohortstep, by = c("cluster", "period")) dx <- addColumns(patError, dx) setkey(dx, id, cluster, period) dx <- addColumns(outDef, dx) return(dx) } i=1 while (i < 1000) { dx <- powercrosssw() #Fit multi-level model to simulated dataset model5 <- tryCatch(lme(y ~ factor(period) + factor(Ijt), data = dx, random = ~1|cluster, method = "REML"), warning = function(w) { "warning" } ) if (! is.character(model5)) { coeff <- coef(summary(model5))["factor(Ijt)1", "Value"] pvalue <- coef(summary(model5))["factor(Ijt)1", "p-value"] error <- coef(summary(model5))["factor(Ijt)1", "Std.Error"] bresult <- c(bresult, coeff) presult <- c(presult, pvalue) eresult <- c(eresult, error) i <- i + 1 } } Thank you so much. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Michael http://www.dewey.myzen.co.uk/home.html __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R 4.0.1 built with Intel Composer 19.1.1, error in R CMD make check on CentOS 7.7
Hi there, Built R 4.0.1 with the Intel Composer 19.1.1. Build seems to go fine. I built it like this: module purge module load intel/19.1.1 module list export CC=icc export CXX=icpc export F77=ifort export FC=ifort export AR=xiar export LD=xild export CFLAGS="-O3 -ipo -qopenmp -axAVX,CORE-AVX2,CORE-AVX512" export F77FLAGS="-O3 -ipo -qopenmp -axAVX,CORE-AVX2,CORE-AVX512" export FFLAGS="-O3 -ipo -qopenmp -axAVX,CORE-AVX2,CORE-AVX512" export CXXFLAGS="-O3 -ipo -qopenmp -axAVX,CORE-AVX2,CORE-AVX512" export MKL="-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread" VERSION=4.0.1 /scratch/novosirj/install-files/R-${VERSION}/configure --with-blas="$MKL" --with-lapack --prefix=/opt/sw/packages/intel-19_1/R-Project/${VERSION} && \ make -j32 && make check && make -j32 install However, the “make check" phase fails at this part: Testing examples for package ‘parallel’ make[2]: Leaving directory `/mnt/scratch/novosirj/R-4.0.1-intel-19.1-build/tests/Examples' make[1]: Leaving directory `/mnt/scratch/novosirj/R-4.0.1-intel-19.1-build/tests' make[1]: Entering directory `/mnt/scratch/novosirj/R-4.0.1-intel-19.1-build/tests' running strict specific tests make[2]: Entering directory `/mnt/scratch/novosirj/R-4.0.1-intel-19.1-build/tests' running code in '/scratch/novosirj/install-files/R-4.0.1/tests/eval-etc.R' ... OK comparing 'eval-etc.Rout' to '/scratch/novosirj/install-files/R-4.0.1/tests/eval-etc.Rout.save' ... OK running code in '/scratch/novosirj/install-files/R-4.0.1/tests/simple-true.R' ... OK comparing 'simple-true.Rout' to '/scratch/novosirj/install-files/R-4.0.1/tests/simple-true.Rout.save' ... OK running code in '/scratch/novosirj/install-files/R-4.0.1/tests/arith-true.R' ... OK comparing 'arith-true.Rout' to '/scratch/novosirj/install-files/R-4.0.1/tests/arith-true.Rout.save' ... OK running code in '/scratch/novosirj/install-files/R-4.0.1/tests/arith.R' ... OK comparing 'arith.Rout' to '/scratch/novosirj/install-files/R-4.0.1/tests/arith.Rout.save' ... OK running code in '/scratch/novosirj/install-files/R-4.0.1/tests/lm-tests.R' ... OK comparing 'lm-tests.Rout' to '/scratch/novosirj/install-files/R-4.0.1/tests/lm-tests.Rout.save' ... OK /bin/sh: line 1: 62064 Segmentation fault (core dumped) LANGUAGE=en LC_ALL=C SRCDIR=/scratch/novosirj/install-files/R-4.0.1/tests R_DEFAULT_PACKAGES= ../bin/R --vanilla < /scratch/novosirj/install-files/R-4.0.1/tests/ok-errors.R > ok-errors.Rout.fail 2>&1 running code in '/scratch/novosirj/install-files/R-4.0.1/tests/ok-errors.R' ...make[2]: *** [ok-errors.Rout] Error 1 make[2]: Leaving directory `/mnt/scratch/novosirj/R-4.0.1-intel-19.1-build/tests' make[1]: *** [test-Specific] Error 2 make[1]: Leaving directory `/mnt/scratch/novosirj/R-4.0.1-intel-19.1-build/tests' make: *** [test-all-basics] Error 1 Is this something I should be concerned about, or something I can fix? Not seeing any real information about what’s going wrong here. Here’s what’s contained in ok-errors.Rout.fail: --- R version 4.0.1 (2020-06-06) -- "See Things Now" Copyright (C) 2020 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > STRICT test suite in the spirit of no-segfaults, > but with explicit statements. > > options(error=expression(NULL)) > stop("test of `options(error=expression(NULL))'") Error: test of `options(error=expression(NULL))' > > if(FALSE) { + ## these ought to work on machines with enough memory + ## These segfaulted in 1.3.x , give "could not allocate" errors now + integer(2^30+1) +double(2^30+1) + complex(2^30+1) + character(2^30+1) + vector("list", 2^30+2) + } > > ## bad infinite recursion / on.exit / ... interactions > ## catch the error to permit different error messages emitted > ## (handling of infinite recursion is different in the AST interpreter > ## and the byte-code interpreter) > > bar <- function() 1+1 > foo <- function() { on.exit(bar()); foo() } > tryCatch(foo(), error=function(x) TRUE) # now simple "infinite recursion" *** caught segfault *** address 0x7fff4dc1b9f8, cause 'memory not mapped' Traceback: 1: foo() 2: foo() 3: foo() 4: foo() ... 2712: foo() 2713: foo() 2714: foo() 2715: foo() 2716: foo() 2717: foo() 2718: foo() 2719: doTryCatch(return(expr), name, parentenv, handler) 2720: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 2721: tryCatchList(expr, classes, parentenv, handlers) 2722: tryCatch(foo(), error = function(x) TRUE) An irrecoverable exception occurr
Re: [R] Convergence in Monte Carlo Simulation
Thank you Michael. I will clarify some more. The function in the first part of the code that I posted generates the simulated dataset for a cluster randomized trial from the simstudy package. I am not quite clear what you mean by placing it outside the loop. So the goal here is to create n = 1000 independent datasets with different (randomly drawn values from the specified normal distributions not shown) for all of the parameters. What I have tried to do is place the seed at the very top of all my code in the past, but what that does is it leads to the creation of a single dataset that gets repeated over and over n = 1000 times. Hence, there ends up being no variability in the data (and power estimates from the p-values given the stated and required power). Regarding the counter, is it correct in this instance that the loop will continue until n = 1000 iterations have successfully converged? I am not so concerned with counting failures. Thank you. Edward On 2020-06-14, 6:46 AM, "Michael Dewey" wrote: I am not 100% clear what your code is doing as it gets a bit wangled as you posted in HTML but here are a couple of thoughts. You need to set the seed outside any loops so it happens once and for all. I would test after trycatch and keep a separate count of failures and successes as the failure to converge must be meaningful about the scientific question whatever that is. At the moment your count appears to be in the correct place to count successes. Michael On 14/06/2020 02:50, Phat Chau wrote: > Hello, > > I put together the following code and am curious about its correctness. My first question relates to the Monte Carlo simulations – the goal is to continue to iterate until I get n = 1000 simulations where the model successfully converges. I am wondering if I coded it correctly below with the while loop. Is the idea that the counter increments by one only if “model” does not return a string? > > I would also like to know how I can create n = 1000 independent data sets. I think to do this, I would have to set a random number seed via set.seed() before the creation of each dataset. Where would I enter set.seed in the syntax below? Would it be in the function (as indicated in red)? > > powercrosssw <- function(nclus, clsize) { > >set.seed() > >cohortsw <- genData(nclus, id = "cluster") >cohortsw <- addColumns(clusterDef, cohortsw) >cohortswTm <- addPeriods(cohortsw, nPeriods = 8, idvars = "cluster", perName = "period") >cohortstep <- trtStepWedge(cohortswTm, "cluster", nWaves = 4, lenWaves = 1, startPer = 1, grpName = "Ijt") > >pat <- genCluster(cohortswTm, cLevelVar = "timeID", numIndsVar = clsize, level1ID = "id") > >dx <- merge(pat[, .(cluster, period, id)], cohortstep, by = c("cluster", "period")) >dx <- addColumns(patError, dx) > >setkey(dx, id, cluster, period) > >dx <- addColumns(outDef, dx) > >return(dx) > > } > > i=1 > > while (i < 1000) { > >dx <- powercrosssw() > >#Fit multi-level model to simulated dataset >model5 <- tryCatch(lme(y ~ factor(period) + factor(Ijt), data = dx, random = ~1|cluster, method = "REML"), > warning = function(w) { "warning" } >) > >if (! is.character(model5)) { > > coeff <- coef(summary(model5))["factor(Ijt)1", "Value"] > pvalue <- coef(summary(model5))["factor(Ijt)1", "p-value"] > error <- coef(summary(model5))["factor(Ijt)1", "Std.Error"] > bresult <- c(bresult, coeff) > presult <- c(presult, pvalue) > eresult <- c(eresult, error) > > i <- i + 1 >} > } > > Thank you so much. > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Michael http://www.dewey.myzen.co.uk/home.html __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Convergence in Monte Carlo Simulation
Dear Edward Every time you call your function powercrosssw() it resets the seed so you must be calling it multiple times in some way. Michael On 14/06/2020 13:57, Phat Chau wrote: Thank you Michael. I will clarify some more. The function in the first part of the code that I posted generates the simulated dataset for a cluster randomized trial from the simstudy package. I am not quite clear what you mean by placing it outside the loop. So the goal here is to create n = 1000 independent datasets with different (randomly drawn values from the specified normal distributions not shown) for all of the parameters. What I have tried to do is place the seed at the very top of all my code in the past, but what that does is it leads to the creation of a single dataset that gets repeated over and over n = 1000 times. Hence, there ends up being no variability in the data (and power estimates from the p-values given the stated and required power). Regarding the counter, is it correct in this instance that the loop will continue until n = 1000 iterations have successfully converged? I am not so concerned with counting failures. Thank you. Edward On 2020-06-14, 6:46 AM, "Michael Dewey" wrote: I am not 100% clear what your code is doing as it gets a bit wangled as you posted in HTML but here are a couple of thoughts. You need to set the seed outside any loops so it happens once and for all. I would test after trycatch and keep a separate count of failures and successes as the failure to converge must be meaningful about the scientific question whatever that is. At the moment your count appears to be in the correct place to count successes. Michael On 14/06/2020 02:50, Phat Chau wrote: > Hello, > > I put together the following code and am curious about its correctness. My first question relates to the Monte Carlo simulations – the goal is to continue to iterate until I get n = 1000 simulations where the model successfully converges. I am wondering if I coded it correctly below with the while loop. Is the idea that the counter increments by one only if “model” does not return a string? > > I would also like to know how I can create n = 1000 independent data sets. I think to do this, I would have to set a random number seed via set.seed() before the creation of each dataset. Where would I enter set.seed in the syntax below? Would it be in the function (as indicated in red)? > > powercrosssw <- function(nclus, clsize) { > >set.seed() > >cohortsw <- genData(nclus, id = "cluster") >cohortsw <- addColumns(clusterDef, cohortsw) >cohortswTm <- addPeriods(cohortsw, nPeriods = 8, idvars = "cluster", perName = "period") >cohortstep <- trtStepWedge(cohortswTm, "cluster", nWaves = 4, lenWaves = 1, startPer = 1, grpName = "Ijt") > >pat <- genCluster(cohortswTm, cLevelVar = "timeID", numIndsVar = clsize, level1ID = "id") > >dx <- merge(pat[, .(cluster, period, id)], cohortstep, by = c("cluster", "period")) >dx <- addColumns(patError, dx) > >setkey(dx, id, cluster, period) > >dx <- addColumns(outDef, dx) > >return(dx) > > } > > i=1 > > while (i < 1000) { > >dx <- powercrosssw() > >#Fit multi-level model to simulated dataset >model5 <- tryCatch(lme(y ~ factor(period) + factor(Ijt), data = dx, random = ~1|cluster, method = "REML"), > warning = function(w) { "warning" } >) > >if (! is.character(model5)) { > > coeff <- coef(summary(model5))["factor(Ijt)1", "Value"] > pvalue <- coef(summary(model5))["factor(Ijt)1", "p-value"] > error <- coef(summary(model5))["factor(Ijt)1", "Std.Error"] > bresult <- c(bresult, coeff) > presult <- c(presult, pvalue) > eresult <- c(eresult, error) > > i <- i + 1 >} > } > > Thank you so much. > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Michael http://www.dewey.myzen.co.uk/home.html -- Michael http://www.dewey.myzen.co.uk/home.html __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained,