Re: [R] Converting SAS Code
Hello,

in my experience the most direct path for converting SAS code to R is dplyr. dplyr provides the filter function; assuming your datasets are stored as data.frames, the first part of your code could look like this:

library(dplyr)
yield <- filter(stress,
                field != "YV",
                field != "HB",
                barcode != "16187DD4015",
                barcode != "16187DD6002")

(and so on for the other barcodes.)

For mixed effects look into the lme4 package; lmer uses the REML criterion by default. Model specification works very differently in R: see the vignette [1] of the lme4 package, chapter 2.1 gives an explanation of the model formulas used. You can get the coefficients of the fitted lmer model with the coef function.

Plots and univariate statistics also work very differently in R. Have a look at the functions group_by and summarise provided by the dplyr package for calculating univariate statistics by groups, and at the ggplot2 package for plotting.

Tobi

[1] https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf

On Fri, 2017-09-29 at 07:47 -0500, Andrew Harmon wrote:
> Hello all,
>
> My statistical analysis training up until this point has been entirely done
> in SAS.
> The code I frequently used was:
>
> *Yield Champagin;
> data yield;
> set stress;
> if field='YV' then delete;
> if field='HB' then delete;
> if barcode='16187DD4015' then delete;
> if barcode='16187DD6002' then delete;
> if barcode='16187DD2007' then delete;
> if barcode='16187DD5016' then delete;
> if barcode='16187DD8007' then delete;
> if barcode='16187DD7010' then delete;
> if barcode='16187DD7007' then delete;
> if barcode='16187DD8005' then delete;
> if barcode='16187DD6004' then delete;
> if barcode='16187DD5008' then delete;
> if barcode='16187DD7012' then delete;
> if barcode='16187DD6010' then delete;
> run; quit;
>
> Title'2016 Asilomar Stress Relief champagin yield';
> proc mixed method=reml data=yield;
> class rep Management Foliar_Fungicide Chemical_Treatment;
> model Grain_Yield__Mg_h_ =Management|Foliar_Fungicide|Chemical_Treatment
>   Final_Stand__Plants_A_ / outpred=resids residual ddfm=kr;
> random rep rep*Management rep*Management*Foliar_Fungicide;
> lsmeans Management|Foliar_Fungicide|Chemical_Treatment / pdiff;
> ods output diffs=ppp lsmeans=means;
> ods listing exclude diffs lsmeans;
> run; quit;
>
> %include'C:\Users\harmon12\Desktop\pdmix800.sas';
> %pdmix800(ppp,means,alpha=0.10,sort=yes);
> ods graphics off;
> run; quit;
>
> proc univariate data=resids normal plot;
>   id Barcode Grain_Yield__Mg_h_ pearsonresid;
>   var resid;
> proc print data=resids (obs=3); run;
>
> Can someone please help me convert my code to R? Any help would be much
> appreciated.
>
> Thanks,
>
> Andrew Harmon
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
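The mixed-model part of the quoted SAS code could be sketched in lme4 syntax roughly as follows. This is a sketch only: the column names are taken from the SAS code above, and the Kenward-Roger denominator degrees of freedom (ddfm=kr) are not part of plain lme4; packages such as lmerTest or pbkrtest provide them.

```r
library(lme4)

# fixed effects: full factorial of the three treatment factors plus the
# stand-count covariate; the random intercepts mirror the SAS random statement
fit <- lmer(
  Grain_Yield__Mg_h_ ~ Management * Foliar_Fungicide * Chemical_Treatment +
    Final_Stand__Plants_A_ +
    (1 | rep) + (1 | rep:Management) + (1 | rep:Management:Foliar_Fungicide),
  data = yield,
  REML = TRUE  # lmer's default, matching method=reml
)

summary(fit)
```

The lsmeans / pdiff part of the SAS code is closest to what the emmeans package provides.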
Re: [R] integral of the fuction for each value
Hi,

I'm not entirely sure what you want to calculate. If you want to integrate over u in the interval from 1 to 4 for every value of x, I would suggest something like this:

x <- rnorm(10, 0, 1)
f <- function(u, x){
  exp(x - u)
}
sapply(x, function(i){
  integrate(f, lower = 1, upper = 4, x = i)$value
})

You can just pass additional arguments for f to integrate; integrate always integrates over the first argument of the function passed as its first argument. See the help entry of integrate for more details.

Best Regards,
Tobias

On Sun, 2018-10-21 at 12:02 +, malika yassa via R-help wrote:
> hello
> please you help me i have this function
> x<-rnorm(10,0,1)
> f<-fuction(u,x){exp((x-u)}
> I want to calculate the integral of this function for each value of x
> {for(i in 1:lenght(x)integrate(f,lower=1,upper=4)
> thinks
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
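A slightly stricter variant of the sapply solution uses vapply, which checks that integrate returns a single numeric value for every element of x (same assumptions as above: f is integrated over u from 1 to 4 for each x; the seed is only there to make the example reproducible):

```r
set.seed(42)
x <- rnorm(10, 0, 1)
f <- function(u, x) exp(x - u)

# vapply(..., numeric(1)) errors out if any result is not a single number
vals <- vapply(
  x,
  function(xi) integrate(f, lower = 1, upper = 4, x = xi)$value,
  numeric(1)
)

# here the integral has a closed form, which allows a quick sanity check:
# the integral of exp(x - u) du from 1 to 4 equals exp(x) * (exp(-1) - exp(-4))
all.equal(vals, exp(x) * (exp(-1) - exp(-4)))
```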
[R] as.factor and floating point numbers
Hello,

I'm encountering the following problem:

In a package for survival analysis a data.frame is created; one column is created by applying unique to the event times, while others are created by running table on the event times and the treatment arm. When there are event times very close together, they are put in the same factor level when coerced to factor, while unique outputs both values, leading to different lengths of the columns.

Try this to reproduce:

x <- c(1, 1 + .Machine$double.eps)
unique(x)
table(x)

Is there a general best practice to deal with such issues? Should calling table on floats be avoided in general? What can one use instead?

One could easily iterate over the unique values and compare each with the whole vector, but that is N*N comparisons, compared to N*log(N) when sorting first and taking into account that the vector is sorted.

I think for my purposes I'll round to a hundredth of a day before calling the function, but any advice on avoiding this issue and writing more fault-tolerant code is greatly appreciated.

all the best,
Tobias
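For the specific task of counting each distinct double value without going through character coercion, one possible workaround (a sketch, not an established best practice) is to combine match and tabulate, which compare the actual numeric values the same way unique does:

```r
x <- c(1, 1 + .Machine$double.eps, 1, 2)

ux <- unique(x)                   # keeps both values near 1 (exact comparison)
counts <- tabulate(match(x, ux))  # occurrences of each element of ux, in order

length(ux)      # 3 distinct values
length(counts)  # 3 counts, aligned with ux
```

Unlike table(x), this guarantees that the counts have the same length as unique(x), because both are driven by the same notion of equality.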
Re: [R] as.factor and floating point numbers
Hello,

I'll reply in one mail to all. Thank you for your suggestions.

I already tried Andrew's solution of increasing the digits. In the most extreme case I encountered I had to take the maximum possible digits in format, but it worked. Tim's solution is also a good workaround, but in this case I would have to know too much about the user input. Valentin's solution works and is surely the safest of the options, but somehow more than I need. The case I encountered does not really need to deal with the levels, but just with the counts of every unique value across another variable.

After thinking about it a little longer I came up with another solution that works alright for my purposes: I use table on the ranks. Since in the case I encountered the vector does not have duplicates and is already sorted, I can use table on the ranks of the vector and get the counts in the right order.

Thanks everyone,
Tobias

On Wednesday, 25 January 2023 20:59:16 CET Valentin Petzel wrote:
> Hello Tobias,
>
> A factor is basically a way to get a character to behave like an integer. It
> consists of an integer with values from 1 to nlev, and a character vector of
> levels, specifying a level name for each value.
>
> But this means that factors only really make sense with characters, and
> anything that is not a character will be coerced to character. Thus two
> values that are represented by the same value by as.character will be
> treated as the same.
>
> Now this is probably reasonable most of the time, as numeric values will
> usually represent metric data, which tends to make little sense as a factor.
> But if we want to do this we can easily build our own factors from floats,
> and even write some convenience wrapper around table, as shown in the
> appended file.
>
> Best regards,
> Valentin
>
> On Wednesday, 25 January 2023 at 10:03:01 CET, Tobias Fellinger wrote:
> > Hello,
> >
> > I'm encountering the following error:
> >
> > In a package for survival analysis I use a data.frame is created, one
> > column is created by applying unique on the event times while others are
> > created by running table on the event times and the treatment arm.
> >
> > When there are event times very close together they are put in the same
> > factor level when coerced to factor while unique outputs both values,
> > leading to different lengths of the columns.
> >
> > Try this to reproduce:
> > x <- c(1, 1+.Machine$double.eps)
> > unique(x)
> > table(x)
> >
> > Is there a general best practice to deal with such issues?
> >
> > Should calling table on floats be avoided in general?
> >
> > What can one use instead?
> >
> > One could easily iterate over the unique values and compare all values
> > with the whole vector but this are N*N comparisons, compared to N*log(N)
> > when sorting first and taking into account that the vector is sorted.
> >
> > I think for my purposes I'll round to a hundredth of a day before calling
> > the function, but any advice on avoiding this issue an writing more fault
> > tolerant code is greatly appreciated.
> >
> > all the best, Tobias
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html and provide commented,
> > minimal, self-contained, reproducible code.
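The rank-based idea can be read as the following sketch (my reading of the description above, with an illustrative vector): because rank compares the numeric values directly, ties in the ranks correspond exactly to duplicated float values, so tabling the ranks yields one count per distinct value, in sorted order:

```r
x <- c(1, 1, 1 + .Machine$double.eps, 2, 2, 2)

r <- rank(x, ties.method = "min")  # equal floats share the same (minimal) rank
table(r)                           # one entry per distinct value of x
```

Here table(r) has three entries with counts 2, 1, and 3, while table(x) would merge the two values near 1 into one level.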
[R] subsetting/slicing xml2 nodesets
Dear R-help members,

I'm working with the xml2 package to parse an xml document, and I don't understand how subsetting / slicing of xml_nodesets works. I'd expect xml_find_all to only return children of the nodes I selected with [ or [[, but it returns all nodes found in the whole document. I did not find any documentation on the [ and [[ operators for xml_nodeset.

Below is a small example and the sessionInfo.

thanks in advance,
Tobias Fellinger

# load package
require(xml2)

# test document as text
test_chr <- "<doc><p>paragraph 1</p><p>paragraph 2</p></doc>"

# parse test document
test_doc <- read_xml(test_chr)

# extract nodeset
test_nodeset <- xml_find_all(test_doc, "//p")

# subset nodeset (working as expected)
test_nodeset[1]
# {xml_nodeset (1)}
# [1] <p>paragraph 1</p>

test_nodeset[[1]]
# {xml_node}
# <p>

# extract from subset (not working as expected)
xml_find_all(test_nodeset[1], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>

xml_find_all(test_nodeset[[1]], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252
# [3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
# [5] LC_TIME=German_Austria.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] xml2_1.2.2
#
# loaded via a namespace (and not attached):
# [1] compiler_3.6.0 tools_3.6.0 Rcpp_1.0.2 packrat_0.5.0
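As far as I understand libxml2's XPath semantics (xml2 delegates XPath evaluation to libxml2), the behaviour described above is expected: an expression starting with // is absolute and always searches from the document root, regardless of which node the search starts from. To restrict the search to the selected node, a relative expression is needed. A small sketch, using a reconstructed test document since the original markup did not survive transmission:

```r
library(xml2)

doc <- read_xml("<doc><p>paragraph 1</p><p>paragraph 2</p></doc>")
nodes <- xml_find_all(doc, "//p")

xml_find_all(nodes[[1]], "//p")      # absolute: still finds both <p> nodes
xml_find_all(nodes[[1]], "self::p")  # relative: only the first <p>
xml_text(xml_find_all(nodes[[1]], "self::p"))  # "paragraph 1"
```

Note that ".//p" would also be relative, but it matches descendants only, which is why self::p is used to refer to the selected node itself.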
Re: [R] [SPAM] Re: The "--slave" option
Hello everyone,

I think Richard's proposal to update the documentation is a good idea, not only because it puts the phrasing into context but also because it makes the documentation clearer.

About the initial mail: I think awareness of language has increased a lot in recent years, and I think this is overall a good thing. New code should take this into account from the beginning, and old code should be changed where possible, particularly the documentation. General terminology like master/slave is hard to replace, but there are alternative wordings that are less offensive and just as clear, if not clearer.

A few thoughts on whether this should be discussed, and whether this is the right place for the discussion: to get changes in the code or the documentation done, the help mailing list is definitely not the best place. But discussing the topic does have some merit, even if it's only loosely related to the topic of the mailing list. Changing the name of one command-line option will not change society, but having a discussion about phrasing, naming, or jokes in documentation and comments in the code is valuable, even if just to establish a certain awareness.

Whether the original poster is a troll or not does not change much about this; there are more participants in this conversation than the OP. I think this discussion could be had much less cynically. Assuming without reason that anyone acts in bad faith in starting the discussion, or in arguing for either side, does not help.

I also think discussing this separately for each comment and each command-line option is not the best way to do it. But the fact that discussions like this resurface every few years in many open-source communities shows that there are concerns here. I think completely dismissing or belittling these concerns unnecessarily alienates a (maybe small, maybe larger than it appears) group in the community.
kind regards,
Tobias
[R] Problem running stan interactively
Hi,

I try to run a stan model in R 4.0.2 and the session crashes if I run the code interactively (sourcing from an interactive session or from RStudio), but it runs fine with Rscript or R -e 'source("rstan_test.R")'.

I don't really know where to begin to debug this. I'm not sure if this is due to the setup on my machine or if this is a bug in the packaging on Debian, the rstan package, or R itself, so I'm posting it here in the most general mailing list. For now, just running the model in a non-interactive session is a workaround.

R and Stan code and sessionInfo below.

All the best and thanks in advance,
Tobias

rstan_test.R:

library(rstan)

schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

fit <- stan(file = 'schools.stan', data = schools_dat)
message("done")

schools.stan:

// saved as schools.stan
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta; // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}

output of sessionInfo:

$ R -e 'library(rstan); sessionInfo()'

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rstan); sessionInfo()
Loading required package: StanHeaders
Loading required package: ggplot2
rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=en_US.utf8        LC_NUMERIC=C
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rstan_2.21.2         ggplot2_3.3.2        StanHeaders_2.21.0-6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         pillar_1.4.6       compiler_4.0.2     prettyunits_1.1.1
 [5] tools_4.0.2        pkgbuild_1.1.0     jsonlite_1.7.0     lifecycle_0.2.0
 [9] tibble_3.0.3       gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.7
[13] cli_2.0.2          parallel_4.0.2     curl_4.3           loo_2.3.1
[17] gridExtra_2.3      withr_2.2.0        dplyr_1.0.2        generics_0.0.2
[21] vctrs_0.3.4        stats4_4.0.2       grid_4.0.2         tidyselect_1.1.0
[25] glue_1.4.2         inline_0.3.15      R6_2.4.1           processx_3.4.3
[29] fansi_0.4.1        callr_3.4.3        purrr_0.3.4        magrittr_1.5
[33] codetools_0.2-16   scales_1.1.1       ps_1.3.4           ellipsis_0.3.1
[37] matrixStats_0.56.0 assertthat_0.2.1   colorspace_1.4-1   V8_3.2.0
[41] RcppParallel_5.0.2 munsell_0.5.0      crayon_1.3.4