Re: [R] Converting SAS Code

2017-09-29 Thread Tobias Fellinger
Hello, 

In my experience the most direct path for converting SAS code to R is
using dplyr. dplyr provides the filter function; the first part of your
code could look like this, assuming your datasets are stored as
data.frames:

library(dplyr)

yield <- filter(stress,
  field != "YV",
  field != "HB",
  barcode != "16187DD4015",
  barcode != "16187DD6002")

(and so on for the other barcodes.)
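
The long list of conditions can also be collapsed. A minimal sketch,
assuming field and barcode are character columns (the toy stress data
below is invented for illustration; the barcodes are the ones from the
SAS data step):

```r
library(dplyr)

# Toy stand-in for the real 'stress' data set (values are invented).
stress <- data.frame(
  field   = c("YV", "HB", "AB", "AB", "AB"),
  barcode = c("16187DD1001", "16187DD1002", "16187DD4015",
              "16187DD9999", "16187DD6002"),
  stringsAsFactors = FALSE
)

# Values deleted in the SAS data step.
drop_fields   <- c("YV", "HB")
drop_barcodes <- c("16187DD4015", "16187DD6002", "16187DD2007",
                   "16187DD5016", "16187DD8007", "16187DD7010",
                   "16187DD7007", "16187DD8005", "16187DD6004",
                   "16187DD5008", "16187DD7012", "16187DD6010")

# Keep only rows whose field and barcode are not in the exclusion lists.
yield <- filter(stress,
  !field %in% drop_fields,
  !barcode %in% drop_barcodes)
```

One difference to be aware of: inside filter, field != "YV" drops rows
where field is NA, while !field %in% ... keeps them, which is closer to
what the SAS delete statements do with missing values.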

For mixed effects models, look into the lme4 package. lmer uses the REML
criterion by default. Model specification works very differently in R;
chapter 2.1 of the lme4 vignette [1] explains the model formulas.
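
As a rough sketch of how the proc mixed call might translate (this is
my reading of the SAS model, not a verified equivalent; the data are
simulated and only the column names come from the SAS code). The
Kenward-Roger degrees of freedom and the lsmeans comparisons are not
covered here:

```r
library(lme4)

set.seed(42)
# Simulated toy data with the column names used in the SAS code.
yield <- expand.grid(
  rep                = factor(1:4),
  Management         = factor(c("M1", "M2")),
  Foliar_Fungicide   = factor(c("F0", "F1")),
  Chemical_Treatment = factor(c("C1", "C2", "C3"))
)
# Covariate and response on arbitrary, invented scales.
yield$Final_Stand__Plants_A_ <- rnorm(nrow(yield), mean = 30, sd = 2)
yield$Grain_Yield__Mg_h_ <- 10 + rnorm(4)[yield$rep] + rnorm(nrow(yield))

# A|B|C in SAS corresponds to A * B * C in an R formula; each term of the
# random statement becomes a random intercept (1 | ...).
fit <- lmer(
  Grain_Yield__Mg_h_ ~ Management * Foliar_Fungicide * Chemical_Treatment +
    Final_Stand__Plants_A_ +
    (1 | rep) + (1 | rep:Management) +
    (1 | rep:Management:Foliar_Fungicide),
  data = yield, REML = TRUE
)

fixef(fit)  # fixed-effect estimates
```

With simulated data like this, lmer may report a singular fit because
some variance components are estimated as zero; that is a property of
the toy data, not of the formula.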

You should get the coefficients of the fitted lmer model with the coef
function.

Plots and univariate statistics work very differently in R. Have a
look at the functions group_by and summarise provided by the dplyr
package for calculating univariate statistics by groups, and at the
ggplot2 package for plotting.
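
A minimal sketch of both steps, assuming the residuals are available in
a data.frame (the resids data below is simulated and its column names
are only placeholders):

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Simulated stand-in for a data set of model residuals.
resids <- data.frame(
  Management = rep(c("M1", "M2"), each = 20),
  resid      = rnorm(40)
)

# Univariate statistics by group.
resid_summary <- resids %>%
  group_by(Management) %>%
  summarise(n = n(), mean = mean(resid), sd = sd(resid))

# Normal Q-Q plot of the residuals.
qq_plot <- ggplot(resids, aes(sample = resid)) +
  stat_qq() +
  stat_qq_line()
```

Printing qq_plot draws the plot; shapiro.test(resids$resid) from base R
gives a formal normality test if one is wanted.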

Tobi

[1] https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf


On Fri, 2017-09-29 at 07:47 -0500, Andrew Harmon wrote:
> Hello all,
> 
> My statistical analysis training up until this point has been entirely done
> in SAS. The code I frequently used was:
> 
> *Yield Champagin;
> 
> data yield;
> 
> set stress;
> 
> if field='YV' then delete;
> 
> if field='HB' then delete;
> 
> if barcode='16187DD4015' then delete;
> 
> if barcode='16187DD6002' then delete;
> 
> if barcode='16187DD2007' then delete;
> 
> if barcode='16187DD5016' then delete;
> 
> if barcode='16187DD8007' then delete;
> 
> if barcode='16187DD7010' then delete;
> 
> if barcode='16187DD7007' then delete;
> 
> if barcode='16187DD8005' then delete;
> 
> if barcode='16187DD6004' then delete;
> 
> if barcode='16187DD5008' then delete;
> 
> if barcode='16187DD7012' then delete;
> 
> if barcode='16187DD6010' then delete;
> 
> run; quit;
> 
> 
> 
> Title'2016 Asilomar Stress Relief champagin yield';
> 
> proc mixed method=reml data=yield;
> 
> class rep Management Foliar_Fungicide Chemical_Treatment;
> 
> model Grain_Yield__Mg_h_ =Management|Foliar_Fungicide|Chemical_Treatment
> Final_Stand__Plants_A_ / outpred=resids residual ddfm=kr;
> 
> random rep rep*Management rep*Management*Foliar_Fungicide;
> 
> lsmeans Management|Foliar_Fungicide|Chemical_Treatment / pdiff;
> 
> ods output diffs=ppp lsmeans=means;
> 
> ods listing exclude diffs lsmeans;
> 
> run; quit;
> 
> %include'C:\Users\harmon12\Desktop\pdmix800.sas';
> 
> %pdmix800(ppp,means,alpha=0.10,sort=yes);
> 
> ods graphics off;
> 
> run; quit;
> 
> proc univariate data=resids normal plot; id Barcode Grain_Yield__Mg_h_
> pearsonresid; var resid;
> proc print data=resids (obs=3);run;
> 
> Can someone please help me convert my code to R? Any help would be much
> appreciated.
> 
> 
> Thanks,
> 
> 
> Andrew Harmon
> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] integral of the fuction for each value

2018-10-21 Thread Tobias Fellinger
Hi, 

I'm not entirely sure what you want to calculate. 

If you want to integrate over u in the interval from 1 to 4 for every
value of x, I would suggest something like this:

x <- rnorm(10,0,1)
f <- function(u,x){
  exp((x-u))
}

sapply(x, function(i){
  integrate(f,lower=1,upper=4, x=i)$value
})

Any additional arguments you give to integrate are passed on to f.
integrate always integrates over the first argument of the function
passed as its first argument; see the help entry of integrate for more
details.
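
For this particular f the result can be checked by hand, since the
integral of exp(x-u) over u from 1 to 4 is exp(x) * (exp(-1) - exp(-4)).
A small sketch comparing the two (the variable names are just for
illustration):

```r
set.seed(123)
x <- rnorm(10, 0, 1)
f <- function(u, x) exp(x - u)

# Numerical integral over u from 1 to 4, one value per element of x.
numerical <- vapply(
  x,
  function(xi) integrate(f, lower = 1, upper = 4, x = xi)$value,
  numeric(1)
)

# Closed form of the same integral.
analytical <- exp(x) * (exp(-1) - exp(-4))

all.equal(numerical, analytical, tolerance = 1e-6)
```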

Best Regards, Tobias

On Sun, 2018-10-21 at 12:02 +, malika yassa via R-help wrote:
> hello
> please help me, I have this function:
> x<-rnorm(10,0,1)
> f<-function(u,x){exp((x-u))}
> I want to calculate the integral of this function for each value of x:
> for(i in 1:length(x)) integrate(f,lower=1,upper=4)
> thanks
> 


[R] as.factor and floating point numbers

2023-01-25 Thread Tobias Fellinger
Hello,

I'm encountering the following error: 

In a package for survival analysis that I use, a data.frame is created. One 
column is created by applying unique to the event times, while others are 
created by running table on the event times and the treatment arm.

When event times are very close together, they are put in the same factor 
level when coerced to factor, while unique outputs both values. This leads to 
columns of different lengths.

Try this to reproduce: 
x <- c(1, 1+.Machine$double.eps)
unique(x)
table(x)

Is there a general best practice to deal with such issues?

Should calling table on floats be avoided in general?

What can one use instead? 

One could iterate over the unique values and compare each of them with the 
whole vector, but that takes N*N comparisons, compared to N*log(N) when 
sorting first and exploiting the fact that the vector is sorted.

I think for my purposes I'll round to a hundredth of a day before calling the 
function, but any advice on avoiding this issue and writing more fault-tolerant 
code is greatly appreciated.
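
One possible way to count without going through character levels is to 
tabulate integer positions instead of the floating point values themselves. 
This is only a sketch on made-up data (the names x, arm, ux and idx are 
placeholders), not something tested against the package in question:

```r
# Two event times that differ only by machine epsilon, plus a true tie.
x   <- c(1, 1 + .Machine$double.eps, 2, 2)
arm <- c("A", "B", "A", "B")

ux  <- sort(unique(x))  # distinct values, compared exactly
idx <- match(x, ux)     # position of each element of x within ux

# One count per unique value, in the order of ux.
counts <- tabulate(idx, nbins = length(ux))

# Cross-tabulation against the treatment arm, using the integer codes.
by_arm <- table(time_index = idx, arm = arm)
```

Because unique and match both compare the doubles exactly, counts and by_arm 
have one entry or row per element of ux.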

all the best, Tobias




Re: [R] as.factor and floating point numbers

2023-01-25 Thread Tobias Fellinger
Hello, 

I'll reply in one mail to all. 

Thank you for your suggestions. I already tried Andrew's solution of 
increasing the digits. In the most extreme case I encountered I had to use 
the maximum possible digits in format, but it worked. 

Tim's solution is also a good workaround, but in this case I would have to 
know a lot about the user input.

Valentin's solution works and is surely the safest of the options, but it is 
somewhat more than I need. The case I encountered does not really need to deal 
with the levels, just with the counts of every unique value across another 
variable.

After thinking about it a little bit longer I came up with another solution 
that works alright for my purposes: I use table on the ranks. Since in the 
case I encountered the vector does not have duplicates and is already sorted, 
I can use table on the ranks of the vector and get the counts in the right 
order.
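
A minimal sketch of the idea on made-up data (time and arm are placeholder 
names; ties.method = "min" is my choice here so that the ranks stay integers):

```r
# Sorted event times without exact duplicates; the first two differ only
# by machine epsilon.
time <- c(1, 1 + .Machine$double.eps, 2, 3)
arm  <- c("A", "B", "A", "B")

# Ranks are integers, so converting them to factor levels is safe.
time_rank <- rank(time, ties.method = "min")

# Counts per event time (row) and treatment arm (column).
counts <- table(time_rank = time_rank, arm = arm)
```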

Thanks Everyone, Tobias

On Wednesday, 25 January 2023 20:59:16 CET Valentin Petzel wrote:
> Hello Tobias,
> 
> A factor is basically a way to get a character to behave like an integer. It
> consists of an integer with values from 1 to nlev, and a character vector
> levels, specifying for each value a level name.
> 
> But this means that factors only really make sense with characters, and
> anything that is not a character will be forced to be a character. Thus two
> values that are represented by the same value in as.character will be
> treated as the same.
> 
> Now this is probably reasonable most of the time, as numeric values will
> usually represent metric data, which tends to make little sense as factor.
> But if we want to do this we can easily build our own factors from floats,
> and even write some convenience wrapper around table, as shown in the
> appended file.
> 
> Best regards,
> Valentin


[R] subsetting/slicing xml2 nodesets

2019-08-21 Thread Tobias Fellinger

Dear R-help members,

I'm working with the xml2 package to parse an xml document, and I don't 
understand how subsetting / slicing of xml_nodesets works. I'd expect 
xml_find_all to only return children of the nodes I selected with [ or 
[[ but it returns all nodes found in the whole document. I did not find 
any documentation on the [ and [[ operators for xml_nodeset. Below is a 
small example and the sessionInfo.


thanks in advance, Tobias Fellinger



# load package
require(xml2)

# test document as text
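# (only the <p> elements follow from the query below; the names of the
# enclosing <doc> and <body> elements are assumed)
test_chr <- "
<doc>
<body>
<p>paragraph 1</p>
<p>paragraph 2</p>
</body>
</doc>
"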

# parse test document
test_doc <- read_xml(test_chr)

# extract nodeset
test_nodeset <- xml_find_all(test_doc, "//p")

# subset nodeset (working as expected)
test_nodeset[1]
# {xml_nodeset (1)}
# [1] <p>paragraph 1</p>
test_nodeset[[1]]
# {xml_node}
# <p>

# extract from subset (not working as expected)
xml_find_all(test_nodeset[1], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
xml_find_all(test_nodeset[[1]], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
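
A likely explanation, offered as a sketch rather than as part of the
original question: in XPath, a path starting with // is evaluated from
the document root regardless of the node it is called on, while .//
is relative to the current node. The small document below is invented
for illustration:

```r
library(xml2)

# Invented document: two <div> elements with different numbers of <p>.
doc <- read_xml(
  "<doc><div><p>one</p><p>two</p></div><div><p>three</p></div></doc>"
)
divs <- xml_find_all(doc, "//div")

# Absolute path: searches the whole document, ignoring the subset.
n_absolute <- length(xml_find_all(divs[[1]], "//p"))

# Relative path: only descendants of the first <div>.
n_relative <- length(xml_find_all(divs[[1]], ".//p"))
```

Here n_absolute counts all three paragraphs and n_relative only the two
inside the first div.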

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# Matrix products: default
#
# locale:
#   [1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252
#       LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
#       LC_TIME=German_Austria.1252
#
# attached base packages:
#   [1] stats graphics  grDevices utils datasets  methods   base
#
# other attached packages:
#   [1] xml2_1.2.2
#
# loaded via a namespace (and not attached):
#   [1] compiler_3.6.0 tools_3.6.0 Rcpp_1.0.2 packrat_0.5.0



Re: [R] [SPAM] Re: The "--slave" option

2019-09-22 Thread Tobias Fellinger
Hello everyone,

I think Richard's proposal to update the documentation is a good idea. Not
only because it puts the phrasing into context but also because it makes
the documentation clearer.

About the initial mail: I think awareness of language has increased a lot
in recent years, and I think this is overall a good thing. New code should
take this into account from the beginning, and old code should be changed
where possible, particularly the documentation. General terminology like
master/slave is hard to replace, but there are alternative wordings that
are less offensive and as clear, if not clearer.

A few thoughts on whether this should be discussed, and whether this is the
right place for the discussion:

To get changes in the code or the documentation done, the help mailing list
is definitely not the best place. But discussing the topic does have some
merit, even if it's only very loosely related to the topic of the mailing
list. Changing the name of one command-line option will not change society,
but having a discussion about phrasing, naming or jokes in documentation
and comments in the code is valuable, even if just to establish a certain
awareness. Whether the original poster is a troll or not does not change
much about this; there are more participants in this conversation than the
OP.

I think this discussion could be had much less cynically. Assuming without
reason that anyone acts in bad faith in starting the discussion or arguing
for either side does not help. I also think discussing this separately for
each comment and each command-line option is not the best way to do this.
But the fact that discussions like this resurface every few years in many
open-source communities shows that there are concerns here. I think
dismissing or belittling these concerns unnecessarily alienates a (maybe
small, maybe larger than it appears) group in the community.

kind regards, Tobias



[R] Problem running stan interactively

2020-08-31 Thread Tobias Fellinger
Hi,


I'm trying to run a Stan model in R 4.0.2. The session crashes if I run the 
code interactively (sourcing from an interactive session or from RStudio), but 
it runs fine with Rscript or R -e 'source("rstan_test.R")'. I don't really 
know where to begin debugging this.

I'm not sure if this is due to the setup on my machine or if this is a bug in 
the Debian packaging, the rstan package or R itself, so I'm posting it here 
on the most general mailing list.


For now, running the model in a non-interactive session is a workaround. The R 
and Stan code and the sessionInfo are below.
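
The workaround can also be used from within an interactive session by 
starting a separate Rscript process. A minimal sketch (run_noninteractive is 
a hypothetical helper name, and the temporary script only stands in for 
rstan_test.R so that the example is self-contained):

```r
# Hypothetical helper: run an R script in a fresh, non-interactive Rscript
# process and return its exit status (0 means success).
run_noninteractive <- function(script) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, shQuote(script))
}

# Stand-in script; in the situation described above the argument would be
# "rstan_test.R".
demo_script <- tempfile(fileext = ".R")
writeLines('message("done")', demo_script)

status <- run_noninteractive(demo_script)
```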


All the best and thanks in advance, Tobias


 rstan_test.R 
library(rstan)

schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

fit <- stan(file = 'schools.stan', data = schools_dat)

message("done")

 schools.stan 

// saved as schools.stan
data {
  int<lower=0> J;         // number of schools
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}

 output of sessionInfo 

$ R -e 'library(rstan); sessionInfo()'

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rstan); sessionInfo()
Loading required package: StanHeaders
Loading required package: ggplot2
rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_DK.UTF-8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_DK.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rstan_2.21.2 ggplot2_3.3.2 StanHeaders_2.21.0-6

loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 pillar_1.4.6 compiler_4.0.2 prettyunits_1.1.1
[5] tools_4.0.2 pkgbuild_1.1.0 jsonlite_1.7.0 lifecycle_0.2.0
[9] tibble_3.0.3 gtable_0.3.0 pkgconfig_2.0.3 rlang_0.4.7
[13] cli_2.0.2 parallel_4.0.2 curl_4.3 loo_2.3.1
[17] gridExtra_2.3 withr_2.2.0 dplyr_1.0.2 generics_0.0.2
[21] vctrs_0.3.4 stats4_4.0.2 grid_4.0.2 tidyselect_1.1.0
[25] glue_1.4.2 inline_0.3.15 R6_2.4.1 processx_3.4.3
[29] fansi_0.4.1 callr_3.4.3 purrr_0.3.4 magrittr_1.5
[33] codetools_0.2-16 scales_1.1.1 ps_1.3.4 ellipsis_0.3.1
[37] matrixStats_0.56.0 assertthat_0.2.1 colorspace_1.4-1 V8_3.2.0
[41] RcppParallel_5.0.2 munsell_0.5.0 crayon_1.3.4
>


