[R] New User Having Trouble Loading R Commander on Mac OS Yosemite
I keep getting the same error message when trying to install R Commander. My operating system is Mac OS Yosemite 10.10. I have installed R 3.2, RStudio, XQuartz (X11), and tcltk-8.x.x-x11.dmg, but I keep getting the following error:

  Loading required package: splines
  Loading required package: RcmdrMisc
  Loading required package: car
  Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
    there is no package called 'SparseM'
  Error: package 'car' could not be loaded

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
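The error reports a missing dependency rather than a problem with Rcmdr itself; a plausible first step (not from the original thread) is to install the missing package, or reinstall Rcmdr with all of its dependencies:

```r
# 'SparseM' is needed by 'car', which Rcmdr loads; installing it is the
# likely fix for the error above
install.packages("SparseM")

# or, more thoroughly, pull in everything Rcmdr depends on:
install.packages("Rcmdr", dependencies = TRUE)
```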
Re: [R] Help with big data and parallel computing: 500,000 x 4 linear models
Don't run 500K separate models. Use the limma package to fit one model that can learn the variance parameters jointly. Run it on your laptop. And don't use % methylation (the Beta value) as your Y variable; use its logit, i.e. the M-value.

-Aaron

On Mon, Aug 8, 2016 at 2:49 PM, Ellis, Alicia M wrote:
> I have a large dataset with ~500,000 columns and 1264 rows. Each column
> represents the percent methylation at a given location in the genome. I
> need to run 500,000 linear models for each of 4 predictors of interest,
> in the form of:
>
>   Methylation.site1 ~ predictor1 + covariate1 + covariate2 + ... + covariate9
>
> ...and save only the p-value for the predictor.
>
> The original methylation data file had methylation sites as row labels
> and the individuals as columns, so I read the data in chunks and
> transposed it; I now have 5 csv files (chunks) with columns representing
> methylation sites and rows as individuals.
>
> I was able to get results for all of the regressions by running each
> chunk of methylation data separately on our supercomputer using the code
> below. However, I'm going to have to do this again for another project,
> and I would really like to accomplish two things to make the whole
> process more computationally efficient:
>
> 1) Work with data.tables instead of data.frames (reading and
>    manipulating will be much easier and faster)
>
> 2) Do the work in parallel using, say, 12 cores at once, having the
>    program divide the work up on the cores rather than me having to
>    split the data and run 5 separate jobs on the supercomputer.
>
> I have some basic knowledge of the data.table package, but I wasn't able
> to modify the foreach code below to get it to work, and the code using
> data.frames didn't seem to be using all 12 cores that I created in the
> cluster.
>
> Can anyone suggest some modifications to the foreach code below that will
> allow me to do this in parallel with data.tables and not have to do it in
> chunks?
> # Set up cluster
> clus = makeCluster(12, type = "SOCK")
> registerDoSNOW(clus)
> getDoParWorkers()
> getDoParName()
>
> ### Following code needs to be modified to run the full dataset
> ### (batch1-batch5) in parallel.  Currently I read in the following
> ### chunks, and run each predictor separately for each chunk of data.
>
> ### Methylation data in batches; each batch has about 100,000 columns
> ### and 1264 rows.  Want to alter this to: batch1 = fread(file = )
> batch1 = read.csv("/home/alicia.m.ellis/batch1.csv")
> batch2 = read.csv(file = "/home/alicia.m.ellis/batch2.csv")
> batch3 = read.csv(file = "/home/alicia.m.ellis/batch3.csv")
> batch4 = read.csv(file = "/home/alicia.m.ellis/batch4.csv")
> batch5 = read.csv(file = "/home/alicia.m.ellis/batch5.csv")
>
> predictors  ## this is a data.frame with 4 columns and 1264 rows
> covariates  ## this is a data.frame with 9 columns and 1264 rows
>
> fits <- as.data.table(batch1)[, list(MyFits = lapply(1:ncol(batch1),
>     function(x) summary(lm(batch1[, x] ~ predictors[, 1] +
>         covariates[, 1] + covariates[, 2] + covariates[, 3] +
>         covariates[, 4] + covariates[, 5] + covariates[, 6] +
>         covariates[, 7] + covariates[, 8] + covariates[, 9]
>     ))$coefficients[2, 4]
> ))]
>
> ## This is what I was trying, but wasn't having much luck.
> ## I'm having trouble getting the data merged as a single data.frame, and
> ## the code below doesn't seem to be dividing the work among the 12 cores
> ## in the cluster.
>
> all.
> fits = foreach(j = 1:ncol(predictors), i = 1:ncol(meth1),
>     .combine = 'rbind', .inorder = TRUE) %dopar% {
>   model = lm(meth[, i] ~ predictors[, j] + covariates[, 1] +
>       covariates[, 2] + covariates[, 3] + covariates[, 4] +
>       covariates[, 5] + covariates[, 6] + covariates[, 7] +
>       covariates[, 8] + covariates[, 9])
>   summary(model)$coefficients[2, 4]
> }
>
> Alicia Ellis, Ph.D
> Biostatistician
> Pathology & Laboratory Medicine
> Colchester Research Facility
> 360 South Park Drive, Room 209C
> Colchester, VT 05446
> 802-656-9840
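The limma approach suggested above could be sketched roughly as follows (a sketch, not a drop-in script: the object names `meth`, `predictors` and `covariates` are taken from the post, and a sites-by-individuals matrix of M-values is assumed):

```r
library(limma)  # Bioconductor package

# meth: matrix of M-values, 500,000 sites in rows, 1264 individuals in columns
# predictors, covariates: data.frames as described in the original post
design <- model.matrix(~ ., data = cbind(pred = predictors[, 1], covariates))

fit <- lmFit(meth, design)   # one linear-model fit across all sites at once
fit <- eBayes(fit)           # empirical-Bayes moderated statistics
pvals <- fit$p.value[, "pred"]   # p-values for the predictor of interest
```

Repeating this once per predictor gives the 500,000 x 4 grid of p-values without any explicit parallelism or chunking.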
[R] Unexpected errors in sparse Matrix arithmetic with zero-length dimensions
Dear list,

The Matrix package exhibits some unexpected behaviour in its arithmetic methods for the edge case of a sparse matrix with a dimension of zero length. The example below is the most illustrative, where changing the contents of the vector causes the subtraction to fail for a sparse matrix with no columns:

  > library(Matrix)
  > x <- rsparsematrix(10, 0, density=0.1)
  > x - rep(1, nrow(x)) # OK
  > x - rep(0, nrow(x)) # fails
  Error in .Ops.recycle.ind(e1, len = l2) :
    vector too long in Matrix - vector operation

This is presumably because Matrix recognizes that subtraction of zero preserves sparsity and thus uses a different method in the second case. However, I would have expected subtraction of a zero vector to work if subtraction of a general vector is permissible. This is accompanied by a host of related errors for sparsity-preserving arithmetic:

  > x / 1 # OK
  > x / rep(1, nrow(x)) # fails
  Error in .Ops.recycle.ind(e1, len = l2) :
    vector too long in Matrix - vector operation

  > x * 1 # OK
  > x * rep(1, nrow(x)) # fails
  Error in .Ops.recycle.ind(e1, len = l2) :
    vector too long in Matrix - vector operation

A different error is raised for a sparse matrix with no rows:

  > y <- rsparsematrix(0, 10, density=0.1)
  > y - numeric(1) # OK
  > y - numeric(0) # fails
  Error in y - numeric(0) : - numeric(0) is undefined

I would have expected to just get 'y' back, given that the same code works fine for other Matrix classes:

  > z <- as(y, "dgeMatrix")
  > z - numeric(0) # OK

Correct behaviour of zero-dimension sparse matrices is practically important to me; I develop a number of packages that rely on Matrix classes, and in those packages I do a lot of unit testing with zero-dimension inputs. This ensures that my functions return sensible results or fail gracefully in edge cases that might be encountered by users. The current behaviour of sparse Matrix arithmetic causes my unit tests to fail for no (obvious) good reason.
Best,
Aaron Lun
Research Associate
CRUK Cambridge Institute
University of Cambridge

> sessionInfo()
R Under development (unstable) (2019-01-14 r75992)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /home/cri.camres.org/lun01/Software/R/trunk/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Matrix_1.2-15

loaded via a namespace (and not attached):
[1] compiler_3.6.0  grid_3.6.0      lattice_0.20-38
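Until this is fixed in Matrix itself, downstream code can route zero-extent operands through the dense class, where the report notes the arithmetic already behaves. A sketch (the helper name is hypothetical):

```r
library(Matrix)

# fall back to dense methods when either dimension has zero extent
safe_sub <- function(m, v) {
  if (any(dim(m) == 0L)) as(m, "dgeMatrix") - v else m - v
}

y <- rsparsematrix(0, 10, density = 0.1)
safe_sub(y, numeric(0))   # a 0 x 10 dense Matrix instead of an error
```

The cost is a dense copy, which is negligible precisely in these zero-size cases.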
[R] lmer syntax, matrix of (grouped) covariates?
I have a fairly large model:

  > length(Y)
  [1] 3051
  > dim(covariates)
  [1] 3051  211

All of these 211 covariates need to be nested hierarchically within a grouping "class", of which there are 8. I have an accessory vector, "cov2class", that specifies the mapping between covariates and the 8 classes. Now, I understand I can break all this information up into individual vectors (cov1, cov2, ..., cov211, class1, class2, ..., class8) and do something like this:

  model <- lmer(Y ~ 1 + cov1 + cov2 + ... + cov211 +
                (cov1 + cov2 + ... | class1) + (...) +
                (... + cov210 + cov211 | class8))

But I'd like to keep things syntactically simpler, and use the covariates and cov2class variables directly. I haven't been able to find the right syntactic sugar to get this done.

Thanks for any help,
-Aaron
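One way to get that syntactic sugar (a sketch, not from the original thread; `covariates` and `cov2class` are the objects named in the post, and each value of `cov2class` is assumed to name a grouping factor available in the model data) is to assemble the formula programmatically and hand it to lmer():

```r
covs    <- colnames(covariates)
classes <- unique(cov2class)

# one random-effects term per class, containing that class's covariates
ranef_terms <- vapply(classes, function(cl) {
  sprintf("(%s | %s)", paste(covs[cov2class == cl], collapse = " + "), cl)
}, character(1))

f <- reformulate(c("1", covs, ranef_terms), response = "Y")
# model <- lmer(f, data = dat)   # dat: Y, covariates and grouping factors
```

Whether a model with that many slopes per grouping factor is actually estimable is a separate question from the syntax.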
[R] hex2RGB back to hex not the same?
Witness this oddity (to me):

  > rainbow_hcl(10)[1]
  [1] "#E18E9E"
  > d <- attributes(hex2RGB(rainbow_hcl(10)))$coords[1,]
  > rgb(d[1], d[2], d[3])
  [1] "#C54D5F"

What happened? FYI, this came up as I'm trying to reuse the RGB values I get from rainbow_hcl in a call to rgb() where I can also set alpha transparency levels ...

-Aaron
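A likely explanation (my reading, not from the original thread) is a colour-space mismatch: the coordinates coming out of colorspace's hex2RGB() are not the gamma-corrected sRGB values that the hex string encodes, so feeding them straight into rgb() lands on a visibly darker colour. For the stated goal of adding alpha to the hex colours, base grDevices can do it without any round trip through coordinates:

```r
library(colorspace)  # for rainbow_hcl

cols <- rainbow_hcl(10)

# adjustcolor() appends the alpha channel directly to each hex colour
cols_alpha <- adjustcolor(cols, alpha.f = 0.5)
```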
[R] hmm.discnp or other?
(I think) I'd like to use the hmm.discnp package for a simple discrete, two-state HMM, but my training data is irregularly shaped (i.e. the observation chains are of varying length). Additionally, I do not see how to label the state of the observations given to the hmm() function. Ultimately, I'd like to 1) train the hmm on labeled data, and 2) use viterbi() to calculate the optimal labeling of unlabeled observations. More concretely, I have labeled data that looks something like:

  11212321221223121221112233222122112 ABA
  21221223121221112233222122112 ABAAA
  3121221112233222122112 BB

from which I'd like to build the two-hidden-state (A and B) hmm that emits observed 1, 2, or 3 at probabilities dictated by the hidden state, with transition probabilities between the two states. Given the trained HMM, I then wish to label new sequences via viterbi(). Am I missing the purpose of this package? I also read through the msm package docs, but my data doesn't really have a time coordinate on which the data should be "aligned".

Thanks for any pointers,
-Aaron
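For what it's worth, when every observation has a known state label, no EM machinery is needed for training: the maximum-likelihood transition and emission matrices are just normalized counts. A minimal sketch (hypothetical toy data, with a state label per observation rather than the shorter label strings shown above):

```r
obs    <- c(1, 2, 1, 3, 2, 2, 1, 3)
states <- c("A", "A", "B", "B", "A", "A", "B", "B")

# P(observation | state): row-normalized contingency table
emission <- prop.table(table(states, obs), margin = 1)

# P(next state | current state): counts of consecutive label pairs
transition <- prop.table(table(head(states, -1), tail(states, -1)), margin = 1)
```

Matrices estimated this way could then seed (or replace) an unsupervised fit before running viterbi() on new sequences.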
[R] importing S4 methods using a namespace
I want to call summary on a mer object (from lme4) within my package, but I can't seem to get the namespace to import the necessary method. I've simplified my package to this one function:

  ss <- function(m) {
      summary(m)
  }

And my NAMESPACE file looks like this, where I've attempted to follow the instructions in "Writing R Extensions" (http://cran.r-project.org/doc/manuals/R-exts.html#Name-spaces-with-S4-classes-and-methods):

  import(lme4)
  importMethodsFrom(lme4, "summary")
  export("ss")

But when I call my new function, I get the summary.default method instead of the mer method:

  > m <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
  > ss(m)
    Length  Class   Mode
         1    mer     S4

Thanks,
--
Aaron Rendahl, Ph.D.
Statistical Consulting Manager
School of Statistics, University of Minnesota
NEW OFFICE (as of June 2009): 48C McNeal Hall, St. Paul Campus
612-625-1062
www.stat.umn.edu/consulting
Re: [R] importing S4 methods using a namespace
Thanks very much! Importing from Matrix as you suggest fixes it.

--
Aaron Rendahl, Ph.D.
Statistical Consulting Manager
School of Statistics, University of Minnesota
NEW OFFICE (as of June 2009): 48C McNeal Hall, St. Paul Campus
612-625-1062
www.stat.umn.edu/consulting
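For the archives, the fix described here presumably amounts to a NAMESPACE along these lines (a sketch; at the time, the summary method applicable to mer objects came via the Matrix package rather than lme4 itself):

```
import(lme4)
importMethodsFrom(Matrix, summary)
export(ss)
```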
Re: [R] How to read a list into R??
Hi,

You should not use the 'sink' function to save these data files. If you want a readable format, you should look at 'dump' instead. If you simply want to save your data structures then 'save' might be best. 'sink' is not appropriate for data saving; it's simply a convenient way to log what you see in the terminal.

Aaron

On Mon, Jun 29, 2009 at 23:58, Li,Hua wrote:
> Dear R helpers:
> I have tried many times to find some way to read a list into R, but I
> failed. Here is an example: I have a file 'List.txt' which includes data
> as follows:
>
> [[1]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>
> [[2]]
> [1] 0.000 0.500 0.000 0.000 0.500 0.000 0.000
> [8] 0.000 0.000 0.000
>
> [[3]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0
>
> 'List.txt' was output by 'sink' from R, and I am trying to read it back
> into R. First I tried 'dget', and got:
>
> > dget('Vlist300.txt')
> Error in parse(file = file) : Vlist300.txt: unexpected '[[' at
> 1: [[
>
> Then I tried 'scan':
>
> > scan('List.txt', what='list')
> Read 86 items
>  [1] "[[1]]" "[1]"   "0.0"   "0.0"   "0.0"   "0.0"
>  [7] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [13] "0.0"   "0.0"   "0.0"   "0.0"   "0.5"   "0.0"
> [19] "0.0"   "0.0"   "[19]"  "0.0"   "0.0"   "0.0"
> [25] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [31] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [37] "0.0"   "0.0"   "[[2]]" "[1]"   "0.000" "0.500"
> [43] "0.000" "0.000" "0.500" "0.000" "0.000" "[8]"
> [49] "0.000" "0.000" "0.000" "[[3]]" "[1]"   "0.0"
> [55] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [61] "0.0"   "0.5"   "0.0"   "0.0"   "0.0"   "0.0"
> [67] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "[19]"
> [73] "0.0"   "0.0"   "0.0"   "0.0"   "0.5"   "0.0"
> [79] "0.0"   "0.5"   "0.0"   "0.5"   "0.0"   "0.0"
> [85] "0.0"   "0.0"
>
> Unfortunately I can't find any function to read 'List.txt' into R and
> give me the right format as in List.txt.
> Do you know if there's a function that can read 'List.txt' into R and
> keep the format as follows?
>
> [[1]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>
> [[2]]
> [1] 0.000 0.500 0.000 0.000 0.500 0.000 0.000
> [8] 0.000 0.000 0.000
>
> [[3]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0
>
> I appreciate any help!!
> Best,
> Hua
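The round-trip alternatives mentioned at the top of this reply look like the following (a minimal sketch with toy data; temp files used for illustration):

```r
x <- list(c(0, 0.5, 0), c(0.5, 0, 0.5))

# 'dump' writes parseable R code that recreates the object
f1 <- tempfile(fileext = ".R")
dump("x", file = f1)
rm(x)
source(f1)            # evaluating the dumped code restores x
stopifnot(identical(x, list(c(0, 0.5, 0), c(0.5, 0, 0.5))))

# 'save'/'load' round-trip the object in R's binary format
f2 <- tempfile(fileext = ".rda")
save(x, file = f2)
rm(x)
load(f2)              # restores x
```

By contrast, sink() only captures what would have been printed to the terminal, which is why the [[1]]/[1]-style output cannot be parsed back in.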
[R] Specify CRAN repository from command line
Hi,

It feels like I should be able to do something like:

  R CMD INSTALL lib='/usr/lib64/R/library' repos='http://proxy.url/cran' package

We have a bunch of servers (compute nodes in a Rocks cluster) in an isolated subnet; there is a basic pass-through proxy set up on the firewall (the head node) which just passes HTTP requests through to our nearest CRAN mirror. When using install.packages() it's easy to make R install from the repository with the repos='address' option, but I can't figure out how to do this from the command line. Is there a command line option for this? Currently I'm doing it using an R script, but that's causing issues because it's not 'visible' to the installer. This would greatly streamline R installation with a standard package set.

Regards,
Aaron Hicks
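As far as I know, R CMD INSTALL only operates on local package files and has no repository option; a common workaround (a sketch, reusing the placeholder paths from the question) is to run install.packages() non-interactively through Rscript:

```sh
# one-shot install from a specific repository, no interactive R session needed
Rscript -e 'install.packages("somepackage", lib = "/usr/lib64/R/library", repos = "http://proxy.url/cran")'
```

The same line drops cleanly into a post-install script or cluster provisioning recipe.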
[R] Error using getBM() to query BioMart archives
I'm trying to identify the positions of all genes within a specific chromosomal region using biomaRt. When using the current biomart database I'm able to do this without issue. However, I need to use build 36 of the mouse genome, which was last included in ensembl mart 46. I selected this mart and the mouse dataset as follows:

  mart <- useMart(biomart="ensembl_mart_46", host="www.biomart.org",
                  path="/biomart/martservice", port=80, archive=TRUE)
  mart <- useDataset("mmusculus_gene_ensembl", mart=mart)

I'm able to list the available attributes and filters just fine, but when I attempt to actually retrieve data using getBM() I receive the following error:

  > genes <- getBM(attributes=c("ensembl_gene_id", "external_gene_id",
  +                             "description", "chromosome_name",
  +                             "start_position", "transcript_start"),
  +                filters=c("chromosome_name","start","end"),
  +                values=list(12,4000,7000),
  +                mart=mart)
  Error in listFilters(mart, what = "type") :
    The function argument 'what' contains an invalid value: type
    Valid are: name, description, options, fullDescription

The same error is returned if I check to see what value type is required for a particular filter:

  > filterType("chromosome_name", mart=mart)
  Error in listFilters(mart, what = "type") :
    The function argument 'what' contains an invalid value: type
    Valid are: name, description, options, fullDescription

I'd really appreciate some help with this issue.

Cheers,
Aaron
[R] deleting/removing previous warning message in loop
Hello R Users,

I am having difficulty deleting the last warning message in a loop so that the only warning that is produced is that from the most recent line of code. I have tried options(warn=1), rm(last.warning), and resetting last.warning using something like:

  > warning("Resetting warning message")

This problem has been addressed in a previous listserve thread, however I do not follow the advice given; see the web link below. Any help would be greatly appreciated. Thanks!

Aaron Wells

https://stat.ethz.ch/pipermail/r-help/2008-October/176765.html

A general example is first, followed by an example with the loop.

Example 1:

  > ### Generalized linear model run on the first column of my example data
  > demo.glm <- glm(test.data[,1] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  > warnings()  ### no warnings reported
  NULL
  > ### Generalized linear model run on the 9th column of my example data
  > demo.glm <- glm(test.data[,9] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  Warning messages:
  1: In glm.fit(x = X, y = Y, weights = weights, start = start,
     etastart = etastart, : algorithm did not converge
  2: In glm.fit(x = X, y = Y, weights = weights, start = start,
     etastart = etastart, : fitted probabilities numerically 0 or 1 occurred
  > warnings()  ### the model with column 9 as data produces warnings
  Warning messages:
  1: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
     algorithm did not converge
  2: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
     fitted probabilities numerically 0 or 1 occurred
  > ### Re-run the model with column 1 as data
  > demo.glm <- glm(test.data[,1] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  > ### warnings() reports the same warnings from the column 9 model; ideally
  > ### it would report the actual warning for the column 1 model ("NULL")
  > warnings()
  Warning messages:
  1: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
     algorithm did not converge
  2: In glm.fit(x = X, y = Y, weights = weights, start = start, ...
     : fitted probabilities numerically 0 or 1 occurred

Example 2: Loop

### In the below example I have reset warnings() before each iteration by
### using warning("Resetting warning message"). I would like the warnings
### to somehow be consolidated into a list that I could later examine to
### determine which model iterations ran with and without warnings. The
### below code doesn't work because the functions are being run in the
### loop environment, and not the base environment.

  > test.warn <- rep(0, ncol(test.data)); test.warn <- as.list(test.warn)
  > for (i in 1:ncol(test.data)) {
  +   warn.reset <- warning("Resetting warning message")
  +   demo.glm <- glm(test.data[,i] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  +   warn.new <- warnings()
  +   cbind.warn <- cbind(warn.reset, warn.new)
  +   test.warn[[i]] <- cbind.warn
  +   test.warn
  + }
  There were 38 warnings (use warnings() to see them)
  > test.warn
  [[1]]
                            warn.reset                  warn.new
  Resetting warning message "Resetting warning message" NULL

  [[2]]
                            warn.reset                  warn.new
  Resetting warning message "Resetting warning message" NULL
  .
  .
  .

Aaron F. Wells, PhD
Senior Scientist
ABR, Inc.
2842 Goldstream Road
Fairbanks, AK 99709
Re: [R] deleting/removing previous warning message in loop
William,

The function keepWarnings that you wrote did the trick. Thanks for the help!

Aaron

> Subject: Re: [R] deleting/removing previous warning message in loop
> Date: Fri, 27 Mar 2009 13:33:51 -0700
> From: wdun...@tibco.com
> To: awell...@hotmail.com
>
> You could try using a function like the following (based
> on suppressWarnings):
>
>   keepWarnings <- function(expr) {
>       localWarnings <- list()
>       value <- withCallingHandlers(expr,
>           warning = function(w) {
>               localWarnings[[length(localWarnings)+1]] <<- w
>               invokeRestart("muffleWarning")
>           })
>       list(value=value, warnings=localWarnings)
>   }
>
> It returns a 2-element list, the first being the value
> of the expression given to it and the second being a
> list of all the warnings. Your code can look through
> the list of warnings and decide which to omit. E.g.,
>
> > d <- data.frame(x=1:10, y=rep(c(FALSE,TRUE), c(4,6)))
> > z <- keepWarnings(glm(y~x, data=d, family=binomial))
> > z$value
>
> Call:  glm(formula = y ~ x, family = binomial, data = d)
>
> Coefficients:
> (Intercept)            x
>     -200.37        44.52
>
> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
> Null Deviance:     13.46
> Residual Deviance: 8.604e-10   AIC: 4
>
> > z$warnings
> [[1]]
> <simpleWarning in glm.fit(x = X, y = Y, weights = weights, start = start,
>  etastart = etastart, mustart = mustart, offset = offset, family = family,
>  control = control, intercept = attr(mt, "intercept") > 0):
>  algorithm did not converge>
>
> [[2]]
> <simpleWarning in glm.fit(x = X, y = Y, weights = weights, start = start,
>  etastart = etastart, mustart = mustart, offset = offset, family = family,
>  control = control, intercept = attr(mt, "intercept") > 0):
>  fitted probabilities numerically 0 or 1 occurred>
>
> > str(z$warnings[[1]])
> List of 2
>  $ message: chr "algorithm did not converge"
>  $ call   : language glm.fit(x = X, y = Y, weights = weights, start =
>    start, etastart = etastart, mustart = mustart, offset = offset,
>    family = family, control = control, ...
>  - attr(*, "class")= chr [1:3] "simpleWarning" "warning" "condition"
>
> > sapply(z$warnings, function(w) w$message)
> [1] "algorithm did not converge"
> [2] "fitted probabilities numerically 0 or 1 occurred"
>
> You can filter out the ones you don't want to hear about
> and recall warning() with the interesting ones or present
> them in some other way.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> ---
> I am having difficulty deleting the last warning message in a loop so
> that the only warning that is produced is that from the most recent line
> of code. I have tried options(warn=1), rm(last.warning), and resetting
> the last.warning using something like: ...
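Applied back to the original loop, keepWarnings could be used along these lines (a sketch; test.data and the glm formula are from the original question):

```r
results <- vector("list", ncol(test.data))
for (i in seq_len(ncol(test.data))) {
  z <- keepWarnings(glm(test.data[, i] ~ c(1:38) + I(c(1:38)^2),
                        family = binomial))
  # store the fit together with just this iteration's warning messages
  results[[i]] <- list(fit = z$value,
                       warnings = vapply(z$warnings, conditionMessage,
                                         character(1)))
}

# iterations that ran clean have zero-length warning vectors
clean <- which(lengths(lapply(results, `[[`, "warnings")) == 0)
```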
[R] Public R servers?
Hello,

Earlier I posted a question about memory usage, and the community's input was very helpful. However, I'm now extending my dataset (which I use when running a regression using lm). As a result, I am continuing to run into problems with memory usage, and I believe I need to shift to implementing the analysis on a different system. I know that R supports R servers through Rserve. Are there any public servers where I could upload my datasets (either as a text file, or through a connection to a SQL server), execute the analysis, then download the results? I identified Wessa.net (http://www.wessa.net/mrc.wasp?outtype=Browser%20Blue%20-%20Charts%20White), but it's not clear it will meet my needs. Can anyone suggest any other resources?

Thanks in advance,
Aaron Barzilai
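An alternative to moving the job to another machine (not part of the original question, but it targets the same memory limit) is to fit the regression incrementally with the biglm package, which keeps only the model's sufficient statistics in memory while the data are processed in chunks:

```r
library(biglm)

# hypothetical file names and formula; each chunk is read, folded into the
# fit, and then discarded
chunks <- c("data1.csv", "data2.csv", "data3.csv")
fit <- NULL
for (f in chunks) {
  d <- read.csv(f)
  fit <- if (is.null(fit)) biglm(y ~ x1 + x2, data = d) else update(fit, d)
}
summary(fit)   # coefficients computed over all chunks combined
```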
[R] function output with for loop and if statement
Hello all, turns out I'm having a bad R week. I am at my wits' end with a function that I am trying to write. When I run the lines of code outside of a function, I get the desired output. When I wrap the lines of code into a function it doesn't work as expected. Not sure what is going on here. I suspected that the syntax of the if statement with the for loop was the culprit, but when I only ran the part of the code with the for loop and no if statement I still had the above problem (works outside a function, fails when wrapped into a function). Below is the code and example output. Please help! Thanks,

Aaron

  concov.test <- function(vegetation, specieslist) {
    test.veg <- vegetation
    names(test.veg) <- specieslist$LifeForm
    tmp <- matrix(nrow=nrow(test.veg), ncol=length(unique(names(test.veg))))
    for (i in unique(names(test.veg)))
      {test.out <- apply(test.veg[, names(test.veg)==i], 1, sum)
      tmp.match <- unique(names(test.veg))[unique(names(test.veg))==i]
      tmp.col <- match(tmp.match, unique(names(test.veg)))
      tmp[1:nrow(test.veg), tmp.col] <- test.out
      tmp.out <- data.frame(row.names(test.veg), tmp, row.names=1)
      names(tmp.out) <- unique(names(test.veg))
      tmp.out
      tmp.out.sort <- tmp.out[, order(names(tmp.out))]
      }
    if (table(names(tmp.out))[i]==1)
      tmp.match2 <- names(tmp.out.sort)[names(tmp.out.sort)==i]
      tmp.col2 <- match(tmp.match2, names(tmp.out.sort))
      tmp.out.sort[1:nrow(test.veg), tmp.col2] <- test.veg[, names(test.veg)==i]
      return(tmp.out.sort)
    else return(tmp.out.sort)
  }

Incorrect output when run as a function:

  > test <- concov.test(ansveg_all, spplist.class)
  > test
                   Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
  ANSG_T01_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   95.0     NA
  ANSG_T01_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   16.0     NA
  ANSG_T01_03_2008          NA               NA             NA               NA             NA    NA      NA      NA   71.0     NA
  ANSG_T01_04_2008          NA               NA             NA               NA             NA    NA      NA      NA   10.0     NA
  ANSG_T02_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   92.2     NA
  ANSG_T02_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   14.0     NA
  .
  .
  .
Correct output when the code is run outside of a function:

  > test.veg <- ansveg_all
  > names(test.veg) <- spplist.class$LifeForm
  > tmp <- matrix(nrow=nrow(test.veg), ncol=length(unique(names(test.veg))))
  > for (i in unique(names(test.veg)))
  > {test.out <- apply(test.veg[, names(test.veg)==i], 1, sum)
  + tmp.match <- unique(names(test.veg))[unique(names(test.veg))==i]
  + tmp.col <- match(tmp.match, unique(names(test.veg)))
  + tmp[1:nrow(test.veg), tmp.col] <- test.out
  + tmp.out <- data.frame(row.names(test.veg), tmp, row.names=1); names(tmp.out) <- unique(names(test.veg))
  + tmp.out
  + tmp.out.sort <- tmp.out[, order(names(tmp.out))]
  + }
  > if (table(names(tmp.out))[i]==1)
  + tmp.match2 <- names(tmp.out.sort)[names(tmp.out.sort)==i]
  > tmp.col2 <- match(tmp.match2, names(tmp.out.sort))
  > tmp.out.sort[1:nrow(test.veg), tmp.col2] <- test.veg[, names(test.veg)==i]
  > return(tmp.out.sort)
  > else return(tmp.out.sort)
  >
  > tmp.out.sort
                   Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
  ANSG_T01_01_2008           0             57.0            1.0             40.0           35.0  22.0     5.0    35.0   95.0    1.1
  ANSG_T01_02_2008           0              0.0            0.0              0.0            0.0  34.0     0.0     0.0   16.0   24.0
  ANSG_T01_03_2008           0             31.0            0.0             47.0            1.0   9.1     3.0     3.0   71.0   14.0
  ANSG_T01_04_2008           0              0.0            0.0             12.0            0.0  13.2     0.0     0.0   10.0   16.0
  ANSG_T02_01_2008           0             15.0            1.0             22.0           36.0   9.2     2.0    38.0   92.2    0.1
  ANSG_T02_02_2008           0             33.0           66.0             23.0            2.0   5.0     0.0     3.0   14.0    0.0
  .
  .
  .
Re: [R] function output with for loop and if statement
Mark, thanks for the suggestions. Unfortunately that did not fix the problem. I have experimented (with no success) with placing braces in different locations around the if/else statements and removing them altogether. Thanks again,

Aaron

Date: Wed, 22 Apr 2009 15:24:24 -0500
From: markle...@verizon.net
To: awell...@hotmail.com
Subject: Re: [R] function output with for loop and if statement

Hi Aaron: I just looked quickly because I have to go, but try wrapping braces around the last if/else like below and see if that helps. If you have multiple statements in an if/else, I think you need them, so I'm actually a little surprised that your function didn't give messages when you tried to run it. Also, braces in R can have some strange behavior (because, if code is run at the prompt and a statement can complete and there's no brace on that line, then that statement is executed regardless of whether there's a brace later; that probably doesn't make much sense but it's kind of hard to explain), but I'm hoping that the below fixes the problem. Good luck.

  function() { # brace for beginning of function
  .
  .
  .
  if (table(names(tmp.out))[i]==1) {
    tmp.match2 <- names(tmp.out.sort)[names(tmp.out.sort)==i]
    tmp.col2 <- match(tmp.match2, names(tmp.out.sort))
    tmp.out.sort[1:nrow(test.veg), tmp.col2] <- test.veg[, names(test.veg)==i]
    return(tmp.out.sort)
  } else {
    return(tmp.out.sort)
  }
  } # brace for end of function

On Apr 22, 2009, aaron wells wrote:

Hello all, turns out I'm having a bad R week. I am at my wits' end with a function that I am trying to write. When I run the lines of code outside of a function, I get the desired output. When I wrap the lines of code into a function it doesn't work as expected. Not sure what is going on here. I suspected that the syntax of the if statement with the for loop was the culprit, but when I only ran the part of the code with the for loop and no if statement I still had the above problem (works outside a function, fails when wrapped into a function).
Below is the code and example output. Please help! Thanks, Aaron

concov.test<-function(vegetation,specieslist)
{
  test.veg<-vegetation
  names(test.veg)<-specieslist$LifeForm
  tmp<-matrix(nrow=nrow(test.veg),ncol=length(unique(names(test.veg))))
  for (i in unique(names(test.veg)))
  {test.out<-apply(test.veg[,names(test.veg)==i],1,sum)
   tmp.match<-unique(names(test.veg))[unique(names(test.veg))==i]
   tmp.col<-match(tmp.match,unique(names(test.veg)))
   tmp[1:nrow(test.veg),tmp.col]<-test.out
   tmp.out<-data.frame(row.names(test.veg),tmp,row.names=1);names(tmp.out)<-unique(names(test.veg))
   tmp.out
   tmp.out.sort<-tmp.out[,order(names(tmp.out))]
  }
  if(table(names(tmp.out))[i]==1)
    tmp.match2<-names(tmp.out.sort)[names(tmp.out.sort)==i]
    tmp.col2<-match(tmp.match2,names(tmp.out.sort))
    tmp.out.sort[1:nrow(test.veg),tmp.col2]<-test.veg[,names(test.veg)==i]
    return(tmp.out.sort)
  else return(tmp.out.sort)
}

Incorrect output when run as a function:

> test<-concov.test(ansveg_all,spplist.class)
> test
                 Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
ANSG_T01_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   95.0     NA
ANSG_T01_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   16.0     NA
ANSG_T01_03_2008          NA               NA             NA               NA             NA    NA      NA      NA   71.0     NA
ANSG_T01_04_2008          NA               NA             NA               NA             NA    NA      NA      NA   10.0     NA
ANSG_T02_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   92.2     NA
ANSG_T02_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   14.0     NA
. . .
Correct output when code is run outside of a function:

> test.veg<-ansveg_all
> names(test.veg)<-spplist.class$LifeForm
> tmp<-matrix(nrow=nrow(test.veg),ncol=length(unique(names(test.veg))))
>
> for (i in unique(names(test.veg)))
> {test.out<-apply(test.veg[,names(test.veg)==i],1,sum)
+ tmp.match<-unique(names(test.veg))[unique(names(test.veg))==i]
+ tmp.col<-match(tmp.match,unique(names(test.veg)))
+ tmp[1:nrow(test.veg),tmp.col]<-test.out
+ tmp.out<-data.frame(row.names(test.veg),tmp,row.names=1);names(tmp.out)<-unique(names(test.veg))
+ tmp.out
+ tmp.out.sort<-tmp.out[,order(names(tmp.out))]
+ }
> if(table(names(tmp.out))[i]==1)
+ tmp.match2<-names(tmp.out.sort)[names(tmp.out.sort)==i]
> tmp.col2<-match(tmp.match2,names(tmp.out.sort))
> tmp.out.sort[1:nrow(test.veg),tmp.col2]<-test.veg[,names(test.veg)==i]
> return(tmp.out.sort)
> else return(tmp.out.sort)
>
> tmp.out.sort
                 Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
ANSG_T01_01_2008           0             57.0            1.0             40.0           35.0  22.0     5.0    35.0   95.0    1.1
ANSG_T01_02_2008           0              0.0            0.0              0.0            0.0  34.0     0.0     0.0   16.0   24.0
ANSG_T01_03_2008           0             31.0            0.0             47.0            1.0   9.1     3.0     3.0   71.0   14.0
ANSG_T01_04_2008           0              0.0            0.0             12.0            0.0  13.2     0.0     0.0   10.0   16.0
ANSG_T02_01_2008           0             15.0            1.0             22.0
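The parsing behaviour Mark describes can be demonstrated without a function: at top level an if whose body completes on its own line is a finished statement, so a following else is a syntax error, while inside braces the parser keeps reading. A minimal sketch using parse() on equivalent text:

```r
# Standalone if/else split across lines: the "if" completes, so "else" errors.
bad <- tryCatch({
  parse(text = "if (TRUE) 1\nelse 2")
  "parsed"
}, error = function(e) "syntax error")
bad  # "syntax error"

# Inside braces (as in a function body) the same code parses and evaluates:
ok <- eval(parse(text = "{ if (TRUE) 1\nelse 2 }"))
ok  # 1
```

This is why wrapping the whole if/else in braces (or keeping `else` on the same line as the closing brace of the if-branch) matters.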
Re: [R] function output with for loop and if statement
Gavin, thank you for the suggestions. Unfortunately the function is still not working correctly. Below are the dummy datasets that you requested. In the function, dummy.vegdata = vegetation and dummy.spplist = specieslist. A little clarification on why the if statement is in the function: I am using the apply function to sum columns of data that correspond to different lifeforms in order to derive a total cover value for each lifeform in each plot (plt). When only one species occurs in a lifeform, the apply function doesn't work, since there is only one column of data. So the if statement is an attempt to include the column of data from the dummy.vegdata in the output when there is only one species in a given lifeform. Examples of this condition in the dummy.vegdata include water (Bare_Ground) and popbal (Deciduous_Tree). Aaron

> dummy.vegdata
[25 plots (rows T1-T25) by 20 species columns (water, salarb, salpul, popbal, leddec, picgla, picmar, arcuva, zygele, epiang, calpur, poaarc, pelaph, flacuc, tomnit, hylspl, carvag, caraqu, calcan, carsax); the percent-cover values ran together when the fixed-width printout was archived, and their column alignment is no longer recoverable]

> dummy.spplist
  code   SciName                  LifeForm         Class
  water  Water                    Bare_Ground      Other
  salarb Salix_arbusculoides      Deciduous_Shrubs Vascular
  salpul Salix_planifolia_pulchra Deciduous_Shrubs Vascular
  popbal Populus_balsamifera      Deciduous_Tree   Vasc
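The single-species case that motivates the if statement can be avoided entirely: subset with drop = FALSE (so a one-column selection stays two-dimensional) and use rowSums(), which handles one or many columns uniformly. A sketch with illustrative data, not the function from the thread:

```r
# Illustrative cover data: two species in one lifeform, one in another.
veg <- data.frame(salarb = c(10, 0, 5),
                  salpul = c(0, 3, 2),
                  popbal = c(1, 0, 0))
lifeform <- c("Deciduous_Shrubs", "Deciduous_Shrubs", "Deciduous_Tree")

# drop = FALSE keeps a single-column selection as a data frame,
# so rowSums() works whether the lifeform has 1 species or many.
totals <- sapply(unique(lifeform), function(lf) {
  rowSums(veg[, lifeform == lf, drop = FALSE])
})
totals
#   Deciduous_Shrubs Deciduous_Tree
# 1               10              1
# 2                3              0
# 3                7              0
```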
[R] rmysql query help
R HELP, I am trying to use an R script to connect to a MySQL database. I am having a problem using a variable in the WHERE clause whose value contains a space. If I include the variable inside the quotes of the query, I think it searches for the name of the variable in the database rather than the value of the variable. If I put it outside the quotes, then it complains about the space. Are there special escape characters or something else I'm missing? This date format in a MySQL table is pretty standard. Any ideas? Thanks, Aaron

require(RMySQL)
startdatetime<-"2009-04-04 01:00:00"
connect <- dbConnect(MySQL(),user="x",password="xx",dbname="x",host="xxx.xxx.xxx.xxx")

forecast <- dbSendQuery(connect, statement=paste("SELECT ICE FROM table1 WHERE BEGTIME >= 'startdatetime'")) # doesn't read the variable

# or

forecast <- dbSendQuery(connect, statement=paste("SELECT ICE FROM table1 WHERE BEGTIME >=", startdatetime)) # space error

# but this seems to work

forecast <- dbSendQuery(connect, statement=paste("SELECT ICE FROM table1 WHERE BEGTIME >='2009-04-04 01:00:00'"))

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
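The underlying issue is that paste() never substitutes a variable that appears inside a quoted string; the value has to be concatenated in, with the SQL single quotes built into the string around it. A sketch of the string construction (table and column names as in the post):

```r
startdatetime <- "2009-04-04 01:00:00"

# Interpolate the value, keeping the SQL quotes around the timestamp:
sql <- sprintf("SELECT ICE FROM table1 WHERE BEGTIME >= '%s'", startdatetime)
sql
# "SELECT ICE FROM table1 WHERE BEGTIME >= '2009-04-04 01:00:00'"

# Equivalent with paste(); sep = "" so no stray space is inserted:
sql2 <- paste("SELECT ICE FROM table1 WHERE BEGTIME >= '",
              startdatetime, "'", sep = "")
identical(sql, sql2)  # TRUE

# forecast <- dbSendQuery(connect, statement = sql)
```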
Re: [R] optimization challenge
FYI, in bioinformatics, we use dynamic programming algorithms in similar ways to solve similar problems of finding guaranteed-optimal partitions in streams of data (usually DNA or protein sequence, but sometimes numerical data from chip-arrays). These "path optimization" algorithms are often called Viterbi algorithms, a web search for which should provide multiple references. The solutions are not necessarily unique (there may be multiple paths/partitions with identical integer maxima in some systems) and there is much research on whether the optimal solution is actually the one you want to work with (for example, there may be a fair amount of probability mass within an area/ensemble of suboptimal solutions that overall have greater posterior probabilities than does the optimal solution "singleton"). See Chip Lawrence's PNAS paper for more erudite discussion, and references therein: www.pnas.org/content/105/9/3209.abstract -Aaron P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at Reed back in 1993, which started me down a somewhat windy road to statistical genomics! -- Aaron J. Mackey, PhD Assistant Professor Center for Public Health Genomics University of Virginia amac...@virginia.edu On Wed, Jan 13, 2010 at 5:23 PM, Ravi Varadhan wrote: > Greg - thanks for posting this interesting problem. > > Albyn - thanks for posting a solution. Now, I have some questions: (1) is > the algorithm guaranteed to find a "best" solution? (2) can there be > multiple solutions (it seems like there can be more than 1 solution > depending on the data)?, and (3) is there a good reference for this and > similar algorithms? > > Thanks & Best, > Ravi. > > > > --- > > Ravi Varadhan, Ph.D. 
> > Assistant Professor, The Center on Aging and Health
> > Division of Geriatric Medicine and Gerontology
> > Johns Hopkins University
> > Ph: (410) 502-2619
> > Fax: (410) 614-9625
> > Email: rvarad...@jhmi.edu
> > Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Albyn Jones
> Sent: Wednesday, January 13, 2010 1:19 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] optimization challenge
>
> The key idea is that you are building a matrix that contains the
> solutions to smaller problems which are sub-problems of the big
> problem. The first row of the matrix SSQ contains the solution for no
> splits, ie SSQ[1,j] is just the sum of squares about the overall mean
> for reading chapters 1 through j in one day. The iteration then uses
> row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
> chapters in m-1 days) is part of the overall optimal solution, you
> have already computed it, and so don't ever need to recompute it.
>
> TS = SSQ[m-1,j] + SSQ1[j+1]
>
> computes the vector of possible solutions for SSQ[m,n] (n chapters in m
> days), breaking it into two pieces: chapters 1 to j in m-1 days, and chapters
> j+1 to n in 1 day. j is a vector in the function, and min(TS) is the minimum
> over choices of j, ie SSQ[m,n].
>
> At the end, SSQ[128,239] is the optimal value for reading all 239
> chapters in 128 days. That's just the objective function, so the rest
> involves constructing the list of optimal cuts, ie which chapters are
> grouped together for each day's reading. That code uses the same
> idea... constructing a list of lists of cutpoints.
>
> statisticians should study a bit of data structures and algorithms!
> > albyn > > On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote: > > WOW, your results give about half the variance of my best optim run > (possibly due to my suboptimal use of optim). > > > > Can you describe a little what the algorithm is doing? > > > > -- > > Gregory (Greg) L. Snow Ph.D. > > Statistical Data Center > > Intermountain Healthcare > > greg.s...@imail.org > > 801.408.8111 > > > > > > > -Original Message- > > > From: Albyn Jones [mailto:jo...@reed.edu] > > > Sent: Tuesday, January 12, 2010 5:31 PM > > > To: Greg Snow > > > Cc: r-help@r-project.org > > > Subject: Re: [R] optimization challenge > > > > > > Greg > > > > > > Nice problem: I wa
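Albyn's recursion can be sketched as a self-contained function (illustrative names, not his actual code; x holds the per-chapter sizes, e.g. verse counts):

```r
# Minimal dynamic-programming sketch: split x into ndays contiguous groups,
# minimising the total within-group sum of squares about each group's mean.
best_partition <- function(x, ndays) {
  n <- length(x)
  # ssq1[i, j]: sum of squares for reading chapters i..j in a single day
  ssq1 <- matrix(Inf, n, n)
  for (i in 1:n) for (j in i:n) ssq1[i, j] <- sum((x[i:j] - mean(x[i:j]))^2)

  SSQ <- matrix(Inf, ndays, n)  # SSQ[m, j]: best value for chapters 1..j in m days
  cut <- matrix(NA, ndays, n)   # cut[m, j]: last split point achieving SSQ[m, j]
  SSQ[1, ] <- ssq1[1, ]
  if (ndays > 1) for (m in 2:ndays) {
    for (j in m:n) {
      # chapters 1..k in m-1 days plus chapters k+1..j in one day, k = m-1,...,j-1
      cand <- SSQ[m - 1, (m - 1):(j - 1)] + ssq1[m:j, j]
      SSQ[m, j] <- min(cand)
      cut[m, j] <- (m - 1) + which.min(cand) - 1
    }
  }
  SSQ[ndays, n]  # backtracking through `cut` would recover the grouping itself
}

best_partition(c(1, 1, 10, 10), 2)  # 0: the obvious split (1,1) | (10,10)
```

The double loop over (m, j) is what guarantees the optimum Ravi asked about: every sub-solution SSQ[m-1, k] is computed exactly once and reused.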
[R] prop.test CI depends on null hypothesis?
Why does prop.test use continuity correction "only if it does not exceed the difference between sample and null proportions in absolute value"? I'm referring here to the single group method, though I believe there is a similar issue with the two group method. What this means in practice is that the confidence interval changes depending on the null hypothesis; see examples below. This is unexpected, and I have been unable to find any documentation explaining why this is done (see links below examples). ## when the null proportion is equal to the sample proportion, it does not ## use the continuity correction, even when one is asked for > prop.test(30,60,p=0.5, correct=TRUE) 1-sample proportions test without continuity correction data: 30 out of 60, null probability 0.5 X-squared = 0, df = 1, p-value = 1 alternative hypothesis: true p is not equal to 0.5 95 percent confidence interval: 0.3773502 0.6226498 sample estimates: p 0.5 ## however, when the null proportion is not equal to the sample proportion, ## it does use the continuity correction when it is asked for. > prop.test(30,60,p=0.499, correct=TRUE) 1-sample proportions test with continuity correction data: 30 out of 60, null probability 0.499 X-squared = 0, df = 1, p-value = 1 alternative hypothesis: true p is not equal to 0.499 95 percent confidence interval: 0.3764106 0.6235894 sample estimates: p 0.5 The documentation refers to Newcombe's 1998 Statistics in Medicine article; I read through this and found nothing about not using the continuity correction in this situation. 
https://doi.org/10.1002/(SICI)1097-0258(19980430)17:8%3C857::AID-SIM777%3E3.0.CO;2-E On this mailing list, there was a 2013 post "prop.test correct true and false gives same answer", which was answered only with the quote from the help page: https://stat.ethz.ch/pipermail/r-help/2013-March/350386.html I also found several questions asking which Newcombe method is implemented, which didn't elicit specific answers; here's one from 2011: https://stat.ethz.ch/pipermail/r-help/2011-April/274086.html -- Aaron Rendahl, Ph.D. Assistant Professor of Statistics and Informatics College of Veterinary Medicine, University of Minnesota 295L AS/VM, 612-301-2161 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prop.test CI depends on null hypothesis?
I believe this is correct behavior for computing the p-value, though the wording is awkward in that it implies that R is not implementing the continuity correction in this situation, when in fact this behavior is part of how the continuity correction is defined. The correction simply treats the normal approximation as appropriately discrete, so (translating to a binomial variable) it computes P(X > 12) using P(X > 11.5). The case the documentation discusses is simply the case where the null hypothesis falls within the discrete band corresponding to the observed value; here only enough of the correction is used that the test statistic is appropriately zero and the p-value is 1.

However, this is not correct behavior for the confidence interval. There is nothing in any of the listed documentation that would support such behavior, and in any case it doesn't make sense for a confidence interval to depend on a null parameter. If continuity correction is desired, the edges of the confidence bound should still be fully adjusted even when the observed proportion is close to the null parameter. What currently happens is that the bound is not adjusted at all when the observed proportion equals the null proportion, and in cases where it is not equal but still close enough that the correction is reduced, the confidence intervals are neither "with" correction nor "without" correction but somewhere in between!

An additional confusing matter is how R reports whether the test was performed "with" or "without" continuity correction; this is determined in the code by whether or not the adjusted correction is zero. The correction is zero exactly when the observed proportion equals the null proportion, so in that case the result is reported "without" continuity correction, which "flips" on the user. Oddly (to the user), changing the null p by a tiny amount gives only tiny changes to the result, yet the output is then reported "with" correction.
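The dependence is easy to see directly. The first two intervals below are the ones shown in the original post; p = 0.4 is far enough from the observed proportion that the correction is applied in full, so all three intervals differ even though only the null hypothesis changed:

```r
ci1 <- prop.test(30, 60, p = 0.5,   correct = TRUE)$conf.int  # 0.3773502 0.6226498
ci2 <- prop.test(30, 60, p = 0.499, correct = TRUE)$conf.int  # 0.3764106 0.6235894
ci3 <- prop.test(30, 60, p = 0.4,   correct = TRUE)$conf.int  # full correction

identical(ci1, ci2)  # FALSE
identical(ci2, ci3)  # FALSE
```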
This behavior has presumably been in R for a long time (though I haven't checked the code history), so I would love to have feedback from the R-help community about: * does the current behavior really make sense, and I've just misunderstood something? * is there documentation or discussions about this behavior out there somewhere that I've missed? * if this really is a "new" discovery, how best to bring it to the attention of those who can decide what to do about it? Thanks! On Mon, Oct 21, 2019 at 11:33 AM Aaron Rendahl wrote: > Why does prop.test use continuity correction "only if it does not exceed > the difference between sample and null proportions in absolute value"? I'm > referring here to the single group method, though I believe there is a > similar issue with the two group method. > > What this means in practice is that the confidence interval changes > depending on the null hypothesis; see examples below. This is unexpected, > and I have been unable to find any documentation explaining why this is > done (see links below examples). > > ## when the null proportion is equal to the sample proportion, it does not > ## use the continuity correction, even when one is asked for > > > prop.test(30,60,p=0.5, correct=TRUE) > > 1-sample proportions test without continuity correction > > data: 30 out of 60, null probability 0.5 > X-squared = 0, df = 1, p-value = 1 > alternative hypothesis: true p is not equal to 0.5 > 95 percent confidence interval: > 0.3773502 0.6226498 > sample estimates: > p > 0.5 > > ## however, when the null proportion is not equal to the sample > proportion, > ## it does use the continuity correction when it is asked for. 
> > > prop.test(30,60,p=0.499, correct=TRUE) > > 1-sample proportions test with continuity correction > > data: 30 out of 60, null probability 0.499 > X-squared = 0, df = 1, p-value = 1 > alternative hypothesis: true p is not equal to 0.499 > 95 percent confidence interval: > 0.3764106 0.6235894 > sample estimates: > p > 0.5 > > > The documentation refers to Newcombe's 1998 Statistics in Medicine > article; I read through this and found nothing about not using the > continuity correction in this situation. > > https://doi.org/10.1002/(SICI)1097-0258(19980430)17:8%3C857::AID-SIM777%3E3.0.CO;2-E > > On this mailing list, there was a 2013 post "prop.test correct true and > false gives same answer", which was answered only with the quote from the > help page: https://stat.ethz.ch/pipermail/r-help/2013-March/350386.html > > I also found several questions asking which Newcombe method is > implemented, which didn't elicit specific answers; here's one from 2011: > https://stat.ethz.ch/p
Re: [R] XYZ data
for plotting purposes, I typically jitter() the x's and y's to see the otherwise overlapping data points -Aaron On Wed, Jun 26, 2013 at 12:29 PM, Shane Carey wrote: > Nope, neither work. :-( > > > On Wed, Jun 26, 2013 at 5:16 PM, Clint Bowman wrote: > > > John, > > > > That still leaves a string of identical numbers in the vector. > > > > Shane, > > > > ?jitter > > > > perhaps jitter(X,1,0.0001) > > > > Clint > > > > Clint BowmanINTERNET: cl...@ecy.wa.gov > > Air Quality Modeler INTERNET: cl...@math.utah.edu > > Department of Ecology VOICE: (360) 407-6815 > > PO Box 47600FAX:(360) 407-7534 > > Olympia, WA 98504-7600 > > > > USPS: PO Box 47600, Olympia, WA 98504-7600 > > Parcels:300 Desmond Drive, Lacey, WA 98503-1274 > > > > On Wed, 26 Jun 2013, John Kane wrote: > > > > mm <- 1:10 > >> nn <- mm + .001 > >> > >> John Kane > >> Kingston ON Canada > >> > >> > >> -Original Message- > >>> From: careys...@gmail.com > >>> Sent: Wed, 26 Jun 2013 16:48:34 +0100 > >>> To: r-help@r-project.org > >>> Subject: [R] XYZ data > >>> > >>> I have x, y, z data. The x, y fields dont change but Z does. How do I > add > >>> a > >>> very small number onto the end of each x, y data point. > >>> > >>> For example: > >>> > >>> Original (X) Original (Y) Original (Z) > >>> 15 20 30 > >>> 15 20 40 > >>> > >>> > >>> > >>> > >>> New (X) New (Y) New (Z) > >>> 15.1 20.01 30 > >>> 15.2 20.02 40 > >>> > >>> > >>> Thanks > >>> -- > >>> Shane > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> __** > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/**listinfo/r-help< > https://stat.ethz.ch/mailman/listinfo/r-help> > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/**posting-guide.html< > http://www.R-project.org/posting-guide.html> > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > >> > >> __**__ > >> FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop! 
> >> > >> __** > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/**listinfo/r-help< > https://stat.ethz.ch/mailman/listinfo/r-help> > >> PLEASE do read the posting guide http://www.R-project.org/** > >> posting-guide.html <http://www.R-project.org/posting-guide.html> > >> and provide commented, minimal, self-contained, reproducible code. > >> > >> > > > -- > Shane > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
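The jitter() suggestion in this thread can be sketched as follows (the amount 1e-4 follows Clint's example; pick something small relative to the coordinate scale):

```r
# Two points identical in x and y, differing only in z:
x <- c(15, 15)
y <- c(20, 20)
z <- c(30, 40)

set.seed(1)
xj <- jitter(x, amount = 1e-4)  # displace each x by up to +/- 0.0001
yj <- jitter(y, amount = 1e-4)

# plot(xj, yj)  # the formerly coincident points no longer overlap exactly
```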
[R] points3d and ordirgl
Hello all, I have been using the function ordirgl to plot dynamic 3D ordinations. The ordirgl function works just fine. In fact, I was even able to write a function that allows me to identify points in the 3D plot:

identify.rgl<-function(env_var,ord,dim1,dim2,dim3)
{
  tmp<-select3d(button="left")
  tmp.keep<-tmp(ord[,dim1],ord[,dim2],ord[,dim3])
  env_var[tmp.keep=="TRUE"]
}

where:
env_var = a variable to be identified (e.g., plot IDs as in > row.names(dataframe))
ord = ordination points or scores (created using a function such as metaMDS or nmds) that is recognized by points or scores
dim1 = dimension 1 (e.g., 1)
dim2 = dimension 2 (e.g., 2)
dim3 = dimension 3 (e.g., 3)

e.g., > identify.rgl(row.names(vegmat),veg_nmds$points,1,2,3)

My issue is that I would like to use the points3d function to add points of different colors and sizes to the dynamic 3D plot created using ordirgl. In my case the different colored and sized points represent different clusters from the results of the Partitioning Around Medoids (pam) clustering function (from library cluster). I have used this with success in the past (two years back), but can't get it to work properly now. An example of the code I have used in the past is:

> points3d(veg_nmds$points[,1],veg_nmds$points[,2],veg_nmds$points[,3],display = "sites",veg_pam12$clustering=="1",col=2,size=3)

The code above is intended to add the points from cluster 1 to the nmds plot in the color red and at size 3. Anyone have any ideas? Thanks, Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
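In rgl, points3d() takes plain coordinate vectors plus material properties such as col and size; display = "sites" belongs to vegan's plotting methods, and the clustering condition has to be applied as a row subset rather than passed as an extra positional argument. A hedged sketch of the intended call, untested without the poster's data (object names as in the post):

```r
# Rows of the NMDS score matrix belonging to cluster 1:
sel <- veg_pam12$clustering == 1

# Add those sites to the open rgl scene in red, at size 3:
points3d(veg_nmds$points[sel, 1],
         veg_nmds$points[sel, 2],
         veg_nmds$points[sel, 3],
         col = 2, size = 3)
```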
Re: [R] Renaming variables
On Fri, Sep 20, 2013 at 10:10 AM, Preetam Pal wrote:
> I have 25 variables in the data file (name: score), i.e. X1, X2, ..., X25.
>
> I don't want to use score$X1, score$X2 every time I use these variables.

attach(score)
plot(X1, X2)  # etc. etc.

-Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
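A sketch of the attach() approach alongside with(), which gives the same shorthand without leaving score on the search path (the data frame here is an illustrative stand-in for the real file):

```r
# Illustrative stand-in for the score data:
score <- data.frame(X1 = rnorm(10), X2 = rnorm(10))

attach(score)
plot(X1, X2)   # X1, X2 found via the search path
detach(score)  # detach when done to avoid masking surprises

# Same plot, scoped to a single call:
with(score, plot(X1, X2))
```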
Re: [R] combine glmnet and coxph (and survfit) with strata()
I'm also curious how to use glmnet with survfit -- specifically, for use with interval regression (which, under the hood, is implemented using survfit). Can you show how you converted your Surv object formula to a design matrix for use with glmnet? Thanks, -Aaron On Sun, Dec 8, 2013 at 12:45 AM, Jieyue Li wrote: > Dear All, > > I want to generate survival curve with cox model but I want to estimate the > coefficients using glmnet. However, I also want to include a strata() term > in the model. Could anyone please tell me how to have this strata() effect > in the model in glmnet? I tried converting a formula with strata() to a > design matrix and feeding to glmnet, but glmnet just treats the strata() > term with one independent variable... > > I know that if there is no such strata(), I can estimate coefficients from > glmnet and use "...init=selectedBeta,iter=0)" in the coxph. Please advise > me or also correct me if I'm wrong. > > Thank you very much! > > Best, > > Jieyue > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
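Not from the original posters, but a common way to build the x matrix for a Cox-family glmnet fit is model.matrix() on the right-hand side of the formula, dropping the intercept column; for family = "cox", glmnet's documented response is a two-column matrix with columns named time and status. All data and variable names below are illustrative:

```r
library(survival)
library(glmnet)

# Illustrative survival data:
set.seed(42)
d <- data.frame(time   = rexp(100) + 0.1,
                status = rbinom(100, 1, 0.7),
                age    = rnorm(100, 50, 10),
                sex    = factor(sample(c("F", "M"), 100, replace = TRUE)))

# Design matrix from the covariates only, intercept column dropped:
x <- model.matrix(~ age + sex, data = d)[, -1]
y <- cbind(time = d$time, status = d$status)

fit <- glmnet(x, y, family = "cox")
```

This also shows why a strata() term fed through model.matrix() is reduced to ordinary dummy columns, matching the behaviour Jieyue observed: glmnet itself has no strata mechanism in the formula sense.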
Re: [R] Generate random percentages and placing vectors
Thanks for the help! However, for the code in #2, it seems to just randomly split up the vectors. I would still like to keep the integrity of each vector. For example: if v1 = (1,2,3) v2 = (4,5,6) output = (0,0,0,1,2,3,0,0,0,0,4,5,6,0,0,0,0,0,0,0,0,0,0,0,0) - which has a specified length of 25 With v1 and v2 inserted in at random locations. On Wed, Oct 27, 2010 at 10:25 AM, Jonathan P Daily wrote: > > 1) > rands <- runif(5) > rands <- rands/sum(rands)*100 > > 2) > # assume vectors are v1, v2, etc. > v_all <- c(v1, v2, ...) > v_len <- length(v_all) > > output <- rep(0,25) > output[sample(1:25, v_len)] <- v_all > > -- > Jonathan P. Daily > Technician - USGS Leetown Science Center > 11649 Leetown Road > Kearneysville WV, 25430 > (304) 724-4480 > "Is the room still a room when its empty? Does the room, > the thing itself have purpose? Or do we, what's the word... imbue it." > - Jubal Early, Firefly > > > From: Aaron Lee To: r-help@r-project.org Date: > 10/27/2010 > 11:06 AM Subject: [R] Generate random percentages and placing vectors Sent > by: r-help-boun...@r-project.org > -- > > > > Hello everyone, > > I have two questions: > > 1.) I would like to generate random percentages that add up to 100. For > example, if I need 5 percentages, I would obtain something like: 20, 30, > 40, > 5, 5. Is there some way to do this in R? > > 2.) I would like to insert vectors of specified length into a larger vector > of specified length randomly, and fill the gaps with zeroes. For example, > if > I have 3 vectors of length 3, 2, and 2 with values and I would like to > randomly place them into a vector of length 25 made of 0's. > > Thank you in advance! > > -Aaron > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. 
> > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
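A sketch that keeps each inserted vector contiguous, as requested: randomly order the blocks, randomly split the leftover zeros into the gaps before, between, and after them, and concatenate. The function name and the non-overlap scheme are illustrative, not from the thread:

```r
place_vectors <- function(vecs, N) {
  # vecs: list of vectors to keep intact; N: length of the output vector
  L <- sum(lengths(vecs))
  stopifnot(L <= N)
  k <- length(vecs)
  vecs <- sample(vecs)                     # random order of the blocks
  # Split the N - L zeros at random into k + 1 gaps:
  cuts <- sort(sample(0:(N - L), k, replace = TRUE))
  gaps <- diff(c(0, cuts, N - L))          # k + 1 gap lengths summing to N - L
  out <- numeric(0)
  for (i in seq_len(k)) {
    out <- c(out, numeric(gaps[i]), vecs[[i]])
  }
  c(out, numeric(gaps[k + 1]))
}

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
set.seed(7)
res <- place_vectors(list(v1, v2), 25)
length(res)  # 25, with v1 and v2 each intact at a random position
```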
[R] Arrange elements on a matrix according to rowSums + short 'apply' Q
Greetings, My goal is to create a Markov transition matrix (probability of moving from one state to another) with the 'highest traffic' portion of the matrix occupying the top-left section. Consider the following sample:

inputData <- c(
  c(5, 3, 1, 6, 7),
  c(9, 7, 3, 10, 11),
  c(1, 2, 3, 4, 5),
  c(2, 4, 6, 8, 10),
  c(9, 5, 2, 1, 1)
)

MAT <- matrix(inputData, nrow = 5, ncol = 5, byrow = TRUE)
colnames(MAT) <- c("A", "B", "C", "D", "E")
rownames(MAT) <- c("A", "B", "C", "D", "E")

rowSums(MAT)

I want to re-arrange the elements of this matrix such that the elements with the largest row sums are placed to the top-left, in descending order. Does this make sense? In this case the order I'm looking for would be B, D, A, E, C. Any thoughts?

As an aside, here is the function I've written to construct the transition matrix. Is there a more elegant way to do this that doesn't involve a double transpose?

TMAT <- apply(t(MAT), 2, function(X) X/sum(X))
TMAT <- t(TMAT)

I tried the following:

TMAT <- apply(MAT, 1, function(X) X/sum(X))

But the custom function is still getting applied over the columns of the array, rather than the rows. For a check try:

rowSums(TMAT)
colSums(TMAT)

Row sums here should equal 1...

Many thanks in advance, Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Arrange elements on a matrix according to rowSums + short 'apply' Q
Ivan and Michael, Many thanks for the tips, those solved my queries. Still interested in how to force custom functions to work over rows rather than columns when using apply, but the MAT/rowSums(MAT) technique is definitely the most efficient way to go for this application. Cheers, Aaron 2010/12/2 Michael Bedward > Hi Aaron, > > Following up on Ivan's suggestion, if you want the column order to > mirror the row order... > > mo <- order(rowSums(MAT), decreasing=TRUE) > MAT2 <- MAT[mo, mo] > > Also, you don't need all those extra c() calls when creating > inputData, just the outermost one. > > Regarding your second question, your statements... > > TMAT <- apply(t(MAT), 2, function(X) X/sum(X)) > TMAT <- t(TMAT) > > is actually just a complicated way of doing this... > > TMAT <- MAT / rowSums(MAT) > > You can confirm that by doing it your way and then this... > > TMAT == MAT / rowSums(MAT) > > ...and you should see a matrix of TRUE values > > Michael > > > On 2 December 2010 20:43, Ivan Calandra > wrote: > > Hi, > > > > Here is a not so easy way to do your first step, but it works: > > MAT2 <- cbind(MAT, rowSums(MAT)) > > MAT[order(MAT2[,6], decreasing=TRUE),] > > > > For the second, I don't know! > > > > HTH, > > Ivan > > > > > > Le 12/2/2010 09:46, Aaron Polhamus a écrit : > >> > >> Greetings, > >> > >> My goal is to create a Markov transition matrix (probability of moving > >> from > >> one state to another) with the 'highest traffic' portion of the matrix > >> occupying the top-left section. 
Consider the following sample: > >> > >> inputData<- c( > >> c(5, 3, 1, 6, 7), > >> c(9, 7, 3, 10, 11), > >> c(1, 2, 3, 4, 5), > >> c(2, 4, 6, 8, 10), > >> c(9, 5, 2, 1, 1) > >> ) > >> > >> MAT<- matrix(inputData, nrow = 5, ncol = 5, byrow = TRUE) > >> colnames(MAT)<- c("A", "B", "C", "D", "E") > >> rownames(MAT)<- c("A", "B", "C", "D", "E") > >> > >> rowSums(MAT) > >> > >> I wan to re-arrange the elements of this matrix such that the elements > >> with > >> the largest row sums are placed to the top-left, in descending order. > Does > >> this make sense? In this case the order I'm looking for would be B, D, > A, > >> E, > >> C Any thoughts? > >> > >> As an aside, here is the function I've written to construct the > transition > >> matrix. Is there a more elegant way to do this that doesn't involve a > >> double > >> transpose? > >> > >> TMAT<- apply(t(MAT), 2, function(X) X/sum(X)) > >> TMAT<- t(TMAT) > >> > >> I tried the following: > >> > >> TMAT<- apply(MAT, 1, function(X) X/sum(X)) > >> > >> But my the custom function is still getting applied over the columns of > >> the > >> array, rather than the rows. For a check try: > >> > >> rowSums(TMAT) > >> colSums(TMAT) > >> > >> Row sums here should equal 1... > >> > >> Many thanks in advance, > >> Aaron > >> > >>[[alternative HTML version deleted]] > >> > >> __ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > -- > > Ivan CALANDRA > > PhD Student > > University of Hamburg > > Biozentrum Grindel und Zoologisches Museum > > Abt. 
Säugetiere > > Martin-Luther-King-Platz 3 > > D-20146 Hamburg, GERMANY > > +49(0)40 42838 6231 > > ivan.calan...@uni-hamburg.de > > > > ** > > http://www.for771.uni-bonn.de > > http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > -- Aaron Polhamus Statistical consultant, Revolution Analytics MSc Applied Statistics, The University of Oxford, 2009 838a NW 52nd St, Seattle, WA 98107 Cell: +1 (206) 380.3948 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
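Pulling the thread's suggestions together, a minimal self-contained sketch using the sample matrix from the original post:

```r
# Sample data from the original post
inputData <- c(5, 3, 1, 6, 7,
               9, 7, 3, 10, 11,
               1, 2, 3, 4, 5,
               2, 4, 6, 8, 10,
               9, 5, 2, 1, 1)
MAT <- matrix(inputData, nrow = 5, byrow = TRUE,
              dimnames = list(LETTERS[1:5], LETTERS[1:5]))

# Reorder rows and columns together by descending row sums -> B, D, A, E, C
mo <- order(rowSums(MAT), decreasing = TRUE)
MAT2 <- MAT[mo, mo]

# Row-normalise to get the transition matrix, no double transpose needed:
# the length-5 rowSums vector recycles down the columns, so each element
# is divided by its own row's sum
TMAT <- MAT2 / rowSums(MAT2)
stopifnot(isTRUE(all.equal(unname(rowSums(TMAT)), rep(1, 5))))
```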
[R] Writing out data from a list
Hello, I have a list of data, such that: [[1]] [1] 0.00 0.00 0.03 0.01 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00 0.01 0.00 0.00 0.03 0.01 0.00 0.01 0.00 0.03 0.16 0.14 0.02 0.17 0.01 0.01 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 [42] 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.04 0.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [[2]] [1] 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [[3]] [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 etc. I would like to write to a text file with this data, but would like each section of the file to be separated by some text. For example: "Event 1" "Random Text" 0 0 0.03 0.01 "Event 2" "Random Text" 0 0 0 0 0.01 etc. Is there some way to continually write text out using a loop and also attaching a string before each data segment? Thank you in advance! -Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
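One way to do this, sketched with a small stand-in list (the file name and the "Event"/"Random Text" labels are placeholders for whatever text is wanted):

```r
# Stand-in for the poster's list of numeric vectors
dat <- list(c(0, 0, 0.03, 0.01),
            c(0, 0, 0, 0, 0.01))

# Open one connection and write a header line before each data segment
con <- file("events.txt", "w")
for (i in seq_along(dat)) {
  writeLines(sprintf('"Event %d" "Random Text"', i), con)
  writeLines(paste(dat[[i]], collapse = " "), con)
}
close(con)
```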
[R] Difficult with round() function
Dear list, I'm writing a function to re-grid a data set from finer to coarser resolutions in R as follows (I use this function with sapply/apply): gridResize <- function(startVec = stop("What's your input vector"), to = stop("Missing 'to': How long do you want the final vector to be?")){ from <- length(startVec) shortVec<-numeric() tics <- from*to for(j in 1:to){ interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics) benchmarks <- interval/to #FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER... positions <- which(round(benchmarks) == benchmarks) indeces <- benchmarks[positions] fracs <- numeric() #SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS WITH THE REMAINDER... for(i in 1:length(positions)){ if(i == 1) fracs[i] <- positions[i]/length(benchmarks) else{ fracs[i] <- (positions[i] - sum(positions[1:(i-1)]))/length(benchmarks) } } #AND UPDATES STARTVEC INDECES AND FRACTION MULTIPLIERS if(max(positions) != length(benchmarks)) indeces <- c(indeces, max(indeces) + 1) if(sum(fracs) != 1) fracs <- c(fracs, 1 - sum(fracs)) fromVals <- startVec[indeces] if(any(is.na(fromVals))){ NAindex <- which(is.na(fromVals)) if(sum(fracs[-NAindex]) >= 0.5) shortVec[j] <- sum(fromVals*fracs, na.rm=TRUE) else shortVec[j] <- NA }else{shortVec[j] <- sum(fromVals*fracs)} } return(shortVec) } for the simple test case test <- gridResize(startVec = c(2,4,6,8,10,8,6,4,2), to = 7) the function works fine. For larger vectors, however, it breaks down. E.g.: test <- gridResize(startVec = rnorm(300, 9, 20), to = 200) This returns the error: Error in positions[1:(i - 1)] : only 0's may be mixed with negative subscripts and the problem seems to be in the line positions <- which(round(benchmarks) == benchmarks). In this particular example the code cracks up at j = 27. 
When I set j = 27 and run the calculation manually I discover the following: > benchmarks[200] [1] 40 > benchmarks[200] == 40 [1] FALSE > round(benchmarks[200]) == 40 [1] TRUE Even though my benchmark calculation seems to be returning clean integers to serve as inputs for the creation of the 'positions' variable, for whatever reason R doesn't read it that way. I would be very grateful for any advice on how I can either alter my approach entirely (I am sure there is a far more elegant way to regrid data in R) or a simple fix for this rounding error. Many thanks in advance, Aaron -- Aaron Polhamus Statistical consultant, Revolution Analytics MSc Applied Statistics, The University of Oxford, 2009 838a NW 52nd St, Seattle, WA 98107 Cell: +1 (206) 380.3948 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
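The usual fix for this (R FAQ 7.31: floating-point numbers rarely compare exactly equal) is to test equality up to a tolerance instead of with `==`. A sketch, with an illustrative vector:

```r
# A value that prints as 27 but is not exactly the double 27
benchmarks <- c(26.999999999999996, 27.3, 40)

# Exact comparison misses the first element
which(round(benchmarks) == benchmarks)

# Tolerance-based comparison is robust: finds elements 1 and 3
tol <- sqrt(.Machine$double.eps)
positions <- which(abs(benchmarks - round(benchmarks)) < tol)
```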
Re: [R] Shrink file size of pdf graphics
You can try something like this, at the command line: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf evidently, the new compactPDF() function in R 2.13 does something very similar. -Aaron On Thu, May 19, 2011 at 11:30 AM, Duncan Murdoch wrote: > > On 19/05/2011 11:14 AM, Layman123 wrote: >> >> Hi everyone, >> >> My data consists of a system of nearly 75000 roads, available as a >> shapefile. When I plot the road system, by adding the individual roads with >> 'lines' and store it as a pdf-file with 'pdf' I get a file of size 13 MB. >> This is way too large to add it in my LaTeX-document, because there will be >> some more graphics of this type. >> Now I'm curious to learn wheter there is a possibility in R to shrink the >> file size of this graphic? I merely need it in a resolution so that it looks >> "smooth" when printed out. I don't know much about the storage of R >> graphics, but maybe there is a way to change the way the file is stored >> perhaps as a pixel image? > > > There are several possibilities. You can use a bitmapped device (e.g. png()) > to save the image; pdflatex can include those. > > You can compress the .pdf file using an external tool like pdftk (or do it > internally in R 2.14.x, coming soon). > > There are probably others... > > Duncan Murdoch > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
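For reference, the R-side equivalent mentioned above (available in R >= 2.13; the file name is a placeholder, and Ghostscript must be installed):

```r
# Compress an existing PDF in place; gs_quality = "screen" mirrors
# the -dPDFSETTINGS=/screen flag in the gs command line above
tools::compactPDF("input.pdf", gs_quality = "screen")
```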
[R] Variable in file name png
Hi, I'm having trouble with getting the png function to properly produce multiple graphs. Right now I have: for (z in data) { png(file=z,bg="white") thisdf<-data[[z]] plot(thisdf$rc,thisdf$psi) dev.off() } Which should take the "data" object, a list of data sets and produce a graph of each with respect to the two variables rc and psi. I want the names to change for each graph, but am not sure how to do it, any help would be appreciated. Thanks, -Acoutino __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
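A sketch of one likely fix, assuming `data` is a named list of data frames: loop over the names rather than the elements, so `data[[z]]` indexes correctly and each file gets a distinct name:

```r
# Iterate over the names of the list, not its elements
for (z in names(data)) {
  png(file = paste0(z, ".png"), bg = "white")  # one file per list element
  thisdf <- data[[z]]
  plot(thisdf$rc, thisdf$psi)
  dev.off()
}
```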
Re: [R] Running gplots package with Windows 7
Thank-you for the replies. I believe I figured out what the problem was. When I installed the package on linux it ran smoothly, but I just need to install a lot of accessory packages to make gplots work with Windows. Thanks again, Aaron 2010/5/17 Uwe Ligges > Additionally, please give the full output that let you assume > "The package installs fine"... > > Uwe Ligges > > > > On 17.05.2010 10:18, Henrik Bengtsson wrote: > >> I won't have an answer but it will help others to help you if you also >> report what the following gives: >> >> library("gtools"); >> print(sessionInfo()); >> >> and >> >> print(packageDescription("gtools")); >> >> My $.02 >> >> Henrik >> >> >> On Mon, May 17, 2010 at 4:01 AM, agusdon wrote: >> >>> >>> Hello, >>> >>> I'm fairly new to R and am running version 2.11.0 with Windows 7. I need >>> to >>> run the package gplots. The package installs fine, but when I try to >>> load >>> it I receive the message: >>> >>> Loading required package: gtools >>> Error: package 'gtools' could not be loaded >>> In addition: Warning message: >>> In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = >>> lib.loc) : >>> there is no package called 'gtools' >>> >>> After that, the package has some functionality but I cannot run the >>> barplot2 >>> command, which is what I need to use the most. >>> >>> If anyone has suggests how to fix this problem, I would be very grateful. >>> >>> Thanks! >>> >>> Aaron >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/Running-gplots-package-with-Windows-7-tp2219020p2219020.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> >>> and provide commented, minimal, self-contained, reproducible code. 
>>> >>> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Complex sampling?
What I think you need is something along the lines of: matrix(c(sample(3:7), sample(3:7), sample(3:7), sample(3:7), ...), nrow=2) now, each column are your random pairs. -Aaron On Wed, Mar 9, 2011 at 1:01 PM, Hosack, Michael wrote: > > -Original Message- > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at > r-project.org] > > On Behalf Of Hosack, Michael > > Sent: Wednesday, March 09, 2011 7:34 AM > > To: r-help at R-project.org > > Subject: [R] Complex sampling? > > > > R users, > > > > I am trying to generate a randomized weekday survey schedule that ensures > > even coverage of weekdays in > > the sample, where the distribution of variable DOW is random with respect > > to WEEK. To accomplish this I need > > to randomly sample without replacement two weekdays per week for each of > > 27 weeks (only 5 are shown). > > This seems simple enough, sampling without replacement. > > However, > > I need to sample from a sequence (3:7) that needs to be completely > > depleted and replenished until the > > final selection is made. Here is an example of what I want to do, > > beginning at WEEK 1. I would prefer to do > > this without using a loop, if possible. > > > > sample frame: [3,4,5,6,7] --> [4,5,6] --> [4],[1,2,3,(4),5,6] --> > > [1,2,4,5,6] --> for each WEEK in dataframe > > OK, now you have me completely lost. Sorry, but I have no clue as to what > you just did here. I looks like you are trying to describe some > transformation/algorithm but I don't follow it. > > > > I could not reply to this email because it not been delivered to my inbox, > so I had to copy it from the forum. > I apologize for the confusion, this would take less than a minute to > explain in conversation but an hour > to explain well in print. Two DOW_NUMs will be selected randomly without > replacement from the vector 3:7 for each WEEK. 
When this vector is reduced > to a single integer that integer will be selected and the vector will be > restored and a single integer will then be selected that differs from the > prior selected integer (i.e. cannot sample the same day twice in the same > week). This process will be repeated until two DOW_NUM have been assigned > for each WEEK. That process is what I attempted to illustrate in my original > message. This is beyond my current coding capabilities. > > > > > > > Randomly sample 2 DOW_NUM without replacement from each WEEK ( () = no > two > > identical DOW_NUM can be sampled > > in the same WEEK) > > > > sample = {3,7}, {5,6}, {4,3}, {1,5}, --> for each WEEK in dataframe > > > > So, are you sampling from [3,4,5,6,7], or [1,2,4,5,6], or ...? Can you > show an 'example' of what you would like to end up given your data below? > > > > > Thanks you, > > > > Mike > > > > > > DATE DOW DOW_NUM WEEK > > 2 2011-05-02 Mon 31 > > 3 2011-05-03 Tue 41 > > 4 2011-05-04 Wed 51 > > 5 2011-05-05 Thu 61 > > 6 2011-05-06 Fri 71 > > 9 2011-05-09 Mon 32 > > 10 2011-05-10 Tue 42 > > 11 2011-05-11 Wed 52 > > 12 2011-05-12 Thu 62 > > 13 2011-05-13 Fri 72 > > 16 2011-05-16 Mon 33 > > 17 2011-05-17 Tue 43 > > 18 2011-05-18 Wed 53 > > 19 2011-05-19 Thu 63 > > 20 2011-05-20 Fri 73 > > 23 2011-05-23 Mon 34 > > 24 2011-05-24 Tue 44 > > 25 2011-05-25 Wed 54 > > 26 2011-05-26 Thu 64 > > 27 2011-05-27 Fri 74 > > 30 2011-05-30 Mon 35 > > 31 2011-05-31 Tue 45 > > 32 2011-06-01 Wed 55 > > 33 2011-06-02 Thu 65 > > 34 2011-06-03 Fri 75 > > > > DF <- > > structure(list(DATE = structure(c(15096, 15097, 15098, 15099, > > 15100, 15103, 15104, 15105, 15106, 15107, 15110, 15111, 15112, > > 15113, 15114, 15117, 15118, 15119, 15120, 15121, 15124, 15125, > > 15126, 15127, 15128), class = "Date"), DOW = c("Mon", "Tue", > > "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", > > "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", > > "Mon", "Tue", "Wed", "Thu", "Fri"), DOW_NUM = 
c(3, 4, 5, 6, 7, > > 3, 4, 5, 6, 7, 3, 4
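A sketch of one way to implement the deplete-and-replenish scheme without a per-week loop: shuffle the pool of five weekday codes enough times to cover all weeks, then reject and reshuffle if any week would draw the same day twice (which can only happen where a week's pair straddles two shuffles). The seed and `n_weeks` are illustrative:

```r
set.seed(42)  # illustrative seed
n_weeks <- 27
draw_days <- function(n) {
  # Concatenate repeated shuffles of the pool 3:7 so every code is
  # depleted before the pool is replenished
  as.vector(replicate(ceiling(2 * n / 5), sample(3:7)))
}

days <- draw_days(n_weeks)
odd  <- seq(1, 2 * n_weeks, by = 2)
# Reject draws where a week would repeat a day, then redraw
while (any(days[odd] == days[odd + 1])) days <- draw_days(n_weeks)

pairs <- matrix(days[1:(2 * n_weeks)], nrow = 2)  # each column = one week
```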
[R] Easy 'apply' question
Dear list, I couldn't find a solution for this problem online, as simple as it seems. Here's the problem: #Construct test dataframe tf <- data.frame(1:3,4:6,c("A","A","A")) #Try the apply function I'm trying to use test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1]) #Look at the output--all columns treated as character columns... test #Look at the format of the original data--the first two columns are integers. str(tf) In general terms, I want to differentiate what function I apply over a row/column based on what type of data that row/column contains. Here I want a simple mean if the column is numeric, and the first unique value if the column is a character column. As you can see, 'apply' treats all columns as characters the way I've written this function. Any thoughts? Many thanks in advance, Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Easy 'apply' question
Perfect, thanks Josh! Cheers, A 2011/3/10 Joshua Wiley > Dear Aaron, > > The problem is not with your function, but using apply(). Look at the > "Details" section of ?apply You will see that if the data is not an > array or matrix, apply will coerce it to one (or try). Now go over to > the "Details" section of ?matrix and you will see that matrices can > only contain a single class of data and that this follows a hierarchy. > In short, your data frame is coerced to a matrix and the classes > are all coerced to the highest---character. You can use lapply() > instead to get your desired results. Here is an example: > > ## Construct (named) test dataframe > tf <- data.frame(x = 1:3, y = 4:6, z = c("A","A","A")) > > ## Show why what you tried did not work > (test <- apply(tf, 2, class)) > > ## using lapply() > (test <- lapply(tf, function(x) { > if(is.numeric(x)) mean(x) else unique(x)[1]})) > > > Hope this helps, > > Josh > > On Thu, Mar 10, 2011 at 5:11 PM, Aaron Polhamus > wrote: > > Dear list, > > > > I couldn't find a solution for this problem online, as simple as it > seems. > > Here's the problem: > > > > > > #Construct test dataframe > > tf <- data.frame(1:3,4:6,c("A","A","A")) > > > > #Try the apply function I'm trying to use > > test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else > unique(x)[1]) > > > > #Look at the output--all columns treated as character columns... > > test > > > > #Look at the format of the original data--the first two columns are > > integers. > > str(tf) > > > > > > In general terms, I want to differentiate what function I apply over a > > row/column based on what type of data that row/column contains. Here I > want > > a simple mean if the column is numeric, and the first unique value if the > > column is a character column. As you can see, 'apply' treats all columns > as > > characters the way I've written this function. > > > > Any thoughts? 
Many thanks in advance, > > Aaron > > > >[[alternative HTML version deleted]] > > > > ______ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > University of California, Los Angeles > http://www.joshuawiley.com/ > -- Aaron Polhamus NASA Jet Propulsion Lab Statistical consultant, Revolution Analytics 160 E Corson Street Apt 207, Pasadena, CA 91103 Cell: +1 (206) 380.3948 Email: [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reporting odds ratios or risk ratios from GLM
OR <- exp(coef(GLM.2)[-1]) OR.ci <- exp(confint(GLM.2)[-1,]) -Aaron On Tue, Mar 15, 2011 at 1:25 PM, lafadnes wrote: > I am a new R user (am using it through the Rcmdr package) and have > struggled > to find out how to report OR and RR directly when running GLM models (not > only reporting coefficients.) > > Example of the syntax that I have used: > > GLM.2 <- glm(diarsev ~ treatmentarm +childage +breastfed, > family=binomial(logit), data=fieldtrials2) > summary(GLM.2) > > This works well except that I manually have to calculate the OR based on > the > coefficients. Can I get these directly (with confidence intervals) by just > amending the syntax? > > Will be grateful for advice! > > -- > View this message in context: > http://r.789695.n4.nabble.com/Reporting-odds-ratios-or-risk-ratios-from-GLM-tp3357209p3357209.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
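Fleshing that out into a self-contained example with simulated data (variable names are illustrative; older versions of R may need `library(MASS)` for the profile-likelihood confidence intervals):

```r
set.seed(1)
dat <- data.frame(y  = rbinom(200, 1, 0.4),
                  x1 = rnorm(200),
                  x2 = rnorm(200))

fit <- glm(y ~ x1 + x2, family = binomial(logit), data = dat)

OR    <- exp(coef(fit)[-1])        # odds ratios, dropping the intercept
OR.ci <- exp(confint(fit)[-1, ])   # profile-likelihood 95% CIs on the OR scale
cbind(OR, OR.ci)
```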
[R] Determining frequency and period of a wave
Hello! I'm collecting data on a refrigerator that I'm using to cure meat. Specifically I am collecting humidity and temperature readings. The temperature readings look sinusoidal (due to the refrigerator turning on and off). I'd like to calculate the frequency and period of the wave so that I can determine if modifications I make to the equipment are increasing or decreasing efficiency. Unfortunately, I'm pretty new to R, so I'm not sure how to figure this out. I *suspect* I should be doing an fft on the temperature data, but I'm not sure where to go from there. Here is a graph I'm producing: http://i.imgur.com/WpsDi.png Here is the program I have so far: https://github.com/tenderlove/rsausage/blob/master/graphing.r I have posted a repository with a SQLite database that has the data I've collected here: https://github.com/tenderlove/rsausage Any help would be greatly appreciated! -- Aaron Patterson http://tenderlovemaking.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
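A minimal sketch of the spectral approach using a synthetic sinusoid; with the real data, `temp` would be the temperature series and `dt` the sampling interval in seconds (both assumed here):

```r
dt <- 60                                  # assumed: one reading per minute
tm <- seq(0, 6 * 3600, by = dt)           # six hours of readings
temp <- 4 + 2 * sin(2 * pi * tm / 1800)   # illustrative 30-minute cycle

# Periodogram of the series; frequencies are in cycles per sample
sp <- spectrum(temp, plot = FALSE)
peak <- sp$freq[which.max(sp$spec)]       # dominant frequency (cycles/sample)

period_sec <- dt / peak                   # cycle length in seconds
freq_hz    <- 1 / period_sec              # frequency in Hz
```

Detrending the series first (or passing `detrend = TRUE`, the default in `spec.pgram`) helps keep the low-frequency end from swamping the on/off cycle.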
Re: [R] Hardy Weinberg
H-W only gives you the expected frequency of AA, AB, and BB genotypes (i.e. a 1x3 table): minor <- runif(1, 0.05, 0.25) major <- 1-minor AA <- minor^2 AB <- 2*minor*major BB <- major^2 df <- cbind(AA, AB, BB) -Aaron On Tue, Jun 21, 2011 at 9:30 PM, Jim Silverton wrote: > Hello all, > I am interested in simulating 10,000 2 x 3 tables for SNPs data with the > Hardy Weinberg formulation. Is there a quick way to do this? I am assuming > that the minor allelle frequency is uniform in (0.05, 0.25). > > -- > Thanks, > Jim. > >[[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
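Extending that to the 10,000 requested draws, one row of expected genotype frequencies per table:

```r
# Each replicate draws a minor allele frequency uniformly on (0.05, 0.25)
# and returns the Hardy-Weinberg expected frequencies
sim <- t(replicate(10000, {
  minor <- runif(1, 0.05, 0.25)
  major <- 1 - minor
  c(AA = minor^2, AB = 2 * minor * major, BB = major^2)
}))
dim(sim)  # 10000 x 3
```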
Re: [R] Very slow optim()
Why use a hammer when you need a wrench? ADMB seems to be the best tool for the job. It has several slick interfaces with R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HWEBayes, swapping the homozygotes genotype frequencies
Without really knowing this code, I can guess that it may be the "triangular" prior at work. Bayes Factors are notorious for being sensitive to the prior. Presumably, the prior somehow prefers to see the rarer allele as the "BB", and not the "AA" homozygous genotype (this is a common assumption: that AA is the reference, and thus the major, more frequent, allele). -Aaron On Sat, Oct 8, 2011 at 7:52 PM, stat999 wrote: > I evaluated the Bayes factor in the k=2 allele case with a "triangular" > prior under the null as in the example in the help file: > > > HWETriangBF2(nvec=c(88,10,2)) > [1] 0.4580336 > > When I swap the n11 entry and n22 entry of nvec, I received totally > different Bayes factor: > > > > > HWETriangBF2(nvec=c(2,10,88)) > [1] 5.710153 > > > > In my understanding, defining the genotype frequency as n11 or n22 are > arbitrary. > So I was expecting the same value of Bayes factor. > > This is the case for conjugate Dirichlet prior: > >DirichNormHWE(nvec=c(88,10,2), c(1,1))/DirichNormSat(nvec=c(88,10,2), > c(1,1,1)) > [1] 1.542047 > >DirichNormHWE(nvec=c(2,10,88), c(1,1))/DirichNormSat(nvec=c(2,10,88), > c(1,1,1)) > [1] 1.542047 > > Could you explain why the HWETriangBF2 is returining completely different > values of Bayes Factor?? > > > -- > View this message in context: > http://r.789695.n4.nabble.com/HWEBayes-swapping-the-homozygotes-genotype-frequencies-tp3886313p3886313.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to selectively sum rows [Beginner question]
Sorry, I attempted to paste the sample data but it must have been stripped out when I posted. It is hopefully now listed below. tapply looks useful. I will check it out further. Here's the sample data: > flights[1:10,] PASSENGERS DISTANCE ORIGIN ORIGIN_CITY_NAME ORIGIN_WAC DEST DEST_CITY_NAME DEST_WAC YEAR 1 17266 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 2 16934 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 3 15470 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 4 13997 5995ICN Seoul, South Korea778 LAXLos Angeles, CA 91 2010 5 13738 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 6 13682 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 7 13187 5995ICN Seoul, South Korea778 LAXLos Angeles, CA 91 2010 8 13051 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 9 12761 1940SPN Saipan, TT 5 ICN Seoul, South Korea 778 2010 10 12419 5995ICN Seoul, South Korea778 LAXLos Angeles, CA 91 2010 Thanks, Aaron -Original Message- From: jim holtman [mailto:jholt...@gmail.com] Sent: Monday, October 24, 2011 11:58 AM To: asindc Cc: r-help@r-project.org Subject: Re: [R] How to selectively sum rows [Beginner question] It would be good to follow the posting guide and at least supply a sample of the data. Most likely 'tapply' is one way of doing it: tapply(df$passenger, list(df$orig, df$dest), sum) On Mon, Oct 24, 2011 at 11:27 AM, asindc wrote: > Hi, I am new to R so I would appreciate any help. I have some data that has > passenger flight data between city pairs. The way I got the data, there are > multiple rows of data for each city pair; the number of passengers needs to > be summed to get a TOTAL annual passenger count for each city pair. > > So my question is: how do I create a new table (or data frame) that > selectively sums > > My initial thought would be to iterate through each row with the following > logic: > > 1. If the ORIGIN_WAC and DEST_WAC pair are not in the new table, then add > them to the table > 2. 
If the ORIGIN_WAC and DEST_WAC pair already exist, then sum the > passengers (and do not add a new row) > > Is this logical? If so, I think I just need some help on syntax (or do I use > a script?). Thanks. > > The first few rows of data look like this: > > > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-selectively-sum-rows-Beginner-question- tp3933512p3933512.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
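For completeness, the same summation with `aggregate()`, which returns a data frame with one row per city pair (using the column names from the sample above):

```r
# Total annual passengers for each ORIGIN_WAC / DEST_WAC pair
totals <- aggregate(PASSENGERS ~ ORIGIN_WAC + DEST_WAC,
                    data = flights, FUN = sum)
```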
Re: [R] How to selectively sum rows [Beginner question]
The count() function in the plyr package works beautifully. Thanks to Jim, Rainer and Dennis for your help. Best. -Original Message- From: Dennis Murphy [mailto:djmu...@gmail.com] Sent: Monday, October 24, 2011 12:05 PM To: asindc Cc: r-help@r-project.org Subject: Re: [R] How to selectively sum rows [Beginner question] See the count() function in the plyr package; it does fast summation. Something like library('plyr') count(passengerData, c('ORIGIN_WAC', 'DEST_WAC'), 'npassengers') HTH, Dennis On Mon, Oct 24, 2011 at 8:27 AM, asindc wrote: > Hi, I am new to R so I would appreciate any help. I have some data that has > passenger flight data between city pairs. The way I got the data, there are > multiple rows of data for each city pair; the number of passengers needs to > be summed to get a TOTAL annual passenger count for each city pair. > > So my question is: how do I create a new table (or data frame) that > selectively sums > > My initial thought would be to iterate through each row with the following > logic: > > 1. If the ORIGIN_WAC and DEST_WAC pair are not in the new table, then add > them to the table > 2. If the ORIGIN_WAC and DEST_WAC pair already exist, then sum the > passengers (and do not add a new row) > > Is this logical? If so, I think I just need some help on syntax (or do I use > a script?). Thanks. > > The first few rows of data look like this: > > > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-selectively-sum-rows-Beginner-question- tp3933512p3933512.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dependency-aware scripting tools for R
shameless self-plug: we break out of R to do this, and after many painful years developing and maintaining idiosyncratic Makefiles, we are now using Taverna to (visually) glue together UNIX commands (including R scripts) -- the benefits of which (over make and brethren) is that you can actually *see* the dependencies and overall workflow (nesting workflows also makes it easier to manage complexity). see TavernaPBS: http://cphg.virginia.edu/mackey/projects/sequencing-pipelines/tavernapbs/ while designed to automate job submission to a PBS queuing system, you can also use it to simply execute non-PBS jobs. -- Aaron J. Mackey, PhD Assistant Professor Center for Public Health Genomics University of Virginia amac...@virginia.edu http://www.cphg.virginia.edu/mackey On Thu, Apr 19, 2012 at 3:27 PM, Sean Davis wrote: > There are numerous tools like scons, make, ruffus, ant, rake, etc. > that can be used to build complex pipelines based on task > dependencies. These tools are written in a variety of languages, but > I have not seen such a thing for R. Is anyone aware of a package > available? The goal is to be able to develop robust bioinformatic > pipelines driven by scripts written in R. > > Thanks, > Sean > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'
Greetings, all I am having difficulty getting the fitdistr() function to return without an error on my data. Specifically, what I'm trying to do is get a parameter estimation for fracture intensity data in a well / borehole. Lower bound is 0 (no fractures in the selected data interval), and upper bound is ~ 10 - 50, depending on what scale you are conducting the analysis on. I read in the data from a text file, convert it to numerics, and then calculate initial estimates of the shape and scale parameters for the gamma distribution from moments. I then feed this back into the fitdistr() function. R code (to this point): data.raw=c(readLines("FSM_C_9m_ENE.inp")) data.num <- as.numeric(data.raw) data.num library(MASS) shape.mom = ((mean(data.num))/ (sd(data.num))^2 shape.mom med.data = mean(data.num) sd.data = sd(data.num) med.data sd.data shape.mom = (med.data/sd.data)^2 shape.mom scale.mom = (sd.data^2)/med.data scale.mom fitdistr(data.num,"gamma",list(shape=shape.mom, scale=scale.mom),lower=0) fitdistr() returns the following error: " Error in optim(x = c(0.402707037, 0.40348, 0.404383704, 2.432626667, : L-BFGS-B needs finite values of 'fn'" Next thing I tried was to manually specify the negative log-likelihood function and pass it straight to mle() (the method specified in Ricci's tutorial on fitting distributions with R). Basically, I got the same result as using fitdistr(). 
Finally I tried using some R code I found from someone with a similar problem back in 2003 from the archives of this mailing list: R code gamma.param1 <- shape.mom gamma.param2 <- scale.mom log.gamma.param1 <- log(gamma.param1) log.gamma.param2 <- log(gamma.param2) gammaLoglik <- function(params, negative=TRUE){ lglk <- sum(dgamma(data, shape=exp(params[1]), scale=exp(params[2]), log=TRUE)) if(negative) return(-lglk) else return(lglk) } optim.list <- optim(c(log.gamma.param1, log.gamma.param2), gammaLoglik) gamma.param1 <- exp(optim.list$par[1]) gamma.param2 <- exp(optim.list$par[2]) # If I test this function using my sample data and the estimates of shape and scale derived from the method of moments, gammaLogLike returns as INF. I suspect the problem is that the zeros in the data are causing the optim solver problems when it attempts to minimize the negative log-likelihood function. Can anyone suggest some advice on a work-around? I have seen suggestions online that a 'censoring' algorithm can allow one to use MLE methods to estimate the gamma distribution for data with zero values (Wilkes, 1990, Journal of Climate). I have not, however, found R code to implement this, and, frankly, am not smart enough to do it myself... :-) Any suggestions? Has anyone else run up against this and written code to solve the problem? Thanks in advance! Aaron Fox Senior Project Geologist, Golder Associates +1 425 882 5484 || +1 425 736 3958 (mobile) [EMAIL PROTECTED] || www.fracturedreservoirs.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
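One workable answer for the archive: with zeros in the data, dgamma() contributes a non-finite log-density term (log(0) = -Inf when shape > 1, +Inf when shape < 1), so any likelihood built on the raw density blows up in optim(). The censoring idea from Wilks (1990) treats each zero as "observed below a small detection limit c" and uses pgamma() for those observations instead of the density. A rough base-R sketch — the limit `cens`, the simulated data, and the starting values are all invented for illustration:

```r
set.seed(1)
x <- rgamma(300, shape = 0.5, scale = 1.5)  # toy stand-in for fracture-intensity data
cens <- 0.05                                # assumed detection limit
x[x < cens] <- 0                            # zeros play the role of censored values

# negative log-likelihood: censored term for the zeros, density for the rest
negloglik <- function(logpar, data, limit) {
  shape <- exp(logpar[1]); scale <- exp(logpar[2])
  zero <- data == 0
  -(sum(zero) * pgamma(limit, shape, scale = scale, log.p = TRUE) +
      sum(dgamma(data[!zero], shape, scale = scale, log = TRUE)))
}

# method-of-moments start from the positive part of the data
pos   <- x[x > 0]
start <- log(c(mean(pos)^2 / var(pos), var(pos) / mean(pos)))

fit <- optim(start, negloglik, data = x, limit = cens)
exp(fit$par)  # shape and scale estimates on the original parameter scale
```

Working on the log scale for the parameters (as in the 2003 code above) keeps shape and scale positive without box constraints, so plain Nelder-Mead suffices.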
[R] Types in grouped multi-panel (lattice) xyplot
Apologetic prologue: I've looked through the mailing list for an answer to this (since I'm sure it's trivial) but I have not been able to find a fix. The problem is that I want each group to have a different type of plot: "Probes" should be points and "Segments" should be lines (preferably using the segment plot command, but I've just been trying -- unsuccessfully -- to get lines to work). To be exact, the data look like:

 loc        val  valtype mouse
1428  0.1812367   Probes     2
1439 -0.4534155   Probes     2
1499 -0.4957303   Probes     2
1559  0.2448838   Probes     2
1611 -0.2030937   Probes     2
1788 -0.2235331   Probes     2
1428  0.5        Segment     2
1439  0.5        Segment     2
1499  0.5        Segment     2
1559  0.5        Segment     2
1611  0.5        Segment     2
1788  0.5        Segment     2
1428  0.1812367   Probes     1
1439 -0.4534155   Probes     1
1499 -0.4957303   Probes     1
1559  0.2448838   Probes     1
1611 -0.2030937   Probes     1
1788 -0.2235331   Probes     1
1428  0.5        Segment     1
1439  0.5        Segment     1
1499  0.5        Segment     1
1559  0.1        Segment     1
1611  0.1        Segment     1
1788  0.1        Segment     1

* loc is the x-axis location
* val is the y-axis value
* valtype is equal to "which" had I been smart and used make.groups
* mouse is the 'cond' variable

The plot command I'm currently using is

xyplot(val ~ loc | mouse, data = df,
       groups = valtype,
       aspect = 0.5, layout = c(3,3),
       lty = 0, lwd = 3, type = "p",
       col = c("black", "blue"),
       as.table = TRUE)

which gives me black and blue points for the probes/segments (I've inferred alphabetical order for the group colors). When I change the type to c("p", "l"), as in

xyplot(val ~ loc | mouse, data = df,
       groups = valtype,
       aspect = 0.5, layout = c(3,3),
       lty = 0, lwd = 3, type = c("p","l"),
       col = c("black", "blue"),
       as.table = TRUE)

I get the exact same plot. I've tried using a few of the panel functions I found on the list (I was particularly hopeful about http://tolstoy.newcastle.edu.au/R/help/06/07/30363.html) but I've either been misusing them, or they are not right for what I want to do.
If anyone knows how to get points and lines in the same panel for the two different groups (probes/segments), I would love to hear about it. If you further know how to use the 'segment' plot in panels for the segments, I would really love to hear about it. Thanks in advance! Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Types in grouped multi-panel (lattice) xyplot
On Thu, 10 Apr 2008, Deepayan Sarkar wrote: > On 4/10/08, Deepayan Sarkar <[EMAIL PROTECTED]> wrote: >> On 4/10/08, Aaron Arvey <[EMAIL PROTECTED]> wrote: >> > Apologetic prologue: I've looked through the mailing list for an answer to >> > this (since I'm sure it's trivial) but I have not been able to find a fix. >> > >> > So the problem is that I want each group to have a different type of plot. >> > "Probes" should be points and "Segments" should be lines (preferably using >> > the segment plot command, but I've just been trying -- unsuccessfully -- >> > to get lines to work). >> > >> > To be exact, the data looks like: >> > >> > loc val valtype mouse >> > 1428 0.1812367 Probes 2 >> > 1439 -0.4534155 Probes 2 >> > 1499 -0.4957303 Probes 2 >> > 1559 0.2448838 Probes 2 >> > 1611 -0.2030937 Probes 2 >> > 1788 -0.2235331 Probes 2 >> > 1428 0.5Segment 2 >> > 1439 0.5Segment 2 >> > 1499 0.5Segment 2 >> > 1559 0.5Segment 2 >> > 1611 0.5Segment 2 >> > 1788 0.5Segment 2 >> > 1428 0.1812367 Probes 1 >> > 1439 -0.4534155 Probes 1 >> > 1499 -0.4957303 Probes 1 >> > 1559 0.2448838 Probes 1 >> > 1611 -0.2030937 Probes 1 >> > 1788 -0.2235331 Probes 1 >> > 1428 0.5Segment 1 >> > 1439 0.5Segment 1 >> > 1499 0.5Segment 1 >> > 1559 0.1Segment 1 >> > 1611 0.1Segment 1 >> > 1788 0.1Segment 1 >> > >> > >> >* loc is the x-axis location >> >* val is the y-axis value >> >* valtype is equal to "which" had I been smart and used make.groups >> >* mouse is the 'cond' variable >> > >> > >> > The plot command I'm currently using is, >> > >> > xyplot(val ~ loc | mouse, data = df, >> > groups=valtype >> > aspect=0.5, layout=c(3,3), >> > lty=0, lwd=3, type="p", >> > col=c("black", "blue"), >> > as.table = TRUE) >> > >> > which gives me black and blue points for the probes/segments (I've infered >> > alphabetical order for the groups colors). 
When I change the type to >> > c("p", "l"), I get >> > >> > xyplot(val ~ loc | mouse, data = df, >> > groups=valtype >> > aspect=0.5, layout=c(3,3), >> > lty=0, lwd=3, type=c("p","l"), >> > col=c("black", "blue"), >> > as.table = TRUE) >> >> >> Try >> >> >> xyplot(val ~ loc | mouse, data = df, >> >>groups=valtype, >>type=c("p","l"), >>## distribute.type = TRUE, > > Sorry, that should be > > distribute.type = TRUE, > >>col=c("black", "blue")) That did exactly what I was looking for! I now have a very nice lattice plot with points and lines! >> > I get the exact same plot. I've tried using a few of the panel functions >> > I found on the list (I was particularly hopeful for >> > http://tolstoy.newcastle.edu.au/R/help/06/07/30363.html) but I've either >> > been misusing them, or they are not right for what I want to do. >> > >> > If anyone knows how to get points and lines in the same panel for the two >> > different groups (probes/segments), I would love to hear about it. >> > >> > If you further know how to use the 'segment' plot in panels for the >> > segments, I would really love to hear about it. >> >> >> Well, panel.segments() draws segments, but you need your data in the >> form (x1, y1, x2, y2) for that. With your setup, it's probably easier >> to have lines with some NA-s inserted wherever you want line breaks. That works perfectly! I was just planning on reformating the data, but this makes life even easier! Thanks! Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
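The fix in the reply can be reduced to a small self-contained sketch (invented toy data standing in for the probe/segment frame): `distribute.type = TRUE` tells `panel.superpose` to interpret `type` as one entry per group rather than applying every type to every group, so the first group gets points and the second gets lines.

```r
library(lattice)  # ships with R

set.seed(1)
# toy stand-in for the probes/segments data frame
df <- data.frame(
  loc     = rep(c(1428, 1439, 1499, 1559, 1611, 1788), 4),
  val     = c(rnorm(6), rep(0.5, 6), rnorm(6), rep(0.1, 6)),
  valtype = rep(rep(c("Probes", "Segment"), each = 6), 2),
  mouse   = factor(rep(c(2, 1), each = 12))
)

p <- xyplot(val ~ loc | mouse, data = df,
            groups = valtype,
            type = c("p", "l"),        # one entry per group ...
            distribute.type = TRUE,    # ... because of this flag
            col = c("black", "blue"),
            as.table = TRUE)
print(p)  # points for Probes, lines for Segment
```

Inserting NA rows into the Segment values, as suggested at the end of the thread, then breaks the line wherever a gap is wanted.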
[R] variable names when using S3 methods
I'm seeing some funny behavior when using methods (the older S3 type) and having variables that start with the same letter. I have a vague recollection of reading something about this once but now can't seem to find anything in the documentation. Any explanation, or a link to the proper documentation, if it does exist, would be appreciated. Thanks, Aaron Rendahl University of Minnesota School of Statistics # set up two function that both use method "foo" but with different variable names fooA<-function(model,...) UseMethod("foo") fooB<-function(Bmodel,...) UseMethod("foo") # now set up two methods (default and character) that have an additional variable foo.character <- function(model, m=5,...) cat("foo.character: m is", m, "\n") foo.default <- function(model, m=5,...) cat("foo.default: m is", m, "\n") # both of these use foo.character, as expected fooA("hi") fooB("hi") # but here, fooA uses foo.default instead fooA("hi",m=1) fooB("hi",m=1) # additionally, these use foo.character, as expected fooA("hi",1) fooA(model="hi",m=1) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
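For the archive, the likely explanation is ordinary partial argument matching, not anything special about S3 dispatch (see "Argument matching" in the R Language Definition): in `fooA(model, ...)` the named argument `m = 1` partially matches the formal `model`, so `UseMethod("foo")` dispatches on the numeric `1` while `"hi"` falls into `...`; in `fooB(Bmodel, ...)` the name `m` cannot match `Bmodel`, so `"hi"` remains the dispatch object. A minimal sketch of the diagnosis (methods return strings instead of printing, to make the dispatch visible):

```r
fooA <- function(model, ...)  UseMethod("foo")
fooB <- function(Bmodel, ...) UseMethod("foo")
foo.character <- function(model, m = 5, ...) "character method"
foo.default   <- function(model, m = 5, ...) "default method"

fooA("hi")         # dispatches on "hi"                            -> character method
fooA("hi", m = 1)  # m=1 partially matches `model`; dispatch on 1  -> default method
fooB("hi", m = 1)  # `m` cannot match `Bmodel`; dispatch on "hi"   -> character method
```

This is also why `fooA("hi", 1)` and `fooA(model = "hi", m = 1)` behave "as expected": in both, `model` is unambiguously `"hi"`.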
[R] R crash using rpanel on mac os x
Hello, I've recently discovered a persistent issue with rpanel when running R.app (2.6.1) on Mac OS X 10.4.11. tcltk and rpanel load without any apparent error, and the interactive panels appear to work as expected, however upon closing the panels rpanel has created I get catastrophic errors and R crashes completely. For the most part R manages to crash with dignity and work can be saved, but sometimes it will crash straight out. Below is an example of an entire work session (only base packages loaded) with the crash at the end typical of those encountered: > library(tcltk) Loading Tcl/Tk interface ... done > library(rpanel) Package `rpanel', version 1.0-4 type help(rpanel) for summary information > density.draw <- function(panel) { + plot(density(panel$x, bw = panel$h)) + panel + } > panel <- rp.control(x = rnorm(50)) > rp.slider(panel, h, 0.5, 5, log = TRUE, action = density.draw) *** caught bus error *** address 0x0, cause 'non-existent physical address' Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace All packages that are required are up to date, and I can find no evidence of similar issues from searching the mailing lists. Any suggestions would be appreciated. Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with lm and multiple linear regression?
Hello, I'm new to R, but I've read the intro to R and successfully connected it to an instance of mysql. I'm trying to perform multiple linear regression, but I'm having trouble using the lm function. To start, I have read in a simple y matrix of values (dependent variable) and x matrix of independent variables. It says both are data frames, but lm is giving me an error that my y variable is a list. Any suggestions on how to do this? It's not clear to me what the problem is, as they're both data frames. My actual problem will use a much wider matrix of coefficients; I've only included two for illustration. Additionally, I'd actually like to weight the observations. How would I go about doing that? I also have that as a separate column vector. Thanks, Aaron

Here's my session:

> margin
    margin
1   166.67
2   -58.33
3   100.00
4   -33.33
5   200.00
6   -83.33
7  -100.00
8     0.00
9   100.00
10  -18.18
11  -55.36
12 -125.00
13  -33.33
14 -200.00
15    0.00
16 -100.00
17   75.00
18    0.00
19 -200.00
20   35.71
21  100.00
22   50.00
23  -86.67
24  165.00
> personcoeff
   Person1 Person2
1       -1       1
2       -1       1
3       -1       1
4       -1       1
5       -1       1
6       -1       1
7        0       0
8        0       0
9        0       1
10      -1       1
11      -1       1
12      -1       1
13      -1       1
14      -1       0
15       0       0
16       0       0
17       0       1
18      -1       1
19      -1       1
20      -1       1
21      -1       1
22      -1       1
23      -1       1
24      -1       1
> class(margin)
[1] "data.frame"
> class(personcoeff)
[1] "data.frame"
> lm(margin~personcoeff)
Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : invalid type (list) for variable 'margin'

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
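The error arises because a one-column data frame is a list, and each variable in an lm() formula must be a vector (or matrix). One fix, sketched here with a few made-up rows in place of the full 24, is to combine everything into one data frame and pull the response out as a column; `w` is a hypothetical vector of observation weights, passed through lm()'s `weights` argument:

```r
margin      <- data.frame(margin  = c(166.67, -100, 100, -200))
personcoeff <- data.frame(Person1 = c(-1, 0, 0, -1),
                          Person2 = c( 1, 0, 1,  0))

reg.data <- cbind(margin, personcoeff)

# y as a column of the data frame, all remaining columns as predictors:
fit <- lm(margin ~ ., data = reg.data)

# weighted version; w is an invented weight vector for illustration
w    <- c(1, 2, 1, 2)
fitw <- lm(margin ~ ., data = reg.data, weights = w)
```

`lm(margin$margin ~ Person1 + Person2, data = personcoeff)` would also work, but keeping one data frame is less error-prone.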
Re: [R] Help with lm and multiple linear regression? (Plain Text version)
Tim (and others who responded privately), Thanks for the help, this approach did work. I have also reread ?lm a little more closely, I do see the weights functionality. I have one last question: Now that I understand how to call this function and review the results, I want to extend it to my much larger real problem, with 100s of columns. Is there a way to call the function in more of a matrix algebra syntax, where I would list the matrix(e.g. personcoeff) rather than the individual column names? It seems like I might need to use lm.wfit, but per the help I'd rather use lm. Thanks, Aaron - Original Message From: Tim Calkins <[EMAIL PROTECTED]> To: Aaron Barzilai <[EMAIL PROTECTED]> Cc: r-help@r-project.org Sent: Thursday, December 27, 2007 6:55:57 PM Subject: Re: [R] Help with lm and multiple linear regression? (Plain Text version) consider merging everything into a singe dataframe. i haven't tried it, but something like the following could work: > reg.data <- cbind(margin, personcoeff) > names(reg.data) <- c('margin', 'p1', 'p2') > lm(margin~p1+p2, data = reg.data) the idea here is that by specifying the data frame with the data argument in lm, R looks for the columns of the names specified in the formula. for weights, see ?lm and look for the weights argument. cheers, tc On Dec 28, 2007 10:22 AM, Aaron Barzilai <[EMAIL PROTECTED]> wrote: > (Apologies the previous version was sent as rich text) > > Hello, > I'm new to R, but I've read the intro to R and successfully connected it to > an instance of mysql. I'm trying to perform multiple linear regression, but > I'm having trouble using the lm function. To start, I have read in a simply > y matrix of values(dependent variable) and x matrix of independent variables. > It says both are data frames, but lm is giving me an error that my y > variable is a list. > > Any suggestions on how to do this? It's not clear to me what the problem is > as they're both data frames. 
My actual problem will use a much wider matrix > of coefficients, I've only included two for illustration. > > Additionally, I'd actually like to weight the observations. How would I go > about doing that? I also have that as a separate column vector. > > Thanks, > Aaron > > Here's my session: > > margin >margin > 166.67 > 2 -58.33 > 3 100.00 > 4 -33.33 > 5 200.00 > 6 -83.33 > 7 -100.00 > 80.00 > 9 100.00 > 10 -18.18 > 11 -55.36 > 12 -125.00 > 13 -33.33 > 14 -200.00 > 150.00 > 16 -100.00 > 17 75.00 > 180.00 > 19 -200.00 > 20 35.71 > 21 100.00 > 22 50.00 > 23 -86.67 > 24 165.00 > > personcoeff >Person1 Person2 > 1 -1 1 > 2 -1 1 > 3 -1 1 > 4 -1 1 > 5 -1 1 > 6 -1 1 > 70 0 > 80 0 > 90 1 > 10 -1 1 > 11 -1 1 > 12 -1 1 > 13 -1 1 > 14 -1 0 > 15 0 0 > 16 0 0 > 17 0 1 > 18 -1 1 > 19 -1 1 > 20 -1 1 > 21 -1 1 > 22 -1 1 > 23 -1 1 > 24 -1 1 > > class(margin) > [1] "data.frame" > > class(personcoeff) > [1] "data.frame" > > lm(margin~personcoeff) > Error in model.frame(formula, rownames, variables, varnames, extras, > extranames, : >invalid type (list) for variable 'margin' > > > > > Be a better friend, newshound, and > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Tim Calkins 0406 753 997 Be a better friend, newshound, and __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
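The follow-up question (matrix-algebra syntax without listing hundreds of column names) has a standard answer that keeps lm() rather than dropping to lm.wfit(): a formula term may itself be a matrix, in which case every column enters the model as a separate term; `~ .` achieves the same thing with a data frame. A sketch with made-up numbers:

```r
set.seed(42)
y <- rnorm(24)
X <- matrix(rnorm(24 * 5), nrow = 24,
            dimnames = list(NULL, paste0("Person", 1:5)))

# a matrix on the right-hand side expands to one coefficient per column
fit <- lm(y ~ X)
length(coef(fit))  # 6: intercept + 5 columns

# equivalent via a data frame and "."
d    <- data.frame(y = y, X)
fit2 <- lm(y ~ ., data = d)
```

Weights extend unchanged: `lm(y ~ X, weights = w)` for a numeric vector `w` of length 24.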
Re: [R] Gumbell distribution - minimum case
If you mean you want an EVD with a fat left tail (instead of a fat right tail), then can;t you just multiply all the values by -1 to "reverse" the distribution? A new location parameter could then shift the distribution wherever you want along the number line ... -Aaron On Mon, Sep 8, 2008 at 5:22 PM, Richard Gwozdz <[EMAIL PROTECTED]> wrote: > Hello, > > I would like to sample from a Gumbell (minimum) distribution. I have > installed package {evd} but the Gumbell functions there appear to refer to > the maximum case. Unfortunately, setting the scale parameter negative does > not appear to work. > > Is there a separate package for the Gumbell minimum? > > > -- > _ > Rich Gwozdz > Fire and Mountain Ecology Lab > College of Forest Resources > University of Washington > cell: 206-769-6808 office: 206-543-9138 > [EMAIL PROTECTED] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
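The negation trick in the reply can be made concrete without any extra package: if X follows a Gumbel *maximum* distribution with location -loc, then -X follows the Gumbel *minimum* distribution with location loc. A base-R sketch via inverse-CDF sampling (the function names are my own, not from {evd}):

```r
# Gumbel (maximum) sampler via the inverse CDF: F^-1(u) = loc - scale * log(-log(u))
rgumbel_max <- function(n, loc = 0, scale = 1)
  loc - scale * log(-log(runif(n)))

# Gumbel (minimum) sampler: negate a maximum-Gumbel sample with mirrored location
rgumbel_min <- function(n, loc = 0, scale = 1)
  -rgumbel_max(n, loc = -loc, scale = scale)

set.seed(1)
x <- rgumbel_min(1e5, loc = 10, scale = 2)
mean(x)  # ~ loc - 0.5772 * scale (Euler-Mascheroni constant), i.e. about 8.85
```

The same negation applies to density, distribution, and quantile functions, which is why a separate "Gumbel minimum" package is rarely needed.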
Re: [R] database table merging tips with R
I would load your set of userid's into a temporary table in oracle, then join that table with the rest of your SQL query to get only the matching rows out. -Aaron On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: > > Dear R list, > > What is the best way to efficiently marry an R dataset with a very large > (Oracle) database table? > > The goal is to only return Oracle table rows that match IDs present in the R > dataset. > I have an R data frame with 2000 user IDs analogous to: r = > data.frame(userid=round(runif(2000)*10,0)) > > ...and I need to pull data from an Oracle table only for these 2000 IDs. The > Oracle table is quite large. Additionally, the sql query may need to join to > other tables to bring in ancillary fields. > > I currently connect to Oracle via odbc: > > library(RODBC) > connection <- odbcConnect("", uid="", pwd="") > d = sqlQuery(connection, "select userid, x, y, z from largetable where > timestamp > sysdate -7") > > ...allowing me to pull data from the database table into the R object "d" and > then use the R merge function. The problem however is that if "d" is too > large it may fail due to memory limitations or be inefficient. I would like > to push the merge portion to the database and it would be very convenient if > it were possible to request that the query look to the R object for the ID's > to which it should restrict the output. > > Is there a way to do this? > Something like the following fictional code: > d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where > r$userid=t.userid") > > Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? > This would be convenient and help me avoid needing to create a temporary > table to store the R data, join via sql, then return the data back to R. > > I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 . > Thanks for your comments, ideas, recommendations. 
> > > -Avram > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
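Short of a temporary table (see the follow-ups in this thread), a common lightweight alternative is to paste the R ids directly into the query's WHERE clause. A sketch with invented ids and column names — note that Oracle caps an IN list at 1000 literals (ORA-01795), so very large id sets still favor the temp-table route:

```r
ids <- c(101, 205, 307)  # hypothetical user ids from the R data frame

sql <- sprintf(
  "select userid, x, y, z from largetable where userid in (%s)",
  paste(ids, collapse = ", ")
)
sql
# "select userid, x, y, z from largetable where userid in (101, 205, 307)"

# the string would then be passed on as: d <- sqlQuery(connection, sql)
```

This only works safely for numeric ids generated by your own code; ids coming from user input would need quoting/escaping.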
Re: [R] database table merging tips with R
Sorry, I see now you want to avoid this, but you did ask what was the "best way to efficiently ...", and the temp. table solution certainly matches your description. What's wrong with using a temporary table? -Aaron On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <[EMAIL PROTECTED]> wrote: > I would load your set of userid's into a temporary table in oracle, > then join that table with the rest of your SQL query to get only the > matching rows out. > > -Aaron > > On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: >> >> Dear R list, >> >> What is the best way to efficiently marry an R dataset with a very large >> (Oracle) database table? >> >> The goal is to only return Oracle table rows that match IDs present in the R >> dataset. >> I have an R data frame with 2000 user IDs analogous to: r = >> data.frame(userid=round(runif(2000)*10,0)) >> >> ...and I need to pull data from an Oracle table only for these 2000 IDs. >> The Oracle table is quite large. Additionally, the sql query may need to >> join to other tables to bring in ancillary fields. >> >> I currently connect to Oracle via odbc: >> >> library(RODBC) >> connection <- odbcConnect("", uid="", pwd="") >> d = sqlQuery(connection, "select userid, x, y, z from largetable where >> timestamp > sysdate -7") >> >> ...allowing me to pull data from the database table into the R object "d" >> and then use the R merge function. The problem however is that if "d" is >> too large it may fail due to memory limitations or be inefficient. I would >> like to push the merge portion to the database and it would be very >> convenient if it were possible to request that the query look to the R >> object for the ID's to which it should restrict the output. >> >> Is there a way to do this? >> Something like the following fictional code: >> d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where >> r$userid=t.userid") >> >> Would sqldf (http://code.google.com/p/sqldf/) help me out here? 
If so, how? >> This would be convenient and help me avoid needing to create a temporary >> table to store the R data, join via sql, then return the data back to R. >> >> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 . >> Thanks for your comments, ideas, recommendations. >> >> >> -Avram >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] database table merging tips with R
I guess I'd do it something like this: dbGetQuery(con, "CREATE TEMPORARY TABLE foo ( etc etc)") sapply(@userids, function (x) { dbGetQuery(con, paste("INSERT INTO foo (userid) VALUES (", x, ")")) }) then later: dbGetQuery(con, "DROP TABLE foo"); -Aaron On Thu, Sep 11, 2008 at 3:21 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: > > Perhaps I will need to create a temp table, but I am asking if there is a way > to avoid it. It would be great if there were a way to tie the R data frame > temporarily to the query in a transparent fashion. If not, I will see if I > can create/drop the temp table directly from sqlQuery. > -Avram > > > > On Thursday, September 11, 2008, at 12:07PM, "Aaron Mackey" <[EMAIL > PROTECTED]> wrote: >>Sorry, I see now you want to avoid this, but you did ask what was the >>"best way to efficiently ...", and the temp. table solution certainly >>matches your description. What's wrong with using a temporary table? >> >>-Aaron >> >>On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <[EMAIL PROTECTED]> wrote: >>> I would load your set of userid's into a temporary table in oracle, >>> then join that table with the rest of your SQL query to get only the >>> matching rows out. >>> >>> -Aaron >>> >>> On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: >>>> >>>> Dear R list, >>>> >>>> What is the best way to efficiently marry an R dataset with a very large >>>> (Oracle) database table? >>>> >>>> The goal is to only return Oracle table rows that match IDs present in the >>>> R dataset. >>>> I have an R data frame with 2000 user IDs analogous to: r = >>>> data.frame(userid=round(runif(2000)*10,0)) >>>> >>>> ...and I need to pull data from an Oracle table only for these 2000 IDs. >>>> The Oracle table is quite large. Additionally, the sql query may need to >>>> join to other tables to bring in ancillary fields. 
>>>> >>>> I currently connect to Oracle via odbc: >>>> >>>> library(RODBC) >>>> connection <- odbcConnect("", uid="", pwd="") >>>> d = sqlQuery(connection, "select userid, x, y, z from largetable where >>>> timestamp > sysdate -7") >>>> >>>> ...allowing me to pull data from the database table into the R object "d" >>>> and then use the R merge function. The problem however is that if "d" is >>>> too large it may fail due to memory limitations or be inefficient. I >>>> would like to push the merge portion to the database and it would be very >>>> convenient if it were possible to request that the query look to the R >>>> object for the ID's to which it should restrict the output. >>>> >>>> Is there a way to do this? >>>> Something like the following fictional code: >>>> d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where >>>> r$userid=t.userid") >>>> >>>> Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, >>>> how? This would be convenient and help me avoid needing to create a >>>> temporary table to store the R data, join via sql, then return the data >>>> back to R. >>>> >>>> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 . >>>> Thanks for your comments, ideas, recommendations. >>>> >>>> >>>> -Avram >>>> >>>> __ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >> >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
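A corrected version of the sketch above, for the archive: `@userids` is Perl syntax, and in R the ids are a plain vector over which sprintf()/vapply() can generate one INSERT per id. Only the SQL strings are built here (the table and column names are invented); each element would be handed to sqlQuery() or dbGetQuery() in turn against a live connection:

```r
userids <- c(101L, 205L, 307L)  # hypothetical ids from the R data frame

stmts <- c(
  "CREATE GLOBAL TEMPORARY TABLE r_ids (userid NUMBER)",
  vapply(userids,
         function(id) sprintf("INSERT INTO r_ids (userid) VALUES (%d)", id),
         character(1)),
  "SELECT t.userid, x, y, z FROM largetable t JOIN r_ids r ON r.userid = t.userid",
  "DROP TABLE r_ids"
)
stmts[2]
# "INSERT INTO r_ids (userid) VALUES (101)"
```

For thousands of ids, a bulk loader (e.g. RODBC's sqlSave into the temp table) would be far faster than row-by-row INSERTs, but the statement-per-id version is the most transparent sketch.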
[R] XML package help
Please consider this: http://www.w3.org/2001/XMLSchema-instance"; > ./XYZ 10 ./XYZ/ I am attempting to use the XML package and xpathSApply() to extract, say, the eValue attribute for eName=='one' for all nodes that have ==10. I try the following, among several things: doc <- xmlInternalTreeParse(Manifest) Root = xmlRoot(doc) xpathSApply(Root, "//File[FileTypeId=10]/PatientCharacteristics/[...@ename='one']", xmlAttrs) and it does not work. Might somebody help me with the syntax here? Thanks a lot!! Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] XML package help
Thanks! Works like a charm. -Aaron From: Duncan Temple Lang [dun...@wald.ucdavis.edu] Sent: Friday, January 23, 2009 6:48 PM To: Skewes,Aaron Cc: r-help@r-project.org Subject: Re: [R] XML package help Skewes,Aaron wrote: > Please consider this: > > http://www.w3.org/2001/XMLSchema-instance"; > > > > ./XYZ > > > 10 > ./XYZ/ > > > > > > > > I am attempting to use XML package and xpathSApply() to extract, say, the > eValue attribute for eName=='0ne' for all nodes that have > ==10. I try the following, amoung several things: > getNodeSet(doc, "//File[FileTypeId/text()='10']/patientcharacteristi...@ename='one']/@eValue") should do it. You need to compare the text() of the FileTypeId node. And the / after the PatientCharacterstics and before the [] will cause trouble. HTH, D. > doc<-xmlInternalTreeParse(Manifest) > Root = xmlRoot(doc) > xpathSApply(Root, > "//File[FileTypeId=10]/PatientCharacteristics/[...@ename='one']", xmlAttrs) > > and it does not work. > > Might somebody help me with the syntax here? > > Thanks a lot!! > Aaron > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] XML package- accessing nodes based on attributes
Hi, I have a rather complex xml document that I am attempting to parse based on attributes: http://www.w3.org/2001/XMLSchema-instance";> D:\CN_data\Agilent\Results\ File> My requirement is to access eValues at each node based on FileTypeId. For example: How can I get the eValue of eName="PatientReference" for all Type="Patient" ,where the ? i.e. "TCGA-06-0875-01A" and "TCGA-06-0875-02A" For the life of me, I can not get this to work! Thanks, -Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] an S idiom for ordering matrix by columns?
There's got to be a better way to use order() on a matrix than this:

> y
    2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 3L-173-2
398        1         1         2        2        1         1        2        2
857        1         1         2        2        1         2        2        2
911        1         1         2        2        1         2        2        2
383        1         1         2        2        1         1        2        2
639        1         2         2        1        2         2        1        2
756        1         2         2        1        2         2        1        2
    3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087
398        1        2        2         2     1     2
857        1        2        2         2     1     2
911        1        2        2         2     1     2
383        1        2        2         2     1     2
639        2        2        1         2     1     2
756        2        2        1         2     1     2

> y[order(y[,1],y[,2],y[,3],y[,4],y[,5],y[,6],y[,7],y[,8],y[,9],y[,10],y[,11],y[,12],y[,13],y[,14]),]
    2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 3L-173-2
398        1         1         2        2        1         1        2        2
383        1         1         2        2        1         1        2        2
857        1         1         2        2        1         2        2        2
911        1         1         2        2        1         2        2        2
639        1         2         2        1        2         2        1        2
756        1         2         2        1        2         2        1        2
    3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087
398        1        2        2         2     1     2
383        1        2        2         2     1     2
857        1        2        2         2     1     2
911        1        2        2         2     1     2
639        2        2        1         2     1     2
756        2        2        1         2     1     2

Thanks for any suggestions!

-Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] an S idiom for ordering matrix by columns?
Thanks to all, "do.call(order, as.data.frame(y))" was the idiom I was missing! -Aaron On Thu, Feb 19, 2009 at 11:52 AM, Gustaf Rydevik wrote: > On Thu, Feb 19, 2009 at 5:40 PM, Aaron Mackey wrote: > > There's got to be a better way to use order() on a matrix than this: > > > >> y > >2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 > > 3L-173-2 > > 3981 1 221 12 > > 2 > > 8571 1 221 22 > > 2 > > 9111 1 221 22 > > 2 > > 3831 1 221 12 > > 2 > > 6391 2 212 21 > > 2 > > 7561 2 212 21 > > 2 > >3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087 > > 398122 2 1 2 > > 857122 2 1 2 > > 911122 2 1 2 > > 383122 2 1 2 > > 639221 2 1 2 > > 756221 2 1 2 > > > >> > > > y[order(y[,1],y[,2],y[,3],y[,4],y[,5],y[,6],y[,7],y[,8],y[,9],y[,10],y[,11],y[,12],y[,13],y[,14]),] > >2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 > > 3L-173-2 > > 3981 1 221 12 > > 2 > > 3831 1 221 12 > > 2 > > 8571 1 221 22 > > 2 > > 9111 1 221 22 > > 2 > > 6391 2 212 21 > > 2 > > 7561 2 212 21 > > 2 > >3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087 > > 398122 2 1 2 > > 383122 2 1 2 > > 857122 2 1 2 > > 911122 2 1 2 > > 639221 2 1 2 > > 756221 2 1 2 > > > > Thanks for any suggestions! > > > > -Aaron > > > > > You mean something like this: > > test<-matrix(sample(1:4,100,replace=T),ncol=10) > > test[do.call(order,data.frame(test)),] > > ? > > Regards, > > Gustaf > > > -- > Gustaf Rydevik, M.Sci. > tel: +46(0)703 051 451 > address:Essingetorget 40,112 66 Stockholm, SE > skype:gustaf_rydevik > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
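The idiom in action, on a small made-up matrix (not Aaron's data):

```r
set.seed(1)
y <- matrix(sample(1:2, 60, replace = TRUE), ncol = 6)

# order() takes each sort key as a separate argument, and a data frame
# is a list of columns, so do.call() supplies every column at once:
y[do.call(order, as.data.frame(y)), ]
```

This sorts the rows lexicographically by column 1, then column 2, and so on, without writing out `y[,1], y[,2], ...` by hand.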
[R] Help with R and MySQL
Hello, This forum has been very helpful to me in the past, and I've run out of ideas on how to solve my problem. I had been using R and MySQL (and Perl) together for quite some time successfully on my Windows XP machine. However, I recently had some problems with MySQL (the ibdata file had become 35GB on my hard drive, turns out it's a known bug with InnoDB), and ultimately the way I fixed my problem with MySQL was to upgrade it. It's working fine now, I can use MySQL however I'd like. I'm sticking to MyISAM tables for now, though. However, I had set up my system so I did a linear regression in R. Originally, this was done in R 2.5.0, I would load in the tables from MySQL to R and then conduct the regression in R. However, after solving my MySQL problem, I ran into a strange error in R (and DBI/RMySQL). R connected to the database just fine, and I could even show the tables in the database and load two of them into R. However, the tables I loaded successfully were only a single column. Every time I tried to load in a recordset that was multiple columns, I got a relatively nondescript Windows error("R for Windows terminal front-end has encountered a problem and needs to close. We are sorry for the inconvenience."). To verify that it wasn't a memory issue, I even tried "rs <- dbSendQuery(con, "select 'a', 'b'")". This statement causes the error as well. I tried upgrading the packages, and upgrading R from 2.5.0 to 2.8.1. However, I still get the same errors. Has anyone run into this problem before? Any suggestions on how to solve it? Thanks in advance, Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with R and MySQL
Thanks Jeff, that was exactly the problem. When I unzipped the version at the page below for my version of MySQL (5.1), it worked fine. The version I downloaded through install.packages() must have been for 5.0. Thanks so much for the help and quick response, Aaron From: Jeffrey Horner Cc: R-help@r-project.org Sent: Monday, February 23, 2009 10:10:02 AM Subject: Re: [R] Help with R and MySQL Aaron Barzilai wrote: > Hello, > > This forum has been very helpful to me in the past, and I've run out of ideas > on how to solve my problem. > > I had been using R and MySQL (and Perl) together for quite some time > successfully on my Windows XP machine. However, I recently had some problems > with MySQL (the ibdata file had become 35GB on my hard drive, turns out it's > a known bug with InnoDB), and ultimately the way I fixed my problem with > MySQL was to upgrade it. It's working fine now, I can use MySQL however I'd > like. I'm sticking to MyISAM tables for now, though. > > However, I had set up my system so I did a linear regression in R. > Originally, this was done in R 2.5.0, I would load in the tables from MySQL > to R and then conduct the regression in R. However, after solving my MySQL > problem, I ran into a strange error in R (and DBI/RMySQL). R connected to > the database just fine, and I could even show the tables in the database and > load two of them into R. However, the tables I loaded successfully were only > a single column. Every time I tried to load in a recordset that was multiple > columns, I got a relatively nondescript Windows error("R for Windows terminal > front-end has encountered a problem and needs to close. We are sorry for the > inconvenience."). To verify that it wasn't a memory issue, I even tried "rs > <- dbSendQuery(con, "select 'a', 'b'")". This statement causes the error as > well. > > I tried upgrading the packages, and upgrading R from 2.5.0 to 2.8.1. > However, I still get the same errors. Has anyone run into this problem > before? 
Any suggestions on how to solve it? Hi Aaron, Be sure to read the details of the RMySQL web page: http://biostat.mc.vanderbilt.edu/RMySQL You need to make sure and match the version of your MySQL client library (not the running MySQL server) with the RMySQL binary that you choose from the web page above. Best, Jeff -- http://biostat.mc.vanderbilt.edu/JeffreyHorner [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
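Once the client library and the RMySQL binary match, a quick way to confirm that multi-column result sets come back intact is to repeat Aaron's two-column probe (connection details below are placeholders, not from the thread):

```r
library(RMySQL)

con <- dbConnect(MySQL(), dbname = "mydb",
                 user = "user", password = "pass", host = "localhost")

# The query that previously crashed the front-end:
rs <- dbSendQuery(con, "select 'a', 'b'")
fetch(rs, n = -1)   # should now return one row with two columns

dbClearResult(rs)
dbDisconnect(con)
```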
[R] ordering
Hello, I would like to order a matrix by a specific column. For instance: > test [,1] [,2] [,3] [1,]1 100 21 [2,]23 22 [3,]3 100 23 [4,]4 60 24 [5,]5 55 25 [6,]6 45 26 [7,]7 75 27 [8,]8 12 28 [9,]9 10 29 [10,] 10 22 30 > test[order(test[,2]),] [,1] [,2] [,3] [1,]23 22 [2,]9 10 29 [3,]8 12 28 [4,] 10 22 30 [5,]6 45 26 [6,]5 55 25 [7,]4 60 24 [8,]7 75 27 [9,]1 100 21 [10,]3 100 23 This works well and good in the above example matrix. However in the matrix that I actually want to sort (derived from a function that I wrote) I get something like this: > test[order(as.numeric(test[,2])),] ### First column is row.names f con f.1 cov f.2 minimum f.3 maximum f.4 cl asahi* 100 * 1 * 0.1 * 2 * test castet * 100 * 2 * 0.1 * 5 * test clado* 100 * 1 * 0.7 * 2 * test aulac* 33 * 0 * 0.1 * 0.1 * test buell* 33 * 0 * 0.1 * 0.1 * test camlas * 33 * 0 * 0.1 * 0.1 * test carbig * 33 * 1 * 1 * 1 * test poaarc * 67 * 0 * 0.1 * 0.1 * test polviv * 67 * 0 * 0.1 * 0.1 * test where R interprets 100 to be the lowest value and orders increasing from there. > is.numeric(test[,2]) [1] FALSE > is.double(test[,2]) [1] FALSE > is.integer(test[,2]) [1] FALSE > is.real(test[,2]) [1] FALSE My questions are: Why is this happening? and How do I fix it? Thanks in advance! Aaron Wells _ cns!503D1D86EBB2B53C!2285.entry?ocid=TXT_TAGLM_WL_UGC_Contacts_032009 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ordering
Thanks Peter, that did the trick. I'll modify my function so that the numeric conversion is done automatically thus saving me the extra step of converting later on. Aaron Wells > Subject: RE: [R] ordering > Date: Wed, 11 Mar 2009 08:41:50 +1300 > From: palsp...@hortresearch.co.nz > To: awell...@hotmail.com; r-help@r-project.org > > Kia ora Aaron > > As you have identified, test[,2] is not numeric - it is probably factor. > Your function must have made the conversion, so you may want to modify > that. Alternative, try: > > test[order(as.numeric(as.character(test[,2]))),] > > BTW, str(test) is a good way to find out more about the structure of > your object. > > HTH > > Peter Alspach > > > > > > -Original Message- > > From: r-help-boun...@r-project.org > > [mailto:r-help-boun...@r-project.org] On Behalf Of aaron wells > > Sent: Wednesday, 11 March 2009 8:30 a.m. > > To: r-help@r-project.org > > Subject: [R] ordering > > > > > > Hello, I would like to order a matrix by a specific column. > > For instance: > > > > > > > > > test > > [,1] [,2] [,3] > > [1,] 1 100 21 > > [2,] 2 3 22 > > [3,] 3 100 23 > > [4,] 4 60 24 > > [5,] 5 55 25 > > [6,] 6 45 26 > > [7,] 7 75 27 > > [8,] 8 12 28 > > [9,] 9 10 29 > > [10,] 10 22 30 > > > > > > > test[order(test[,2]),] > > [,1] [,2] [,3] > > [1,] 2 3 22 > > [2,] 9 10 29 > > [3,] 8 12 28 > > [4,] 10 22 30 > > [5,] 6 45 26 > > [6,] 5 55 25 > > [7,] 4 60 24 > > [8,] 7 75 27 > > [9,] 1 100 21 > > [10,] 3 100 23 > > > > > > This works well and good in the above example matrix. 
> > However in the matrix that I actually want to sort (derived > > from a function that I wrote) I get something like this: > > > > > > > > > test[order(as.numeric(test[,2])),] ### First column is row.names > > > > > > f con f.1 cov f.2 minimum f.3 maximum f.4 cl > > asahi * 100 * 1 * 0.1 * 2 * test > > castet * 100 * 2 * 0.1 * 5 * test > > clado * 100 * 1 * 0.7 * 2 * test > > aulac * 33 * 0 * 0.1 * 0.1 * test > > buell * 33 * 0 * 0.1 * 0.1 * test > > camlas * 33 * 0 * 0.1 * 0.1 * test > > carbig * 33 * 1 * 1 * 1 * test > > poaarc * 67 * 0 * 0.1 * 0.1 * test > > polviv * 67 * 0 * 0.1 * 0.1 * test > > > > > > > > > > where R interprets 100 to be the lowest value and orders > > increasing from there. > > > > > > > > > is.numeric(test[,2]) > > [1] FALSE > > > is.double(test[,2]) > > [1] FALSE > > > is.integer(test[,2]) > > [1] FALSE > > > is.real(test[,2]) > > [1] FALSE > > > > > > > > > > My questions are: Why is this happening? and How do I fix it? > > > > > > > > Thanks in advance! > > > > > > > > Aaron Wells > > > > _ > > > > > > cns!503D1D86EBB2B53C!2285.entry?ocid=TXT_TAGLM_WL_UGC_Contacts_032009 > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > The contents of this e-mail are confidential and may be subject to legal > privilege. > If you are not the intended recipient you must not use, disseminate, > distribute or > reproduce all or any part of this e-mail or attachments. If you have received > this > e-mail in error, please notify the sender and delete all material pertaining > to this > e-mail. Any opinion or views expressed in this e-mail are those of the > individual > sender and may not represent those of The New Zealand Institute for Plant and > Food Research Limited. 
_ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
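The symptom in miniature: a numeric-looking column stored as a factor sorts by its character levels, and `as.numeric()` applied directly to a factor returns the level codes rather than the values, which is why the `as.character()` step in Peter's fix matters:

```r
f <- factor(c("100", "33", "67"))

sort(f)                      # 100 33 67 -- lexicographic level order
as.numeric(f)                # level codes, not the printed values
as.numeric(as.character(f))  # 100 33 67 -- the actual numbers

# Order rows by the recovered numeric values:
order(as.numeric(as.character(f)))
```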
[R] geometric mean of probability density functions
Hi, This is my first time posting to the mailing list, so if I'm doing something wrong, just let me know. I've taken ~1000 samples from 8 biological replicates, and I want to somehow combine the density functions of the replicates. Currently, I can plot the density function for each biological replicate, and I'd like to see how pool of replicates compares to a simulation I conducted earlier. I can compare each replicate to the simulation, but there's a fair amount of variability between replicates. I'd like to take the geometric mean of the density functions at each point along the x-axis, but when I compute: > a<-density(A[,1][A[,1]>=0], n=2^15) > b<-density(A[,3][A[,3]>=0], n=2^15) > a$x[1] [1] -70.47504 > b$x[1] [1] -69.28902 So I can't simply compute the mean across y-values, because the x-values don't match. Is there a way to set the x-values to be the same for multiple density plots? Also, there are no negative values in the dataset, so I'd like to bound the x-axis at 0 if at all possible? Is there a standard way to combine density functions? Thanks for the advice. -Aaron Spivak ps. I thought about just pooling all measurements, but I don't think that's appropriate because they are from different replicates and the smoothing kernel depends on the variance in the sample to calculate the distribution. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
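One way around the mismatched x-grids is to give `density()` a common set of evaluation points via its `from`, `to`, and `n` arguments, which also lets you bound the axis at 0. A sketch on simulated data (stand-ins for two replicates, not the poster's measurements):

```r
set.seed(42)
r1 <- rgamma(1000, shape = 2)
r2 <- rgamma(1000, shape = 3)
hi <- max(r1, r2)

# Same grid for both curves, bounded below at 0:
d1 <- density(r1, from = 0, to = hi, n = 2^15)
d2 <- density(r2, from = 0, to = hi, n = 2^15)
identical(d1$x, d2$x)   # the grids now line up

# Pointwise geometric mean of the two density curves:
gm <- exp((log(d1$y) + log(d2$y)) / 2)
```

Note the result is no longer guaranteed to integrate to 1, so renormalize if a proper density is needed.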
[R] Automatic command line options
Hello users, Does anyone know how to turn off the automatic double quoting and bracketing on the command line that appeared in R 2.6.x (OS X). It's driving me nuts! Many thanks, Aaron. ---- M. Aaron MacNeil __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read.table not clearing colClasses
I am attempting to created colClasses for several tables, then read only specific columns. There are two different table layouts that I am working with. If I use exclusively one layout, the script works perfectly, but if I mix the layouts, it fails to grab the correct columns form layout that is read in second. It appears that colClasses fails to adopt the new structure after the first iteration. Is there some way to clear colClasses of flush the write buffer between iterations? Thanks, Aaron for(i in 1:length(fullnames.in)) { cnames<- read.table(fullnames.in[i], header=FALSE, sep="\t", na.strings="", nrows=1, row.names = NULL , skip=9, fill=TRUE, quote="") #initialize col.classes to NULL vector seq(1,length(cnames))->column.classes column.classes[1:length(cnames)]="NULL" #find where the desired columns are idx<-which(cnames=="Row") column.classes[idx]="integer" idx<-which(cnames=="Col") column.classes[idx]="integer" idx<-which(cnames=="ControlType") column.classes[idx]="integer" idx<-which(cnames=="ProbeName") column.classes[idx]="character" idx<-which(cnames=="GeneName") column.classes[idx]="character" idx<-which(cnames=="SystematicName") column.classes[idx]="character" idx<-which(cnames=="LogRatio") column.classes[idx]="numeric" idx<-which(cnames=="gMeanSignal") column.classes[idx]="numeric" idx<-which(cnames=="rMeanSignal") column.classes[idx]="numeric" idx<-which(cnames=="gBGMeanSignal") column.classes[idx]="numeric" idx<-which(cnames=="rBGMeanSignal") column.classes[idx]="numeric" print(fullnames.in[i]) print("Reading file, this could take a few minutes") #read all rows of selected columns into data.frame d <- read.table(fullnames.in[1], header=TRUE, sep="\t", na.strings="", nrows=number.rows, colClasses=column.classes, row.names = NULL , skip=9, fill=TRUE, quote="") print("Writing file, this could take a few minutes") #write all rows of selected columns into file write.table(d, fullnames.out[i], sep="\t", quote=FALSE, row.names=FALSE) rm(cnames, 
column.classes, d, idx) } [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
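One thing worth checking before suspecting colClasses: the full read inside the loop above indexes `fullnames.in[1]`, not `fullnames.in[i]`, so every iteration re-reads the first file with the freshly built classes — which would produce exactly the reported symptom. A condensed sketch of the loop with that fixed (column names and read.table arguments as in the original; `number.rows` and the file-name vectors are assumed defined as in the post):

```r
wanted <- c(Row = "integer", Col = "integer", ControlType = "integer",
            ProbeName = "character", GeneName = "character",
            SystematicName = "character", LogRatio = "numeric",
            gMeanSignal = "numeric", rMeanSignal = "numeric",
            gBGMeanSignal = "numeric", rBGMeanSignal = "numeric")

for (i in seq_along(fullnames.in)) {
    cnames <- read.table(fullnames.in[i], header = FALSE, sep = "\t",
                         na.strings = "", nrows = 1, skip = 9,
                         fill = TRUE, quote = "")
    column.classes <- rep("NULL", length(cnames))
    hit <- match(names(wanted), as.character(unlist(cnames)))
    column.classes[hit[!is.na(hit)]] <- wanted[!is.na(hit)]

    d <- read.table(fullnames.in[i],   # note: [i], not [1]
                    header = TRUE, sep = "\t", na.strings = "",
                    nrows = number.rows, colClasses = column.classes,
                    skip = 9, fill = TRUE, quote = "")
    write.table(d, fullnames.out[i], sep = "\t",
                quote = FALSE, row.names = FALSE)
}
```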
[R] 2D density tophat
Hello R users, I have successfully created a square (or more generally, rectangular) tophat smoothing routine based on altering the already available KDE2D. I would be keen to implement a circular tophat routine also, however this appears to be much more difficult to write efficiently (I have a routine, but it's very slow). I tried to create one based on using crossdist to create a distance matrix between my data and the sampling grid, but it doesn't take a particularly large amount of data (or hi res grid) for memory to be a big problem. The 2D density routines I have been able to find either don't support a simple tophat, or don't use the absolute distances between the sampling grid and the data. Should anyone know of more general 2D density routines that might support circular tophats, or know of a simple and efficient method for creating them, I would be very grateful. Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2D density tophat
Hello R users, I have successfully created a square (or more generally, rectangular) tophat smoothing routine based on altering the already available KDE2D. I would be keen to implement a circular tophat routine also, however this appears to be much more difficult to write efficiently (I have a routine, but it's very slow). I tried to create one based on using crossdist (in spatstat) to create a distance matrix between my data and the sampling grid, but it doesn't take a particularly large amount of data (or hi-res grid) for memory to be a big problem. The 2D density routines I have been able to find either don't support a simple tophat, or don't use the absolute distances between the sampling grid and the data. Should anyone know of more general 2D density routines that might support circular tophats, or know of a simple and efficient method for creating them, I would be very grateful. Thanks for your time, Aaron PS: I tried sending this on Friday originally, but as far as I know that didn't work, so should another post appear from me asking the same thing I apologise in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] making sense of posterior statistics in the deal package
Hello, I'm doing bayesian network analyses with the deal package. I am at a loss for how to interpret output from the analysis (i.e. what is a good score, what is a bad score, which stats tell me what about the network edges/nodes). Here is an example node with its posterior scores for all parent nodes. Conditional Posterior: Yp1| 3 4 5 6 9 11 12 15 18 [[1]] [[1]]$tau [,1][,2] [,3][,4] [,5][,6] [1,] 138.000 -201.944190 -61.827901 -29.5419149 11.7780877 -56.1691436 [2,] -201.9441898 379.014299 101.336606 49.2886631 -9.5976678 99.0119458 [3,] -61.8279013 101.336606 55.301879 18.3175413 0.4718180 31.7741275 [4,] -29.5419149 49.288663 18.317541 18.5074653 0.7297184 14.7963722 [5,] 11.7780877 -9.597668 0.471818 0.7297184 11.9705940 -0.1152971 [6,] -56.1691436 99.011946 31.774127 14.7963722 -0.1152971 33.0750507 [7,] 11.8398168 -11.819652 2.372613 2.4241871 8.3525307 -0.5909911 [8,] -15.8233513 27.136706 13.261521 10.3380918 5.2238205 10.7721059 [9,] -63.0844071 112.477658 36.867027 18.7342207 1.8345119 32.6573681 [10,] -0.91256760.892410 3.995155 3.3759532 5.2495044 4.8010982 [,7] [,8] [,9] [,10] [1,] 11.8398168 -15.823351 -63.084407 -0.9125676 [2,] -11.8196521 27.136706 112.477658 0.8924099 [3,] 2.3726129 13.261521 36.867027 3.9951552 [4,] 2.4241871 10.338092 18.734221 3.3759532 [5,] 8.3525307 5.223821 1.834512 5.2495044 [6,] -0.5909911 10.772106 32.657368 4.8010982 [7,] 11.7576987 5.339882 1.364748 4.5801216 [8,] 5.3398823 17.269931 14.659995 6.8871204 [9,] 1.3647480 14.659995 43.586099 4.5549556 [10,] 4.5801216 6.887120 4.554956 11.1188844 [[1]]$phi [1] 5.395758 [[1]]$mu [1] -0.151400686 0.459786917 -0.091988847 -0.009952914 0.074523419 [6] 0.215198198 -0.010968581 -0.026347501 0.423837846 -0.018999184 [[1]]$rho [1] 147 Any help you can give me is greatly appreciated. 
Aaron Tarone __ Aaron Tarone Postdoctoral Research Associate Molecular and Computational Biology Program University of Southern California [EMAIL PROTECTED] (213) 740-3063 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2D density tophat
In case anyone other than me was interested, a pretty efficient circular tophat can be made using the fields function fields.rdist.near:

CircHat = function(x, y, h = 1,
                   gridres = c((max(x) - min(x))/25, (max(y) - min(y))/25),
                   lims = c(range(x), range(y)), density = FALSE)
{
    require(fields)
    nx <- length(x)
    ny <- length(y)
    n = c(1 + (lims[2] - lims[1])/gridres[1], 1 + (lims[4] - lims[3])/gridres[2])
    if (length(y) != nx)
        stop("data vectors must be the same length")
    if (any(!is.finite(x)) || any(!is.finite(y)))
        stop("missing or infinite values in the data are not allowed")
    if (any(!is.finite(lims)))
        stop("only finite values are allowed in 'lims'")
    gx <- seq(lims[1], lims[2], by = gridres[1])
    gy <- seq(lims[3], lims[4], by = gridres[2])
    fullgrid = expand.grid(gx, gy)
    if (missing(h))
        h <- c(bandwidth.nrd(x), bandwidth.nrd(y))
    temp = table(fields.rdist.near(as.matrix(fullgrid), as.matrix(cbind(x, y)),
        mean.neighbor = ceiling(length(x)*pi*h^2/((lims[2] - lims[1])*(lims[4] - lims[3]))),
        delta = h)$ind[, 1])
    pad = rep(0, length(gx)*length(gy))
    pad[as.numeric(names(temp))] = as.numeric(temp)
    z <- matrix(pad, length(gx), length(gy))
    if (density) { z = z/(nx*pi*h^2) }
    list(x = gx, y = gy, z = z)
}

It works in more or less the same way as kde2d, but by default it returns counts, not densities. Aaron On 1 Dec 2008, at 11:46, Aaron Robotham wrote: Hello R users, I have successfully created a square (or more generally, rectangular) tophat smoothing routine based on altering the already available KDE2D. I would be keen to implement a circular tophat routine also, however this appears to be much more difficult to write efficiently (I have a routine, but it's very slow). I tried to create one based on using crossdist (in spatstat) to create a distance matrix between my data and the sampling grid, but it doesn't take a particularly large amount of data (or hi-res grid) for memory to be a big problem. 
The 2D density routines I have been able to find either don't support a simple tophat, or don't use the absolute distances between the sampling grid and the data. Should anyone know of more general 2D density routines that might support circular tophats, or know of a simple and efficient method for creating them, I would be very grateful. Thanks for your time, Aaron PS: I tried sending this on Friday originally, but as far as I know that didn't work, so should another post appear from me asking the same thing I apologise in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
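A quick usage sketch for the CircHat function above on simulated points (assuming the fields package is installed; the bandwidth and sample size here are arbitrary):

```r
set.seed(7)
x <- rnorm(500)
y <- rnorm(500)

out <- CircHat(x, y, h = 0.5)
# out$z holds the count of data points within radius h of each grid
# centre; visualize it like a kde2d result:
image(out$x, out$y, out$z)
```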
[R] very long integers
A quick question really: I have a database with extremely long integer IDs (eg 588848900971299297), which is too big for R to cope with internally (it appears to store as a double), and when I do any frequency tables erroneous results appear. Does anyone know of a package that extends internal storage up to LONG, or is the only solution to read it in as a character from the original data? In case anyone is curious, I didn't create the IDs, and in some form I must conserve all of the ID information for later use. Thanks, Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
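Reading the IDs in as character keeps them exact, since doubles only carry about 15-16 significant digits. Alternatively, the bit64 package (which appeared after this thread) supplies a true 64-bit integer class; the file and column names below are made up for illustration:

```r
# Option 1: never let the IDs become doubles in the first place
d <- read.csv("mydata.csv", colClasses = c(id = "character"))

# Option 2: a genuine 64-bit integer type (assumes the bit64 package)
library(bit64)
x <- as.integer64("588848900971299297")
x + 1L   # exact arithmetic, no double rounding
```

With either approach, frequency tables distinguish IDs that differ only in their last digits, which is where the double representation fails.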
[R] Coxph frailty model counting process error X matrix deemed singular
Hello, I am currently trying to simulate data and analyze it using the frailty option in the coxph function. I am working with recurrent event data, using counting process notation. Occasionally, (about 1 in every 100 simulations) I get the following warning: Error in coxph(Surv(start, end, censorind) ~ binary + uniform + frailty(subject, : X matrix deemed to be singular; variable 2 My data is structured as follows: I have a Bernoulli random variable (parameter=0.5) (labeled "binary") and a second variable which was generated as seq(0.02, 1, 0.02), which is labeled as "uniform". There are 50 individual subjects. Recurrent events are then generated as rexp(1, 0.2*frailparm[j]*exp(mydata[j,1]*alpha[1]+mydata[j,2]*alpha[2])) where mydata is the cbind of the data just mentioned, alpha are the parameters for the recurrent events (here I am using c(1,1)) and frailparm is the frailty term for subject {j}. I generate recurrent events until the sum of the times is greater than the terminal time or censoring time, and keep the previous highest time as my final recurrent time, with one additional time which is censored at the minimum of the terminal event time and the censoring time. I then repeat for each subject. I then try to analyze the data like this: coxph(Surv(start,end,censorind)~binary+uniform+frailty(subject,distribution="gauss", method="reml"), method="breslow", singular.ok=FALSE, data=fulldata) Where start is the previous recurrent time, end is the current recurrent time, censorind is the censoring indicator for the current recurrent time, and subject is the current observation. There does not appear to be an issue with the binary variable taking a particular value for every observed event time, nor does there appear to be perfect correlation between the variable "uniform" and the survival time. Any help would be much appreciated. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] which rows are duplicates?
I would like to know which rows are duplicates of each other, not simply that a row is duplicate of another row. In the following example rows 1 and 3 are duplicates. > x <- c(1,3,1) > y <- c(2,4,2) > z <- c(3,4,3) > data <- data.frame(x,y,z) x y z 1 1 2 3 2 3 4 4 3 1 2 3 I can't figure out how to get R to tell me that observation 1 and 3 are the same. It seems like the "duplicated" and "unique" functions should be able to help me out, but I am stumped. For instance, if I use "duplicated" ... > duplicated(data) [1] FALSE FALSE TRUE it tells me that row 3 is a duplicate, but not which row it matches. How do I figure out WHICH row it matches? And If I use "unique"... > unique(data) x y z 1 1 2 3 2 3 4 4 I see that rows 1 and 2 are unique, leaving me to infer that row 3 was a duplicate, but again it doesn't tell me which row it was a duplicate of (as far as I can tell). Am I missing something? How can I determine that row 3 is a duplicate OF ROW 1? Thanks, Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
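Two ways to recover which row a duplicate matches, using the example data from the question:

```r
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# 1. Flag every member of a duplicated group, not just the later copies:
dup <- duplicated(data) | duplicated(data, fromLast = TRUE)
which(dup)            # rows 1 and 3

# 2. Map each row to the first row with identical values:
key <- do.call(paste, data)   # one string per row
match(key, key)       # 1 2 1 -> row 3 matches row 1
```

The second approach gives the explicit pairing: any row whose `match()` result differs from its own index is a duplicate of the row it points to.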
[R] confusion over "names" of lm.influence()$hat
76994995 0.04149530 0.04125143 0.06158475 I only noticed this problem because several times the observation in question wasn't even a part of the hat matrix output... Am I incorrect in assuming that the output from print(which(housedata$w>0)) should be the same as the "names" from print(lm.influence(result.b)$hat). Both have the same length (in this case 88 observations, but they don't appear to be the same observations. Thanks for anyone who can help me clear this up, Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Save a graph file use jpeg(file=file)
The simple solution for Windows is to use (windows icon) + shift + s. You then select a portion of your screen and it gets copied to your clipboard. You can then paste that into your document. Of course this will not work if it is important that the reader is able to rotate the graphic. Tim -Original Message- From: R-help On Behalf Of Sorkin, John Sent: Wednesday, January 5, 2022 2:46 PM To: r-help@r-project.org (r-help@r-project.org) Subject: [R] Save a graph file use jpeg(file=file) [External Email] I am trying to create a 3-D graph (using scatter3d) and save the graph to a file so I can insert the graph into a manuscript. I am able to create the graph. When I run the code below an RGL window opens that has the graph. The file is saved to disk after dev.off() runs. Unfortunately, when I open the saved file, all I see is a white window. Can someone tell me how to save the file so I can subsequently read it and place it in a paper? The problem occurs regardless of the format in which I try to save the file, e.g. png, tiff. 
x <- 1:10
y <- 2:11
z <- y + rnorm(10)
ForGraph <- data.frame(x = x, y = y, z = z)
ForGraph
gpathj <- file.path("C:", "LAL", "test.jpeg")
gpathj
jpeg(file = gpathj)
par(mai = c(0.5, 0.5, 0.5, 0.5))
scatter3d(z = ForGraph$x, y = ForGraph$y, x = ForGraph$z,
          surface = FALSE, grid = TRUE, sphere.size = 4,
          xlab = "Categories", ylab = "ScoreRange",
          zlab = "VTE Rate (%)", axis.ticks = TRUE)
dev.off()

Thank you, John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
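A likely explanation for the blank file: `scatter3d()` (from the car package) draws into an rgl window, which bitmap devices like `jpeg()` never see, so the device captures only an empty white page. Capturing the rgl scene itself should work; this is a sketch, with the output path chosen as an example:

```r
library(car)   # scatter3d
library(rgl)

ForGraph <- data.frame(x = 1:10, y = 2:11, z = 2:11 + rnorm(10))
scatter3d(z = ForGraph$x, y = ForGraph$y, x = ForGraph$z,
          surface = FALSE, grid = TRUE, sphere.size = 4,
          xlab = "Categories", ylab = "ScoreRange",
          zlab = "VTE Rate (%)", axis.ticks = TRUE)

# Save a bitmap of the current rgl window:
rgl.snapshot("test.png", fmt = "png")
# or a vector version:
rgl.postscript("test.pdf", fmt = "pdf")
```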
Re: [R] NAs are removed
Hi Neha, You used a variable named "fraction" so we took a guess. However, as another poster pointed out, 1/0 does not give NA in R: a nonzero number divided by 0 returns Inf, while 0/0 returns NaN. So 1/0 <= 1 returns FALSE and 0/0 <= 1 returns NA. A great deal of the behavior of your program hinges on what "fraction" is in your program. Tim -Original Message- From: R-help On Behalf Of Neha gupta Sent: Friday, January 14, 2022 4:50 PM To: Jim Lemon Cc: r-help mailing list Subject: Re: [R] NAs are removed [External Email] Hi Jim and Ebert How am I using divide by zero? I did not understand. I am using caret and the AUC metric. If I am, what is the solution? On Fri, Jan 14, 2022 at 9:41 PM Jim Lemon wrote: > Hi Neha, > You're using the argument "na.omit" in what function? My blind guess > is that there's a divide by zero shooting you from behind. > > Jim > > On Sat, Jan 15, 2022 at 6:32 AM Neha gupta > wrote: > > > > Hi everyone > > > > I use na.omit to remove NAs but it still gives me an error: > > > > Error in if (fraction <= 1) { : missing value where TRUE/FALSE > > needed > > > > My data is: > > > > 'data.frame': 340 obs. of 15 variables: > > $ DepthTree : num 1 1 1 1 1 1 1 1 1 1 ... > > $ NumSubclass : num 0 0 0 0 0 0 0 0 0 0 ... > > $ McCabe : num 1 1 1 1 1 1 3 3 3 3 ... > > $ LOC : num 3 4 3 3 4 4 10 10 10 10 ... > > $ DepthNested : num 1 1 1 1 1 1 2 2 2 2 ... > > $ CA : num 1 1 1 1 1 1 1 1 1 1 ... > > $ CE : num 2 2 2 2 2 2 2 2 2 2 ... > > $ Instability : num 0.667 0.667 0.667 0.667 0.667 0.667 0.667 > > 0.667 > > 0.667 0.667 ... > > $ numCovered : num 0 0 0 0 0 0 0 0 0 0 ... > > $ operator : Factor w/ 16 levels "T0","T1","T2",..: 2 2 4 13 13 13 > 1 3 > > 4 7 ... > > $ methodReturn : Factor w/ 22 levels "I","V","Z","method",..: 2 2 2 > > 2 2 > 2 > > 2 2 2 2 ...
> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
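Tim's point about division by zero can be checked directly at the console; the error in the original post ("missing value where TRUE/FALSE needed") is exactly what if() raises when its condition evaluates to NA:

```r
1/0        # Inf  (not NA)
0/0        # NaN
1/0 <= 1   # FALSE
0/0 <= 1   # NA -- comparisons with NaN propagate missingness
# if (0/0 <= 1) 1 else 2   # errors: missing value where TRUE/FALSE needed
```

So na.omit() on the data does not help if the NA is produced later, inside a computed quantity such as `fraction`.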
Re: [R] NAs are removed
I don't see any problem there. To support this claim I tried it (though without a data frame): CA = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2) prot <- ifelse(CA == '2', 0, 1) print(prot) R responds: [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [37] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [73] 0 0 0 0 0 0 You can check other statements in the same way. That said, in a huge dataset you might want to ask whether the data provided match what you assume is there. If you type unique(ts$CA), do you get anything other than 1 and 2? This is the common task of figuring out whether the problem is in the code or in the data. Tim From: Neha gupta Sent: Friday, January 14, 2022 5:11 PM To: Ebert,Timothy Aaron Cc: Jim Lemon ; r-help mailing list Subject: Re: [R] NAs are removed [External Email] I have a variable "CA" in the dataset, which has the following values: [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [40] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 then I used this statement: prot <- ifelse(ts$CA == '2', 0, 1) Does the problem exist here? On Fri, Jan 14, 2022 at 11:02 PM Ebert, Timothy Aaron wrote: Hi Neha, You used a variable named "fraction" so we took a guess. However, as another poster pointed out, 1/0 does not give NA in R: a nonzero number divided by 0 returns Inf, while 0/0 returns NaN. So 1/0 <= 1 returns FALSE and 0/0 <= 1 returns NA. A great deal of the behavior of your program hinges on what "fraction" is in your program.
Tim
> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
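A small illustration of why the recoding step, rather than na.omit(), is the likely culprit: ifelse() propagates NA, so any NA left in (or introduced into) CA survives into prot and can later reach an if() test. This sketch uses made-up data:

```r
CA <- c(1, 2, NA, 2)
prot <- ifelse(CA == '2', 0, 1)
prot                         # 1 0 NA 0 -- the NA passes straight through
unique(CA)                   # quick sanity check, as suggested above:
table(CA, useNA = "ifany")   # look for values beyond 1 and 2, including NA
```

If table() shows NAs in the column being recoded, that is where the "missing value where TRUE/FALSE needed" error originates.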
Re: [R] [External] Weird behaviour of order() when having multiple ties
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2) print(order(Dat1)) print(sort(Dat1)) Compare the output. -Original Message- From: R-help On Behalf Of Martin Maechler Sent: Monday, January 31, 2022 9:04 AM To: Stefan Fleck Cc: r-help@r-project.org Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties [External Email] > Stefan Fleck > on Sun, 30 Jan 2022 21:07:19 +0100 writes: > it's not about the sort order of the ties, shouldn't all the 1s in > order(c(2,3,4,1,1,1,1,1)) come before 2,3,4? because that's not what's > happening aaah.. now we are getting somewhere: it looks like you have been confusing order() with sort() ... have you? > On Sun, Jan 30, 2022 at 9:00 PM Richard M. Heiberger wrote: >> when there are ties it doesn't matter which is first. >> in a situation where it does matter, you will need a tiebreaker column. >> -- >> *From:* R-help on behalf of Stefan Fleck < >> stefan.b.fl...@gmail.com> >> *Sent:* Sunday, January 30, 2022 4:16:44 AM >> *To:* r-help@r-project.org >> *Subject:* [External] [R] Weird behaviour of order() when having multiple >> ties >> >> I am experiencing weird behavior of `order()` for numeric vectors. I >> tested on 3.6.2 and 4.1.2 for Windows and R 4.0.2 on Ubuntu. Can anyone >> confirm? >> >> order( >> c( >> 0.6, >> 0.5, >> 0.3, >> 0.2, >> 0.1, >> 0.1 >> ) >> ) >> ## Result [should be in order] >> [1] 5 6 4 3 2 1 >> >> The sort order is obviously wrong. This only occurs if I have multiple >> ties. The problem does _not_ occur for decreasing = TRUE.
>> >> [[alternative HTML version deleted]] >> >> __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.
> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
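The confusion resolves once you see that order() returns the permutation of indices that would sort the vector, not the sorted values themselves:

```r
x <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1)
order(x)      # 5 6 4 3 2 1 -- positions of values from smallest to largest
sort(x)       # 0.1 0.1 0.2 0.3 0.5 0.6
x[order(x)]   # indexing by the permutation reproduces sort(x)
```

So "5 6" first simply means the two smallest values sit at positions 5 and 6 of the original vector; nothing is wrong with the ties.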
Re: [R] Convert a character string to variable names
"A variable in R can refer to many things, ..." I agree. "It absolutely _can_ refer to a list, ..." I partly agree. In R as a programming language I agree. In R as a statistical analysis tool then only partly. Typically one would need to limit the list so each variable would be of the same length and all values within the variable be of the same data type (integer, real, factor, character). As a programmer yes, as a statistician not really unless you always qualify the type of list considered and that gets tiresome. R does name individual elements using numeric place names: hence df[row, column]. Each element must have a unique address, and that is true in all computer languages. A dataframe is a list of columns of the same length containing the same data type within a column. mtcars$disp does not have a value (a value is one number). With 32 elements I can calculate a mean and the mean is a value. 32 numbers is not a value. I suppose a single value could be the starting memory address of the name, but I don't see how that distinction helps unless one is doing Assembly or Machine language programming. I have never used get(), so I will keep that in mind. I agree that it makes life much easier to enter the data in the way it will be analyzed. -Original Message- From: Jeff Newmiller Sent: Tuesday, February 8, 2022 10:10 PM To: r-help@r-project.org; Ebert,Timothy Aaron ; Richard O'Keefe ; Erin Hodgess Cc: r-help@r-project.org Subject: Re: [R] Convert a character string to variable names [External Email] A variable in R can refer to many things, but it cannot be an element of a vector. It absolutely _can_ refer to a list, a list of lists, a function, an environment, and any of the various kinds of atomic vectors that you seem to think of as variables. (R does _not_ name individual elements of vectors, unlike many other languages.) 
The things you can do with the mtcars object may be different than the things you can do with the object identified by the expression mtcars$disp, but the former has a variable name in an environment while the latter is embedded within the former. mtcars$disp is shorthand for the expression mtcars[[ "disp" ]] which searches the names attribute of the mtcars list (a data frame is a list of columns) to refer to that object. R allows non-standard evaluation to make elements of lists accessible as though they were variables in an environment, such as with( mtcars, disp ) or various tidyverse evaluation conventions. But while the expression mtcars$disp DOES have a value( it is an atomic vector of 32 integer elements) it is not a variable so get("mtcars$disp") cannot be expected to work (as it does not). You may be confusing "variable" with "object" ... lots of objects have no variable names. I have done all sorts of complicated data manipulations in R, but I have never found a situation where a use of get() could not be replaced with a clearer way to get the job done. Using lists is central to this... avoid making distinct variables in the first place if you plan to be retrieving them later indirectly like this. On February 8, 2022 5:45:39 PM PST, "Ebert,Timothy Aaron" wrote: > >I had thought that mtcars in "mtcars$disp" was the name of a dataframe and >that "disp" was the name of a column in the dataframe. If I would make a model >like horse power = displacement then "disp" would be a variable in the model >and I can find values for this variable in the "disp" column in the "mtcars" >dataframe. I am not sure how I would use "mtcars" as a variable. >"mtcars$disp" has no specific value, though it will have a specific value for >any given row of data (assuming rows are observations). 
> >Tim > > >-Original Message- >From: R-help On Behalf Of Richard >O'Keefe >Sent: Tuesday, February 8, 2022 8:17 PM >To: Erin Hodgess >Cc: r-help@r-project.org >Subject: Re: [R] Convert a character string to variable names > >[External Email] > >"mtcars$disp" is not a variable name. >"mtcars" is a variable name, and >get("mtcars") will get the value of that variable; assign("mtcars", >~~whatever~~) will set it. >mtcars$disp is an *expression*, >where $ is an indexing operator >https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing >so what you want is >> mtcars <- list(cyl=4, disp=1.8) >> eval(parse(text="mtcars$disp")) >[1] 1.8
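To make the distinction concrete: get() accepts a variable name, not an arbitrary expression, so the string has to be split into a name lookup plus ordinary indexing (this sketch uses the built-in mtcars data):

```r
# get("mtcars$disp")          # error: object 'mtcars$disp' not found
v1 <- get("mtcars")[["disp"]] # look up the variable, then index the list
v2 <- eval(parse(text = "mtcars$disp"))  # works, but harder to read/debug
identical(v1, v2)             # TRUE
```

The `[[ ]]` form is usually preferred over eval(parse(...)) because it fails loudly on a bad column name and cannot execute arbitrary code hidden in the string.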
Re: [R] Convert a character string to variable names
How does "a value" differ from "an object"? From: Richard O'Keefe Sent: Friday, February 11, 2022 12:25 AM To: Ebert,Timothy Aaron Cc: Jeff Newmiller ; r-help@r-project.org; Erin Hodgess Subject: Re: [R] Convert a character string to variable names [External Email] You wrote "32 numbers is not a value". It is, it really is. When you have a vector like x <- 1:32 you have a simple variable (x) referring to an immutable value (1, 2, ..., 32). A vector in R is NOT a collection of mutable boxes, it is a collection of *numbers* (or strings). The vector itself is as good a value as ever twanged. You cannot change it. A statement like x[i] <- 77 is just shorthand for x <- "[<-"(x, i, 77) which constructs a whole new 32-number value and assigns that to x. (The actual implementation is cleverer when it can be, but often it cannot be clever.) Pure values like vectors can be shared: if x is a vector, then y <- x is a constant-time operation. If you then change y, you only change y, not the vector. x is unchanged. On Wed, 9 Feb 2022 at 17:06, Ebert, Timothy Aaron wrote: "A variable in R can refer to many things, ..." I agree. "It absolutely _can_ refer to a list, ..." I partly agree. In R as a programming language I agree. In R as a statistical analysis tool then only partly. Typically one would need to limit the list so each variable would be of the same length and all values within the variable be of the same data type (integer, real, factor, character). As a programmer yes, as a statistician not really unless you always qualify the type of list considered and that gets tiresome. R does name individual elements using numeric place names: hence df[row, column]. Each element must have a unique address, and that is true in all computer languages. A dataframe is a list of columns of the same length containing the same data type within a column. mtcars$disp does not have a value (a value is one number).
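O'Keefe's point that a vector is itself a value shows up in R's copy-on-modify behaviour: assignment shares the value, and modifying one binding never changes the other:

```r
x <- 1:32
y <- x        # constant-time: both names now refer to the same value
y[1] <- 99L   # conceptually builds a new vector and rebinds y
x[1]          # still 1 -- the value x refers to is unchanged
y[1]          # 99
```

This is why `y <- x` is cheap but safe: the copy only materializes at the moment one of the two bindings is modified.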
Re: [R] confusion matrix like detail with continuous data?
In your prediction you will have a target level of accuracy. Something like "I need to predict the slope of the regression to within 1%." You break your data into a training and testing data sets, then for the testing data set you ask is the prediction within 1% of the observed value. That is about as close as I can come as I have trouble thinking how to get a false positive out of a regression with a continuous dependent variable. Of course, you have to have enough data that splitting the data set into two pieces leaves enough observations to make a reasonable model. Tim -Original Message- From: R-help On Behalf Of Ivan Krylov Sent: Wednesday, February 16, 2022 5:00 AM To: r-help@r-project.org Subject: Re: [R] confusion matrix like detail with continuous data? [External Email] On Tue, 15 Feb 2022 22:17:42 +0100 Neha gupta wrote: > (1) Can we get the details like the confusion matrix with continuous > data? I think the closest you can get is a predicted-reference plot. That is, plot true values on the X axis and the corresponding predicted values on the Y axis. Unsatisfying option: use cut() to transform a continuous variable into a categorical variable and make a confusion matrix out of that. > (2) How can we get the mean absolute error for an individual instance? > For example, if the ground truth is 4 and our model predicted as 6, > how to find the mean absolute error for this instance? Mathematically speaking, mean absolute error of an individual instance would be just the absolute value of the error in that instance, but that's probably not what you're looking for. If you need some kind of confidence bands for the predictions, it's the model's responsibility to provide them. There's lots of options, ranging from the use of the loss function derivative around the optimum to Monte-Carlo simulations. For examples, see the confint() method. 
-- Best regards, Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
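Ivan's "unsatisfying option" can be sketched in a few lines; the toy model and the breaks below are illustrative choices, not a recommendation:

```r
set.seed(1)
truth <- runif(100, 0, 10)
pred  <- truth + rnorm(100)    # stand-in for a model's predictions
br    <- c(-Inf, 3, 6, Inf)    # arbitrary illustrative breaks
# Discretize both vectors with cut(), then tabulate like a confusion matrix:
table(predicted = cut(pred, br), reference = cut(truth, br))
# The "mean absolute error" of a single instance is just its absolute residual:
abs(pred - truth)[1]           # absolute error of instance 1
plot(truth, pred); abline(0, 1)  # the predicted-reference plot Ivan suggests
```

The off-diagonal cells of the table play the role of misclassifications, but note that the result depends entirely on the chosen breaks.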
Re: [R] Problem with data distribution
You pipe the filter but do not save the result. A reproducible example might help. Tim -Original Message- From: R-help On Behalf Of Neha gupta Sent: Thursday, February 17, 2022 1:55 PM To: r-help mailing list Subject: [R] Problem with data distribution [External Email] Hello everyone I have a dataset with the output variable "bug" having the following values (at the bottom of this email). My advisor asked me to provide the data distribution of bugs with 0 values and bugs with more than 0 values. data = readARFF("synapse.arff") data2 = readARFF("synapse.arff") data$bug library(tidyverse) data %>% filter(bug == 0) data2 %>% filter(bug >= 1) boxplot(data2$bug, data$bug, range=0) But both boxplots are exactly the same; how is that possible? Where am I going wrong? data$bug [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0 [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1 [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0 [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1 [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
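The immediate bug: dplyr's filter() returns a new data frame and leaves its input untouched, so the piped results must be assigned. A sketch assuming `data` was read with readARFF() as in the post:

```r
library(dplyr)
zero_bugs <- data %>% filter(bug == 0)   # assign the filtered result
some_bugs <- data %>% filter(bug >= 1)   # (no second copy of the data needed)
boxplot(some_bugs$bug, zero_bugs$bug, range = 0)
```

Without the assignments, `data` and `data2` remain the full, identical data sets, which is why the two boxplots looked the same.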
Re: [R] Problem with data distribution
Maybe what you want is to recode your data differently. One data set has bug versus no bug. What is the probability of having one or more bugs? The other data set has bugs only. Given that I have bugs, how many will I get? Tim -Original Message- From: R-help On Behalf Of Neha gupta Sent: Thursday, February 17, 2022 4:54 PM To: Bert Gunter Cc: r-help mailing list Subject: Re: [R] Problem with data distribution [External Email] :) :) On Thu, Feb 17, 2022 at 10:37 PM Bert Gunter wrote: > imo, with such simple data, a plot is mere chartjunk. A simple table (= > the distribution) would suffice and be more informative: > > > table(bug) ## bug is a vector. No data frame is needed > > 0 1 2 3 4 5 7 ## bug count > 162 40 9 7 2 1 1 ## number of cases with the given count > > You or others may disagree, of course. > > Bert Gunter > > > > On Thu, Feb 17, 2022 at 11:56 AM Neha gupta > wrote: > > > > Ebert and Rui, thank you for providing the tips (in fact, for > > providing > the > > answer I needed). > > > > Yes, you are right that a boxplot of all zero values will not make sense. > > Maybe a histogram will work. > > > > I am providing a few details of my data here and the context of the > > question I asked. > > > > My data is about bugs/defects in different classes of a large > > software system. I have to predict which class will contain bugs and > > which will be free of bugs (bug=0). I trained ML models and predicted, > > but my advisor > asked > > me to first provide the data distribution of bugs, e.g. details of > > how many classes have bugs (bug > 0) and how many are free of bugs (bug=0). > > > > That is why I need to provide the data distribution of both types of > values > > (i.e. bug=0 and bug >0) > > > > Thank you again. > > > > On Thu, Feb 17, 2022 at 8:28 PM Rui Barradas > wrote: > > > Hello, > > > > > > In your original post you read the same file "synapse.arff" twice, > > > apparently to filter each of them by its own criterion.
You don't > > > need to do that, read once and filter that one by different criteria. > > > > > > As for the data as posted, I have read it in with the following code: > > > > > > > > > x <- " > > > 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 > > > 0 0 0 > > > 4 1 0 > > > 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 > > > 0 0 0 > > > 0 0 0 > > > 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 > > > 0 0 7 > > > 0 0 1 > > > 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 > > > 0 0 0 > > > 1 0 0 > > > 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 > > > 1 0 0 > > > 0 0 1 > > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0 " > > > bug <- scan(text = x) > > > data <- data.frame(bug) > > > > > > > > > This is not the right way to post data, the posting guide asks to > > > post the output of > > > > > > > > > dput(data) > > > structure(list(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, > > > 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0, 3, 0, 0, > > > 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, > > > 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, > > > 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0, 1, 1, 0, 2, 0, 3, > > > 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 3, 2, 1, 1, > > > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, > > > 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 4, 1, 1, > > > 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > > 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)), class = "data.frame", row.names = > > > c(NA, -222L)) > > > > > > > > > > > > This can be copied into an R session and the data set recreated > > > with > > > > > > data <- structure(etc) > > > > > > > > > Now the boxplots. > > > > > > (Why would you want to plot a vector of all zeros, btw?) 
> > > > > > > > > > > > library(dplyr) > > > > > > boxplot(filter(data, bug == 0))# nonsense > > > boxplot
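Bert's table() suggestion, plus the zero/non-zero split the advisor asked for, in runnable form (assuming `data` holds the bug counts as read in the earlier post):

```r
table(data$bug)                # full distribution of bug counts per class
table(buggy = data$bug > 0)    # FALSE = bug-free classes, TRUE = buggy classes
hist(data$bug[data$bug > 0])   # distribution among the buggy classes only
```

This recodes nothing permanently: the logical comparison `data$bug > 0` gives the bug/no-bug split directly, and the histogram answers "given bugs, how many?"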
Re: [R] conditional filling of data.frame - improve code
You could try some of the "join" commands from dplyr. https://dplyr.tidyverse.org/reference/mutate-joins.html https://statisticsglobe.com/r-dplyr-join-inner-left-right-full-semi-anti Regards, Tim -Original Message- From: R-help On Behalf Of Jeff Newmiller Sent: Thursday, March 10, 2022 11:25 AM To: r-help@r-project.org; Ivan Calandra ; R-help Subject: Re: [R] conditional filling of data.frame - improve code [External Email] Use merge. expts <- read.csv( text = "expt,sample ex1,sample1-1 ex1,sample1-2 ex2,sample2-1 ex2,sample2-2 ex2,sample2-3 ", header=TRUE, as.is=TRUE ) mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", "sample1-1", "sample1-1", "sample2-1")) merge( mydata, expts, by="sample", all.x=TRUE ) On March 10, 2022 7:50:23 AM PST, Ivan Calandra wrote: >Dear useRs, > >I would like to improve my ugly (though working) code, but I think I >need a completely different approach and I just can't think out of my box! > >I have some external information about which sample(s) belong to which >experiment. I need to get that manually into R (either typing directly >in a script or read a CSV file, but that makes no difference): >exp <- list(ex1 = c("sample1-1", "sample1-2"), ex2 = c("sample2-1", >"sample2-2" , "sample2-3")) > >Then I have my data, only with the sample IDs: >mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", >"sample1-1", "sample1-1", "sample2-1")) > >Now I want to add a column to mydata with the experiment ID. The best I >could find is that: >for (i in names(exp)) mydata[mydata[["sample"]] %in% exp[[i]], >"experiment"] <- i > >In this example, the experiment ID could be extracted from the sample >IDs, but this is not the case with my real data so it really is a >matter of matching. Of course I also have other columns with my real data. > >I'm pretty sure the last line (with the loop) can be improved in terms >of readability (speed is not an issue here). 
I have close to no
>constraints on 'exp' (here I chose a list, but anything could do); the
>only thing that cannot change is the format of 'mydata'.
>
>Thank you in advance!
>Ivan

--
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
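For completeness, here is a small base-R sketch of the matching idea without a loop. It flattens Ivan's named list with utils::stack() and uses match(); the column names chosen for the lookup table are illustrative, and dplyr::left_join(mydata, lookup, by = "sample") would do the same job via the joins Tim links to.

```r
# Flatten the named list into a two-column lookup table.
exp <- list(ex1 = c("sample1-1", "sample1-2"),
            ex2 = c("sample2-1", "sample2-2", "sample2-3"))
lookup <- setNames(stack(exp), c("sample", "experiment"))

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))

# match() keeps mydata's row order, unlike merge(), which sorts by key.
mydata$experiment <- as.character(lookup$experiment)[match(mydata$sample,
                                                           lookup$sample)]
mydata
```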
Re: [R] How important is set.seed
If you are using the program for data analysis then set.seed() is not necessary unless you are developing a reproducible example. In a standard analysis it is mostly counter-productive, because one should then ask whether your presented results are an artifact of a specific seed that you selected to get a particular result. However, when you need a reproducible example, are debugging a program, or otherwise need the same result with every run of the program, set.seed() is an essential tool.

Tim

-Original Message-
From: R-help On Behalf Of Jeff Newmiller
Sent: Monday, March 21, 2022 8:41 PM
To: r-help@r-project.org; Neha gupta ; r-help mailing list
Subject: Re: [R] How important is set.seed

[External Email]

First off, "ML models" do not all use random numbers (for prediction I would guess very few of them do). Learn and pay attention to what the functions you are using do.

Second, if you use random numbers properly and understand the precision that your specific use case offers, then you don't need to use set.seed. However, in practice, using set.seed can allow you to temporarily avoid chasing precision gremlins, or set up specific test cases for testing code, not results. It is your responsibility to not let this become a crutch... a randomized simulation that is actually sensitive to the seed is unlikely to offer an accurate result.

Where to put set.seed depends a lot on how you are performing your simulations. In general each process should set it once uniquely at the beginning, and if you use parallel processing then use the features of your parallel processing framework to ensure that this happens. Beware of setting all worker processes to use the same seed.

On March 21, 2022 5:03:30 PM PDT, Neha gupta wrote:
>Hello everyone
>
>I want to know
>
>(1) In which cases, we need to use set.seed while building ML models?
>
>(2) Which is the exact location we need to put the set.seed function i.e.
>when we split data into train/test sets, or just before we train a model?
>
>Thank you

--
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
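To make the points in this thread concrete, here is a tiny sketch of what set.seed() does and does not guarantee: the same seed reproduces the same stream, but the RNG state advances with every draw.

```r
# Same seed before the procedure => identical stream of random numbers.
set.seed(1)
a <- sample(1:100, 5)

set.seed(1)
b <- sample(1:100, 5)
identical(a, b)   # TRUE: the whole sequence is reproduced

# Without resetting, R's RNG state has moved on:
d <- sample(1:100, 5)
identical(a, d)   # FALSE: a different stretch of the stream
```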
Re: [R] How important is set.seed
That approach would start the trainControl method at set.seed(123) and it would start ran_search at set.seed(123). I am not sure whether that would be good or not, especially in this context. I am not clear on how the results are being compared, but I could get some differences if one method had a few extra calls to an RNG (random number generator).

I would think it makes more sense to ask how approach 1 differs from approach 2 over a wide range of seeds. You are not testing the RNG, and I am not sure using the same seed for each model makes a difference unless the analysis is a paired-samples approach. Might it be more effective to remove the initial set.seed() and then replace the second set.seed() with set.seed(NULL)? Otherwise wrap this in a loop:

N1 <- 100
set.seed(123)
seed1 <- round(runif(N1, min = 20, max = 345689))
for (i in 1:N1) {
  set.seed(seed1[i])
  # ... model-fitting code ...
}

Or use set.seed(NULL) between the models. You will need some variable to store the relevant results from each model, and some code to display the results. In the former case I suggest setting up a matrix or two that can be indexed using the for-loop index.

Tim

From: Neha gupta
Sent: Tuesday, March 22, 2022 12:03 PM
To: Ebert,Timothy Aaron
Cc: Jeff Newmiller ; r-help@r-project.org
Subject: Re: How important is set.seed

[External Email]

Thank you again Tim

d <- readARFF("my data")
set.seed(123)
tr <- d[index, ]
ts <- d[-index, ]
ctrl <- trainControl(method = "repeatedcv", number = 10)
set.seed(123)
ran_search <- train(lneff ~ ., data = tr, method = "mlp",
                    tuneLength = 30, metric = "MAE",
                    preProc = c("center", "scale", "nzv"),
                    trControl = ctrl)
getTrainPerf(ran_search)

Would it be good?

On Tue, Mar 22, 2022 at 4:34 PM Ebert,Timothy Aaron wrote: My inclination is to follow Jeff’s advice and put it at the beginning of the program.
You can always experiment: set.seed(42) rnorm(5,5,5) rnorm(5,5,5) runif(5,0,3) As long as the commands are executed in the order they are written, then the outcome is the same every time. Set seed is giving you reproducible outcomes. However, the second rnorm() does not give you the same outcome as the first. So set seed starts at the same point but if you want the first and second rnorm() call to give the same results you will need another set.seed(42). Note also, that it does not matter if you pause: run the above code as a chunk, or run each command individually you get the same result (as long as you do it in the sequence written). So, if you set seed, run some code, take a break, come back write some more code you might get in trouble because R is still using the original set.seed() command. To solve this issue use set.seed(Sys.time()) Or set.seed(NULL) Some of this is just good programming style workflow: Import data Declare variables and constants (set.seed() typically goes here) Define functions Body of code Generate output Clean up ( set.seed(NULL) would go here, along with removing unused variables and such) Regards, Tim From: Neha gupta mailto:neha.bologn...@gmail.com>> Sent: Tuesday, March 22, 2022 10:48 AM To: Ebert,Timothy Aaron mailto:teb...@ufl.edu>> Cc: Jeff Newmiller mailto:jdnew...@dcn.davis.ca.us>>; r-help@r-project.org<mailto:r-help@r-project.org> Subject: Re: How important is set.seed [External Email] Hello Tim In some of the examples I see in the tutorials, they put the random seed just before the model training e.g train function in case of caret library. Should I follow this? Best regards On Tuesday, March 22, 2022, Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: Ah, so maybe what you need is to think of “set.seed()” as a treatment in an experiment. You could use a random number generator to select an appropriate number of seeds, then use those seeds repeatedly in the different models to see how seed selection influences outcomes. 
I am not quite sure how many seeds would constitute a good sample. For me that would depend on what I find and how long a run takes. In parallel processing you set seed in master and then use a random number generator to set seeds in each worker. Tim From: Neha gupta mailto:neha.bologn...@gmail.com>> Sent: Tuesday, March 22, 2022 6:33 AM To: Ebert,Timothy Aaron mailto:teb...@ufl.edu>> Cc: Jeff Newmiller mailto:jdnew...@dcn.davis.ca.us>>; r-help@r-project.org<mailto:r-help@r-project.org> Subject: Re: How important is set.seed [External Email] Thank you all. Actually I need set.seed because I have to evaluate the consistency of features selection generated by different models, so I think for this, it's recommended to use the seed. Warm regards On Tuesday, March 22, 2022, Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: If you are
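Tim's note about seeding workers deserves a concrete sketch. Rather than hand-rolling one seed per worker, the parallel package's clusterSetRNGStream() gives every worker an independent, reproducible L'Ecuyer-CMRG substream from a single master seed. This is a minimal illustration of the mechanism, not the OP's caret workflow:

```r
library(parallel)

cl <- makeCluster(2)

# One master seed; each worker gets its own independent substream.
clusterSetRNGStream(cl, iseed = 123)
run1 <- parSapply(cl, 1:2, function(i) runif(1))

# Resetting the master seed reproduces every worker's draws.
clusterSetRNGStream(cl, iseed = 123)
run2 <- parSapply(cl, 1:2, function(i) runif(1))

stopCluster(cl)

identical(run1, run2)   # TRUE: reproducible across runs
```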
Re: [R] How important is set.seed
Not wrong, just mostly different words. 1) I think of reproducible code as something for teaching or sharing. It can be useful in debugging if I want help (one reason for sharing). In solo debugging my code, I have not used set.seed() -- at least not yet. However, my programs are all small, mostly less than 100 lines of code. 2) Agreed. 3) Agreed -- one needs to be very clear on why one is using set seed(). In many situations it is undoing the purpose of using a random number generator. 4) Agreed -- this is why it is so important to publish the version of R and the package used when presenting results. A great deal of effort has gone into building and selecting a good RNG. Depending on how the RNG is used, a basic understanding of what defines "good" is valuable. If there are huge numbers of calls to the RNG then periodicity in the RNG may start making a difference. Random.org might be another place for the OP to explore. Tim -Original Message- From: Bert Gunter Sent: Tuesday, March 22, 2022 12:12 PM To: Neha gupta Cc: Ebert,Timothy Aaron ; r-help@r-project.org Subject: Re: [R] How important is set.seed [External Email] OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But... 1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the rng between invocations (but see the notes in ?set.seed for subtle qualifications of this claim). 2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop. 3. 
Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed() 4. The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the rng) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file. *** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error.*** Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Mar 22, 2022 at 7:48 AM Neha gupta wrote: > > Hello Tim > > In some of the examples I see in the tutorials, they put the random > seed just before the model training e.g train function in case of caret > library. > Should I follow this? > > Best regards > On Tuesday, March 22, 2022, Ebert,Timothy Aaron wrote: > > > Ah, so maybe what you need is to think of “set.seed()” as a > > treatment in an experiment. You could use a random number generator > > to select an appropriate number of seeds, then use those seeds > > repeatedly in the different models to see how seed selection > > influences outcomes. I am not quite sure how many seeds would > > constitute a good sample. For me that would depend on what I find and how > > long a run takes. > > > > In parallel processing you set seed in master and then use a > > random number generator to set seeds in each worker. > > > > Tim > > > > > > > > *From:* Neha gupta > > *Sent:* Tuesday, March 22, 2022 6:33 AM > > *To:* Ebert,Timothy Aaron > > *Cc:* Jeff Newmiller ; > > r-help@r-project.org > > *Subject:* Re: How important is set.seed > > > > > > > > *[External Email]* > > > > Thank you all. 
> > > > > > > > Actually I need set.seed because I have to evaluate the consistency > > of features selection generated by different models, so I think for > > this, it's recommended to use the seed. > > > > > > > > Warm regards > > > > On Tuesday, March 22, 2022, Ebert,Timothy Aaron wrote: > > > > If you are using the program for data analysis then set.seed() is > > not necessary unless you are developing a reproducible example. In a > > standard analysis it is mostly counter-productive because one should > > then ask if your presented results are an artifact of a specific > > seed that you selected to get a particular result. However, in cases >
Re: [R] How important is set.seed
I would also disagree with your rephrasing. What is the point in characterizing if there is no understanding? What one wants is to understand the variability in outcome caused by including a random element in the model if the focus is on the random numbers. It may also be that one wants to understand the variability in outcome if one were to repeat an experiment. One approach is to split a dataset into testing and training sets, and use the RNG to decide which observation goes into which set. However, every run will give a slightly different answer. The random number generator is then used in place of a permutation test where the number of permutations is too large for current computational effort. I assume what the OP was asking is whether the conclusion(s) of two (or more) models were the same given the range in outcomes produced by the random number generator(s). The only way to address this is to characterize the distribution of model outcomes from different runs with different random seeds. Examine that characterization and hope for understanding. Tim From: Bert Gunter Sent: Tuesday, March 22, 2022 2:03 PM To: Ebert,Timothy Aaron Cc: Neha gupta ; r-help@r-project.org Subject: Re: [R] How important is set.seed [External Email] "rather to understand how the choice of seed influences final model output." No! Different seeds just produce different streams of (pseudo)-random numbers. Hence there cannot be any "understanding" of how "choice of seed" influences results. Presumably, what you meant is to characterize the variability in results from the procedure due to its incorporation of randomness in what it does. Re-read Jeff's last post. This does *not* require set.seed() at all. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." 
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Mar 22, 2022 at 9:55 AM Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: So step 1 is not to compare models, rather to understand how the choice of seed influences final model output. Once you have a handle on this issue, then work at comparing models. Tim From: Neha gupta mailto:neha.bologn...@gmail.com>> Sent: Tuesday, March 22, 2022 12:19 PM To: Bert Gunter mailto:bgunter.4...@gmail.com>> Cc: Ebert,Timothy Aaron mailto:teb...@ufl.edu>>; r-help@r-project.org<mailto:r-help@r-project.org> Subject: Re: [R] How important is set.seed [External Email] I read a paper two days ago (and that's why I then posted here about set.seed) which used interpretable machine learning. According to the authors, different explanations (of the black-box models) will be produced by the ML models if different seeds are used or never used. On Tue, Mar 22, 2022 at 5:12 PM Bert Gunter mailto:bgunter.4...@gmail.com>> wrote: OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But... 1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the rng between invocations (but see the notes in ?set.seed for subtle qualifications of this claim). 2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop. 3. Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed() 4. 
The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the rng) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file. *** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error.*** Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Mar 22, 2022 at 7:48 AM Neha gupta mailto:neha.bologn...@gmail.com>> wrote: > > Hello Tim > > In some of the examples I see in the tutorials, they put the random seed > just before the model training e.g train function in case of caret library. > Should I follow this? > > Best regards > On Tuesday, March 22, 2022, Ebert,Timothy Aaron >
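Bert's point 1 is easy to verify: a single set.seed() call before a procedure reproduces every RNG call inside it, no matter how many there are. A minimal sketch (the function and its contents are illustrative, not the OP's model):

```r
# Hypothetical procedure with two separate RNG calls inside.
run_procedure <- function() {
  idx   <- sample(1:10, 5)   # e.g. a random train/test split
  noise <- rnorm(3)          # e.g. simulated noise
  list(idx = idx, noise = noise)
}

set.seed(42)
r1 <- run_procedure()

set.seed(42)
r2 <- run_procedure()

identical(r1, r2)   # TRUE: one seed, whole procedure reproduced
```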
Re: [R] What is the intended behavior, when subsetting using brackets [ ], when the subset criterion has NA's?
I get an error with this:

my_subset_criteria <- c( F, F, T, NA, NA)
my_subset_criteria

Tim

-Original Message-
From: R-help On Behalf Of Kelly Thompson
Sent: Wednesday, April 6, 2022 4:13 PM
To: r-help@r-project.org
Subject: [R] What is the intended behavior, when subsetting using brackets [ ], when the subset criterion has NA's?

[External Email]

I noticed that I get different results when subsetting using subset, compared to subsetting using "brackets" when the subset criteria have NA's. Here's an example:

#START OF EXAMPLE
my_data <- 1:5
my_data

my_subset_criteria <- c( F, F, T, NA, NA)
my_subset_criteria

#subsetting using brackets returns the data where my_subset_criteria equals TRUE, and also NA where my_subset_criteria is NA
my_data[my_subset_criteria == T]

#subsetting using subset returns only the data where my_subset_criteria equals TRUE
subset(my_data, my_subset_criteria == T)
#END OF EXAMPLE

This behavior is also mentioned here: https://statisticaloddsandends.wordpress.com/2018/10/07/subsetting-in-the-presence-of-nas/

Q. Is this the intended behavior when subsetting with brackets?

Thank you!
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
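To answer the question directly: yes, this is the documented behavior of `[` (see ?"[" and ?subset): a logical index of NA yields NA, while subset() and which() silently drop NAs. A compact sketch of the three idioms:

```r
my_data <- 1:5
crit <- c(FALSE, FALSE, TRUE, NA, NA)

my_data[crit]          # 3 NA NA : `[` returns NA for each NA in the index
subset(my_data, crit)  # 3       : subset() keeps only TRUE positions
my_data[which(crit)]   # 3       : which() also ignores NA
```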
Re: [R] Error with text analysis data
Is this a different question from the original post? It would be better to keep threads separate. Always pre-process the data. Clean the data of obvious mistakes. This can be simple typographical errors or complicated like an author that wrote too when they intended two or to. In old English texts spelling was not standardized and the same word could have multiple spellings within one book or chapter. Removing punctuation is probably a part of this, though a program like Grammarly would not work very well if it removed punctuation. After that it depends on what you are trying to accomplish. Are you interested in the number of times an author used the word “a” or “the” and is “The” different from “the?” Are you modeling word use frequency or comparing vocabulary between texts. Too many choices. Tim From: Neha gupta Sent: Wednesday, April 13, 2022 2:49 PM To: Bill Dunlap Cc: Ebert,Timothy Aaron ; r-help mailing list Subject: Re: Error with text analysis data [External Email] Someone just told me that you need to pre process the data before model construction. For instance, make the text to lower case, remove punctuation, symbols etc and tokenize the text (give number to each word). Then create word of bags model (not sure about it), and then create a model. Is it true to perform all these steps? Best regards On Wednesday, April 13, 2022, Bill Dunlap mailto:williamwdun...@gmail.com>> wrote: > I would always suggest working until the model works, no errors and no NA > values We agree on that. However, the error gives you no hint about which variables are causing the problem. If it did, then it could only tell about the first variable with the problem. I think you would get to your working model faster if you got NA's for the constant columns and then could drop them all at once (or otherwise deal with them). 
-Bill On Wed, Apr 13, 2022 at 9:40 AM Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: I suspect that it is because you are looking at two types of error, both telling you that the model was not appropriate. In the “error in contrasts” there is nothing to contrast in the model. For a numerical constant the program calculates the standard deviation and ends with a division by zero. Division by zero is undefined, or NA. I would always suggest working until the model works, no errors and no NA values. The reason is that I can get NA in several ways and I need to understand why. If I just ignore the NA in my model I may be assuming the wrong thing. Tim From: Bill Dunlap mailto:williamwdun...@gmail.com>> Sent: Wednesday, April 13, 2022 12:23 PM To: Ebert,Timothy Aaron mailto:teb...@ufl.edu>> Cc: Neha gupta mailto:neha.bologn...@gmail.com>>; r-help mailing list mailto:r-help@r-project.org>> Subject: Re: [R] Error with text analysis data [External Email] Constant columns can be the model when you do some subsetting or are exploring a new dataset. My objection is that constant columns of numbers and logicals are fine but those of characters and factors are not. -Bill On Wed, Apr 13, 2022 at 9:15 AM Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: What is the goal of having a constant in the model? To me that seems pointless. Also there is no variability in sexCode regardless of whether you call it integer or factor. So the model y ~ sexCode is just a strange way to look at the variability in y and it would be better to do something like summarize(y) or mean(y) if that was the goal. 
Tim

-Original Message-
From: R-help On Behalf Of Bill Dunlap
Sent: Wednesday, April 13, 2022 9:56 AM
To: Neha gupta
Cc: r-help mailing list
Subject: Re: [R] Error with text analysis data

[External Email]

This sounds like what I think is a bug in stats::model.matrix.default(): a numeric column with all identical entries is fine but a constant character or factor column is not.

> d <- data.frame(y=1:5, sex=rep("Female",5))
> d$sexFactor <- factor(d$sex, levels=c("Male","Female"))
> d$sexCode <- as.integer(d$sexFactor)
> d
  y    sex sexFactor sexCode
1 1 Female    Female       2
2 2 Female    Female       2
3 3 Female    Female       2
4 4 Female    Female       2
5 5 Female    Female       2
> lm(y~sex, data=d)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> lm(y~sexFactor, data=d)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> lm(y~sexCode, data=d)

Call:
lm(formula = y ~ sexCode, data = d)

Coefficients:
(Intercept)      sexCode
          3           NA

Calling traceback() after the error would clarify this.

-Bill

On Tue, Apr 12, 2022 at 3:12 PM Neha
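Given Bill's demonstration, a practical workaround for a dataset like this is to drop zero-variance columns before fitting. A minimal sketch (the data and column names are illustrative):

```r
d <- data.frame(y   = 1:5,
                sex = rep("Female", 5),     # constant factor: breaks lm()
                x   = c(2, 4, 3, 5, 1))

# Keep only columns with more than one unique value.
keep <- vapply(d, function(col) length(unique(col)) > 1, logical(1))
d2 <- d[, keep, drop = FALSE]

names(d2)                     # "y" "x"
fit <- lm(y ~ ., data = d2)   # fits without the contrasts error
coef(fit)
```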
Re: [R] Symbol/String comparison in R
https://en.wikipedia.org/wiki/ASCII There is a table towards the end of the document. Some of the other pieces may be of interest and/or relevant. Tim -Original Message- From: R-help On Behalf Of Kristjan Kure Sent: Wednesday, April 13, 2022 10:06 AM To: r-help@r-project.org Subject: [R] Symbol/String comparison in R [External Email] Hi! Sorry, I am a beginner in R. I was not able to find answers to my questions (tried Google, Stack Overflow, etc). Please correct me if anything is wrong here. When comparing symbols/strings in R - raw numeric values are compared symbol by symbol starting from left? If raw numeric values are not used is there an ASCII / Unicode table where symbols have values/ranking/order and R compares those values? *2) Comparing symbols* Letter "a" raw value is 61, letter "b" raw value is 62? Is this correct? # Raw value for "a" = 61 a_raw <- charToRaw("a") a_raw # Raw value for "b" = 62 b_raw <- charToRaw("b") b_raw # equals TRUE "a" < "b" Ok, so 61 is less than 62 so it's TRUE. Is this correct? *3) Comparing strings #1* "1040" <= "12000" raw_1040 <- charToRaw("1040") raw_1040 #31 *30* (comparison happens with the second symbol) 34 30 raw_12000 <- charToRaw("12000") raw_12000 #31 *32* (comparison happens with the second symbol) 30 30 30 The symbol in the second position is 30 and it's less than 32. Equals to true. Is this correct? *4) Comparing strings #2* "1040" <= "1" raw_1040 <- charToRaw("1040") raw_1040 #31 30 *34* (comparison happens with third symbol) 30 raw_1 <- charToRaw("1") raw_1 #31 30 *30* (comparison happens with third symbol) 30 30 The symbol in the third position is 34 is greater than 30. Equals to false. Is this correct? *5) Problem - Why does this equal FALSE?* *"A" < "a"* 41 < 61 # FALSE? # Raw value for "A" = 41 A_raw <- charToRaw("A") A_raw # Raw value for "a" = 61 a_raw <- charToRaw("a") a_raw Why is capitalized "A" not less than lowercase "a"? Based on raw values it should be. What am I missing here? 
Thanks,
Kristjan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
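The short answer to Kristjan's point 5: the comparison operators on character strings use the current locale's collation order, not raw byte values (see ?Comparison), so "A" < "a" need not follow ASCII. A sketch, assuming a "C" locale is available on the system:

```r
# Byte-wise, "A" (0x41) really is below "a" (0x61):
as.integer(charToRaw("A")) < as.integer(charToRaw("a"))   # TRUE

# But string comparison collates by locale. Forcing the C locale
# restores plain byte order:
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
"A" < "a"          # TRUE in the C locale
Sys.setlocale("LC_COLLATE", old)
```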
Re: [R] R Code Execution taking forever
1) Does it run perfectly with num_trials_6 <- 100 ?
2) Rework the code to remove as much as possible from loops. Renaming column names each time through the loop seems pointless. Are the nested for loops converting the dice roll to a person's name necessary within the while loop?
3) Stop all other apps on the computer.
4) Consider rewriting to take advantage of multiple cores in your system with parallel processing (this might or might not help much).
5) Rerun with num_trials_6 set to different values: 10, 100, 1,000, and 10,000. A linear regression of run time on trial size should let you estimate the run time for 1 million.

Tim

-Original Message-
From: R-help On Behalf Of Rui Barradas
Sent: Sunday, April 24, 2022 5:44 AM
To: Paul Bernal ; R
Subject: Re: [R] R Code Execution taking forever

[External Email]

Hello,

I'm having trouble running the code, where does function dice come from? CRAN package dice only has two functions,

getEventProb
getSumProbs

not a function dice. Can you post a link to where the package/function can be found?

Rui Barradas

Às 02:00 de 24/04/2022, Paul Bernal escreveu:
> Dear R friends,
>
> Hope you are doing great. The reason why I am contacting you all, is
> because the code I am sharing with you takes forever. It started
> running at 2:00 AM today, and it's 7:52 PM and is still running (see
> code at the end of this mail).
>
> I am using Rx64 4.1.2, and the code is being executed in RStudio. The
> RStudio version I am currently using is Version 2022.02.0 Build 443
> "Prairie Trillium" Release (9f796939, 2022-02-16) for Windows.
>
> My PC specs:
> Processor: Intel(R) Core(TM) i5-10310U CPU @ 1.70 GHz
> Installed RAM: 16.0 GB (15.6 GB usable)
> System type: 64-bit operating system, x64-based processor
> Local Disc(C:) Free Space: 274 GB
>
> I am wondering if there is/are a set of system variable(s) or
> something I could do to improve the performance of the program.
>
> It is really odd this code has taken this much (and it is still running).
> Any help and/or guidance would be greatly appreciated.
>
> Best regards,
> Paul
>
> library(dice)  # for dice()
>
> # performing 1,000,000 simulations 10 times
> num_trials_6 = 100
> dice_rolls_6 = num_trials_6 * 12
> num_dice_6 = 1
> dice_sides_6 = 6
>
> prob_frame_6 <- data.frame(matrix(ncol = 10, nrow = 1))
>
> k <- 0
> while (k < 10) {
>   dice_simul_6 = data.frame(dice(rolls = dice_rolls_6, ndice = num_dice_6,
>                                  sides = dice_sides_6, plot.it = FALSE))
>
>   # constructing matrix containing results of all dice rolls by month
>   prob_matrix_6 <- data.frame(matrix(dice_simul_6[, 1], ncol = 12, byrow = TRUE))
>
>   # naming each column by its corresponding month name
>   colnames(prob_matrix_6) <- c("Jan","Feb","Mar","Apr","May","Jun",
>                                "Jul","Aug","Sep","Oct","Nov","Dec")
>
>   # assigning each person's name depending on the number shown on the die
>   for (i in 1:nrow(prob_matrix_6)) {
>     for (j in 1:ncol(prob_matrix_6)) {
>       if (prob_matrix_6[i, j] == 1) { prob_matrix_6[i, j] = "Alice" }
>       if (prob_matrix_6[i, j] == 2) { prob_matrix_6[i, j] = "Bob" }
>       if (prob_matrix_6[i, j] == 3) { prob_matrix_6[i, j] = "Charlie" }
>       if (prob_matrix_6[i, j] == 4) { prob_matrix_6[i, j] = "Don" }
>       if (prob_matrix_6[i, j] == 5) { prob_matrix_6[i, j] = "Ellen" }
>       if (prob_matrix_6[i, j] == 6) { prob_matrix_6[i, j] = "Fred" }
>     }
>   }
>
>   # calculating column which will have a 1 if the trial was successful
>   # and a 0 otherwise
>   for (i in 1:nrow(prob_matrix_6)) {
>     if (("Alice" %in% prob_matrix_6[i, ]) & ("Bob" %in% prob_matrix_6[i, ]) &
>         ("Charlie" %in% prob_matrix_6[i, ]) & ("Don" %in% prob_matrix_6[i, ]) &
>         ("Ellen" %in% prob_matrix_6[i, ]) & ("Fred" %in% prob_matrix_6[i, ])) {
>       prob_matrix_6[i, 13] = 1
>     } else {
>       prob_matrix_6[i, 13] = 0
>     }
>   }
>
>   # relabeling column 13 so that its new name is success
>   colnames(prob_matrix_6)[13] <- "success"
>
>   # calculating probability of success
>   p6 = sum(prob_matrix_6$success) / nrow(prob_matrix_6)
>   prob_frame_6 <- cbind(prob_frame_6, p6)
>
>   k = k + 1
> }
>
> prob_frame_6 <- prob_frame_6[11:20]
> colnames(prob_frame_6) <- c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10")
> average_prob_frame_6 <- rowMeans(prob_frame_6)
> trial_100_10_frame <- cbind(prob_frame_6, average_prob_frame_6)
> final_frame_6 <- trial_100_10_frame
> colnames(final_frame_6) <- c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10",
>                              "avg_prob_frame_5")
>
> write.csv(final_frame_6, "OneMillion_Trials_Ten_Times_Results.csv")
> print(final_frame_6)
> print(paste("The average probability of success when doing 1,000,000 trials",
>             "10 times is:", average_prob_frame_6))
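For the archives: the looped simulation quoted above can be vectorized in base R. A minimal sketch, assuming the goal is the probability that all six names appear in 12 rolls; sample() stands in for the dice package, and the trial count is illustrative:

```r
# Estimate the probability that all six names appear in 12 die rolls.
# Uses base R sample() instead of the dice package; num_trials is illustrative.
set.seed(1)
num_trials <- 100000
names6 <- c("Alice", "Bob", "Charlie", "Don", "Ellen", "Fred")

# One row per trial, one column per month; entries are die faces 1..6.
rolls <- matrix(sample(6, num_trials * 12, replace = TRUE),
                ncol = 12, byrow = TRUE)

# A trial succeeds when every face (hence every name) appears at least once.
success <- apply(rolls, 1, function(r) length(unique(r)) == 6)
p_hat <- mean(success)
p_hat
```

No per-cell name substitution is needed: checking that all six faces occur is equivalent to checking that all six names occur, which removes both nested loops.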
Re: [R] Confusing fori or ifelse result in matrix manipulation
A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)
M <- A
for (i in 1:3) {
  if (x[i]) {
    M[, i] <- 0
  }
}
M

The outcome you want is to set all of the middle column's values to zero, so I used x as a logical in an if() test: when it is true, everything in that column is set to zero. Your approach also works, but you must go through each element explicitly:

A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)
M <- matrix(ncol = 3, nrow = 3)
for (j in 1:3) {
  for (i in 1:3) {
    ifelse(x[i] == 1, M[j, i] <- 0, M[j, i] <- A[j, i])
  }
}
M

Tim

-----Original Message-----
From: R-help On Behalf Of Uwe Freier
Sent: Sunday, April 24, 2022 11:06 AM
To: r-help@r-project.org
Subject: [R] Confusing fori or ifelse result in matrix manipulation

[External Email]

Hello,

sorry for the newbie question but I can't find out where I'm wrong.

A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)
M <- matrix(ncol = 3, nrow = 3)
for (i in 1:3) {
  M[, i] <- ifelse(x[i] == 0, A[, i], 0)
}

expected:

> M
     [,1] [,2] [,3]
[1,]    1    0    7
[2,]    2    0    8
[3,]    3    0    9

but the result is:

> M
     [,1] [,2] [,3]
[1,]    1    0    7
[2,]    1    0    7
[3,]    1    0    7

If I do it "manually":

> M[,1] <- A[,1]
> M[,2] <- 0
> M[,3] <- A[,3]

M is as expected. Where is my misconception?

Thanks for any hint and best regards,

Uwe
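For the archives, the root cause: ifelse() is vectorized and returns a result the same shape as its *test* argument, so a length-1 test yields a single value, which R then recycles across the whole column on assignment. A minimal sketch of two fixes:

```r
# ifelse() returns a result as long as its test, so with a length-1 test it
# returns one element of A[,i], which gets recycled across the whole column.
A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)

# Fix 1: use plain if/else, which returns the full column unchanged.
M <- matrix(nrow = 3, ncol = 3)
for (i in 1:3) {
  M[, i] <- if (x[i] == 0) A[, i] else 0
}

# Fix 2: no loop at all -- zero out the flagged columns by logical indexing.
M2 <- A
M2[, x == 1] <- 0

identical(M, M2)  # both give the intended result
```

Fix 2 is the idiomatic form: logical column indexing replaces both the loop and the conditional.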
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Convert one rectangle into two rectangles
One thought was that these were farm fields in satellite images; I just had no clue why one would subdivide them. Maybe a training and a testing portion of these polygons? At a few dozen meters the curvature might be negligible on a sufficiently large spheroid.

If the 45/55 split is approximate, then go through the entire polygon area pixel by pixel and use runif() to generate a random value that puts the pixel in A if it is < 0.45 and in B otherwise. This has a better chance of working if the polygon is centered in the image so that curvature weighs equally.

For this post my thought was that it was either homework, because the question was so simple, or that it was not providing enough information to answer the real question. The generalized question is to divide an n-sided polygon, mapped to the surface of a spheroid with semi-axes A, B, C, into two parts of 45% and 55%. If multiple polygons were spread over the image and the image curvature was large relative to the image size, I don't see how you could recover polygons close to the edge of the spheroid unless you knew the curvature at that point beforehand. Possibly, in the most general sense, the problem is unsolvable because angles near the horizon will be nearly straight lines. Another part of asking questions is to provide enough detail that others may arrive at a creative answer.

How about a mechanical solution? Print the polygon onto paper. Cut the polygon out. Cut the polygon in half. Weigh the halves to get the actual split. Scan each piece in. You can print several copies and repeat until you get a 45/55 split as close as your equipment will measure. I have solved the problem as asked, though it did not involve R: I split the polygon. Is the problem solved, or do I need to do something with the pieces?

I had posted a simple R solution, but was then asked if the solution would work for a trapezoid with a bottom of 21 and a top of 18.
My solution was not generalizable, but the question was for a rectangle (four 90-degree angles and parallel opposite sides) with sides of 18 and 200 meters.

You could draw polygons on a balloon using a sharpie marker (or similar), photograph the balloon at different distances, and process the images. At least this way you have a system to test that part of the program. Correct the pixel area for curvature, estimate the area of the polygon, and then figure out what it means to split the polygon. If the non-regular polygon is not evenly divisible 45/55, what happens to the remainder? Is this problem better handled using GIS techniques?

Tim

-----Original Message-----
From: R-help On Behalf Of Avi Gross via R-help
Sent: Wednesday, April 27, 2022 11:27 AM
To: r-help@r-project.org
Subject: Re: [R] Convert one rectangle into two rectangles

[External Email]

Just FYI, Jim, I was sent private mail by Javad that turns the problem around, so not only are there no rectangles, but the problem is not 2-D. He is working with Spatial Polygons representing areas on the surface of a sphere (presumably Earth) and wants to subdivide them. This is a VERY different topic, and there are packages and other references that might apply to his needs. I have no idea why the 60/40 split by area.

Yes, as a very simplified idea, I understand why he proposed dividing a rectangle proportionately, but realistically what he is working with is a bit more like a trapezoid which is also bent into a third dimension.

So those of us wanting to help on the original problem can stop and, speaking for myself, I am going to approach many questions posed here more carefully to see if they are well thought out or are some kind of fishing expedition. And, since it has been made very clear multiple times that the scope of this forum is narrow and not meant to help with HW, and I have no idea how to verify it is not, ...
-----Original Message-----
From: Jim Lemon
To: javad bayat ; r-help mailing list
Sent: Wed, Apr 27, 2022 5:59 am
Subject: Re: [R] Convert one rectangle into two rectangles

Hi javad,

Let's think about it a little. There are only two ways you can divide a rectangle into two other rectangles. The dividing line must be straight and parallel to one side or the other, or you won't get two rectangles. If there are no other constraints, you can take your pick.

A     E         B
 ---------------
 |    |        |
 ---------------
C     F         D

AE = CF = 0.45 * AB

Alternatively the same equation can be used on the shorter sides. If you want to cut it across the length, you can do it with an abacus. While the IrregLong package can produce an abacus plot, I was unable to find an R-based abacus. For that I suggest:

https://toytheater.com/abacus/
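For the simple flat-rectangle version of the question, the arithmetic Jim describes can be sketched in a few lines of R. The corner coordinates and the helper function below are invented for illustration:

```r
# Split an axis-aligned rectangle into two rectangles with a 45/55 area
# ratio by cutting parallel to one pair of sides. Coordinates are made up.
split_rect <- function(xmin, xmax, ymin, ymax, frac = 0.45) {
  xcut <- xmin + frac * (xmax - xmin)   # AE = 0.45 * AB
  list(
    part1 = c(xmin = xmin, xmax = xcut, ymin = ymin, ymax = ymax),
    part2 = c(xmin = xcut, xmax = xmax, ymin = ymin, ymax = ymax)
  )
}

r <- split_rect(0, 200, 0, 18)   # the 200 m x 18 m field from the thread
area <- function(p) (p["xmax"] - p["xmin"]) * (p["ymax"] - p["ymin"])
c(area(r$part1), area(r$part2))  # 45% and 55% of the total area
```

This only covers the planar case; the spheroid-surface generalization discussed above would need a GIS package rather than coordinate arithmetic.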
Re: [R] Is there a canonical way to pronounce CRAN?
It would be nice in some ways if everyone would pronounce the same word in the same way, but then we could not argue over the correct pronunciation of words like tomato or aluminium/aluminum.

I think of CRAN as "Kran". While I had German in high school, I didn't remember the German word for crane, so I did not consciously make any connection. I thought more of words like crunch, crouch, or crayfish to help pronounce CRAN. "Sea-Ran" also makes some sense, but it makes me wonder if the tide is going out or coming back in. Possibly many think of this like C-Span (c-span.org). That would make even more sense if we had C-Ran rather than CRAN. I'll just leave "Sea-Run" alone.

Tim

-----Original Message-----
From: R-help On Behalf Of Kevin Thorpe
Sent: Wednesday, May 4, 2022 10:38 AM
To: Roland Rau
Cc: R Help Mailing List
Subject: Re: [R] Is there a canonical way to pronounce CRAN?

[External Email]

Interesting. I have always pronounced it as See-ran. This probably stems from my exposure to other archives like CPAN (Perl) and CTAN (TeX). Obviously the latter two acronyms are unpronounceable as words, so I generalized the approach to CRAN.

Kevin

> On May 4, 2022, at 7:20 AM, Roland Rau via R-help wrote:
>
> Dear all,
>
> I talked with colleagues this morning and we realized that some people (= me) pronounce CRAN like the German word "Kran" (probably pronounced like "cruhn" in English -- if it was a word).
> My colleague pronounced it as "Sea-Ran" or "Sea-Run". The colleague was a student and has worked at the same institution as an R Core Developer and heard it from him personally.
>
> So now I am puzzled. Have I been wrong about 43% of my life? ;-)
>
> Honestly: Is there a unique way the core developers pronounce CRAN?
>
> Not an urgent question at all, but maybe interesting to many of us.
> Thanks,
> Roland
>
> --
> This mail has been sent through the MPI for Demographic ...{{dropped:2}}

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
Re: [R] R and .asc file extension
A Google search returned a Stack Overflow page that might help:

https://stackoverflow.com/questions/20177581/reading-an-asc-file-into-r

I would also try looking at the file in a plain-text editor such as Notepad. That way you can see exactly what the file contains.

Tim

-----Original Message-----
From: R-help On Behalf Of Thomas Subia via R-help
Sent: Friday, May 20, 2022 9:27 AM
To: r-help@r-project.org
Subject: [R] R and .asc file extension

[External Email]

Colleagues,

I have data which has a .asc file extension. Can R read that file extension?

All the best,

Thomas Subia
Statistician
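To make Tim's advice concrete: ".asc" is only an extension, and many such files are plain delimited text that base R reads directly. A minimal sketch; the file contents and names below are invented for illustration:

```r
# ".asc" is just an extension; many .asc files are plain whitespace- or
# comma-delimited text, which base R can read directly.
# This example writes a tiny illustrative file and reads it back.
tmp <- tempfile(fileext = ".asc")
writeLines(c("x y value",
             "1 2 3.5",
             "4 5 6.7"), tmp)

dat <- read.table(tmp, header = TRUE)
dat

# If the .asc file is instead an ESRI ASCII grid raster, a GIS package is
# the usual route, e.g. terra::rast("file.asc") -- assuming terra is installed.
```

Opening the file in a text editor first, as suggested above, tells you which of these two cases you have.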
Re: [R] Suggestions as to how to proceed would be appreciated...............
Would lm, nls, or nlme work for what you need?

Tim

-----Original Message-----
From: R-help On Behalf Of Bernard Comcast
Sent: Sunday, May 22, 2022 3:01 PM
To: Bert Gunter
Cc: R-help@r-project.org
Subject: Re: [R] Suggestions as to how to proceed would be appreciated...

[External Email]

It's simply a query to know what tools/packages R has for correlating single values with multivalued vectors. If that is outside the scope of the PG then so be it.

Bernard

Sent from my iPhone so please excuse the spelling!

> On May 22, 2022, at 1:52 PM, Bert Gunter wrote:
>
> Please read the posting guide (PG) linked below. Your query sounds more like a project that requires a paid consultant; if so, this is way beyond the scope of this list as described in the PG. So don't be too surprised if you don't get a useful response, which this isn't either, of course.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>> On Sun, May 22, 2022 at 10:40 AM Bernard McGarvey wrote:
>>
>> I work in aspects of cold chain transportation in the pharmaceutical industry. These shippers are used to transport temperature-sensitive products by surrounding the product load box with insulating materials of various sorts. The product temperature has lower and upper allowed limits, so that when the product temperature hits one of these limits the shipper fails, and this failure time is the shipper duration. If the shipper is exposed to very low or very high ambient temperatures during a shipment, then we expect the duration of the shipper to be low.
>>
>> The particular problem I am currently undertaking is to create a fast way to predict the duration of a shipping container when it is exposed to a given ambient temperature.
>> Currently we have the ability to predict such durations using a calibrated 3D model (typically a finite element or finite volume transient representation of the heat transfer equations). These models can predict the temperature of the pharmaceutical product within the shipper over time as it is exposed to an external ambient temperature profile.
>>
>> The problem with the 3D model is that it takes significant CPU time and the software is specialized. What I would like to do is to be able to enter the ambient profile into a spreadsheet and then predict the expected duration of the shipper using a simple calculation that can be implemented in the spreadsheet environment. The idea I had was as follows:
>>
>> 1. Create a selection of ambient temperature profiles covering a wide range of ambient behavior. Ensure the profiles are long enough that the shipper is sure to fail at some time during the ambient profile.
>>
>> 2. Use the 3D model to predict the shipper duration for the selection of ambient temperature profiles in (1). Each ambient profile will have its own duration.
>>
>> 3. Since only the ambient temperatures up to the duration time are relevant, truncate each ambient profile for times greater than the duration.
>>
>> 4. Step (3) means that the ambient temperature profiles will have different lengths, corresponding to the different durations.
>>
>> 5. Use the truncated ambient profiles and their corresponding durations to build some type of empirical model relating the duration to the corresponding ambient profile.
>>
>> Some other notes:
>>
>> a. We know from our understanding of how the shippers are constructed and the laws of heat transfer that some sections of the ambient profile will have more of an impact on determining the duration than other sections.
>>
>> b. Just correlating the duration with the average temperature of the profile can predict the duration for that profile to within 10-15%. We are looking for the ability to get within 2% of the shipper duration predicted by the 3D model.
>>
>> What I am looking for is suggestions as to how to approach step (5) with tools/packages available in R.
>>
>> Thanks in advance
>>
>> Bernard McGarvey, Ph.D.
>> Technical Advisor
>> Parenteral Supply Chain LLC
>> bernard.first.princip...@gmail.com
>> (317) 627-4025
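On the modeling question in step (5), one hedged sketch along the lines of Tim's lm suggestion: reduce each variable-length ambient profile to a fixed set of summary features, then regress duration on those features. All data, feature choices, and names below are invented for illustration only:

```r
# Sketch of step (5): summarize each variable-length ambient profile into
# fixed-length features, then fit an ordinary linear model for duration.
# The simulated "profiles" and durations are purely illustrative.
set.seed(42)

n <- 200
# Each profile is a random-walk temperature trace of varying length.
profiles <- lapply(1:n, function(i) 20 + cumsum(rnorm(sample(50:150, 1))))

# Feature extraction: mean, spread, and early-segment mean of each profile.
features <- t(sapply(profiles, function(p) c(
  mean_temp  = mean(p),
  sd_temp    = sd(p),
  early_mean = mean(head(p, 24))   # e.g. the first 24 hourly readings
)))

# Fake durations with a known relationship, for demonstration only.
duration <- 100 - 2 * features[, "mean_temp"] + rnorm(n)

dat <- data.frame(duration, features)
fit <- lm(duration ~ mean_temp + sd_temp + early_mean, data = dat)
summary(fit)$r.squared
```

With real 3D-model durations, the same skeleton applies; hitting the 2% target would likely mean richer features (weighted segments per note (a)) or a nonlinear fit via nls/nlme, and the fitted coefficients could then be copied into the spreadsheet as a closed-form predictor.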