[R] New User Having Trouble Loading R Commander on Mac OS Yosemite
I keep getting the same error message when trying to install R Commander. My operating system is Mac OS Yosemite 10.10. I have installed R 3.2, RStudio, XQuartz (X11), and tcltk-8.x.x-x11.dmg, but I keep getting the following error:

  Loading required package: splines
  Loading required package: RcmdrMisc
  Loading required package: car
  Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
    there is no package called 'SparseM'
  Error: package 'car' could not be loaded

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
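The error reports a missing dependency rather than a problem with Rcmdr itself; a plausible first step (not from the original thread) is to install the missing package, or reinstall Rcmdr with all of its dependencies:

```r
# 'SparseM' is needed by 'car', which Rcmdr loads; installing it is the
# likely fix for the error above
install.packages("SparseM")

# or, more thoroughly, pull in everything Rcmdr depends on:
install.packages("Rcmdr", dependencies = TRUE)
```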
Re: [R] Help with big data and parallel computing: 500,000 x 4 linear models
Don't run 500K separate models. Use the limma package to fit one model that can learn the variance parameters jointly. Run it on your laptop. And don't use % methylation (the Beta value) as your Y variable; use its logit, i.e. the M-value.

-Aaron

On Mon, Aug 8, 2016 at 2:49 PM, Ellis, Alicia M wrote:
> I have a large dataset with ~500,000 columns and 1264 rows. Each column
> represents the percent methylation at a given location in the genome. I
> need to run 500,000 linear models for each of 4 predictors of interest,
> in the form of:
>
>   Methylation.site1 ~ predictor1 + covariate1 + covariate2 + ... + covariate9
>
> ...and save only the p-value for the predictor.
>
> The original methylation data file had methylation sites as row labels
> and the individuals as columns, so I read the data in chunks and
> transposed it; I now have 5 csv files (chunks) with columns representing
> methylation sites and rows as individuals.
>
> I was able to get results for all of the regressions by running each
> chunk of methylation data separately on our supercomputer using the code
> below. However, I'm going to have to do this again for another project,
> and I would really like to accomplish two things to make the whole
> process more computationally efficient:
>
> 1) Work with data.tables instead of data.frames (reading and
>    manipulating will be much easier and faster)
>
> 2) Do the work in parallel using, say, 12 cores at once, having the
>    program divide the work up on the cores rather than me having to
>    split the data and run 5 separate jobs on the supercomputer.
>
> I have some basic knowledge of the data.table package, but I wasn't able
> to modify the foreach code below to get it to work, and the code using
> data.frames didn't seem to be using all 12 cores that I created in the
> cluster.
>
> Can anyone suggest some modifications to the foreach code below that will
> allow me to do this in parallel with data.tables and not have to do it in
> chunks?
> # Set up cluster
> clus = makeCluster(12, type = "SOCK")
> registerDoSNOW(clus)
> getDoParWorkers()
> getDoParName()
>
> ### Following code needs to be modified to run the full dataset
> ### (batch1-batch5) in parallel.  Currently I read in the following
> ### chunks, and run each predictor separately for each chunk of data.
>
> ### Methylation data in batches; each batch has about 100,000 columns
> ### and 1264 rows.  Want to alter this to: batch1 = fread(file = )
> batch1 = read.csv("/home/alicia.m.ellis/batch1.csv")
> batch2 = read.csv(file = "/home/alicia.m.ellis/batch2.csv")
> batch3 = read.csv(file = "/home/alicia.m.ellis/batch3.csv")
> batch4 = read.csv(file = "/home/alicia.m.ellis/batch4.csv")
> batch5 = read.csv(file = "/home/alicia.m.ellis/batch5.csv")
>
> predictors  ## this is a data.frame with 4 columns and 1264 rows
> covariates  ## this is a data.frame with 9 columns and 1264 rows
>
> fits <- as.data.table(batch1)[, list(MyFits = lapply(1:ncol(batch1),
>     function(x) summary(lm(batch1[, x] ~ predictors[, 1] +
>         covariates[, 1] + covariates[, 2] + covariates[, 3] +
>         covariates[, 4] + covariates[, 5] + covariates[, 6] +
>         covariates[, 7] + covariates[, 8] + covariates[, 9]
>     ))$coefficients[2, 4]
> ))]
>
> ## This is what I was trying, but wasn't having much luck.
> ## I'm having trouble getting the data merged as a single data.frame, and
> ## the code below doesn't seem to be dividing the work among the 12 cores
> ## in the cluster.
>
> all.
> fits = foreach(j = 1:ncol(predictors), i = 1:ncol(meth1),
>     .combine = 'rbind', .inorder = TRUE) %dopar% {
>   model = lm(meth[, i] ~ predictors[, j] + covariates[, 1] +
>       covariates[, 2] + covariates[, 3] + covariates[, 4] +
>       covariates[, 5] + covariates[, 6] + covariates[, 7] +
>       covariates[, 8] + covariates[, 9])
>   summary(model)$coefficients[2, 4]
> }
>
> Alicia Ellis, Ph.D
> Biostatistician
> Pathology & Laboratory Medicine
> Colchester Research Facility
> 360 South Park Drive, Room 209C
> Colchester, VT 05446
> 802-656-9840
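The limma approach suggested above could be sketched roughly as follows (a sketch, not a drop-in script: the object names `meth`, `predictors` and `covariates` are taken from the post, and a sites-by-individuals matrix of M-values is assumed):

```r
library(limma)  # Bioconductor package

# meth: matrix of M-values, 500,000 sites in rows, 1264 individuals in columns
# predictors, covariates: data.frames as described in the original post
design <- model.matrix(~ ., data = cbind(pred = predictors[, 1], covariates))

fit <- lmFit(meth, design)   # one linear-model fit across all sites at once
fit <- eBayes(fit)           # empirical-Bayes moderated statistics
pvals <- fit$p.value[, "pred"]   # p-values for the predictor of interest
```

Repeating this once per predictor gives the 500,000 x 4 grid of p-values without any explicit parallelism or chunking.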
[R] Unexpected errors in sparse Matrix arithmetic with zero-length dimensions
Dear list,

The Matrix package exhibits some unexpected behaviour in its arithmetic methods for the edge case of a sparse matrix with a dimension of zero length. The example below is the most illustrative, where changing the contents of the vector causes the subtraction to fail for a sparse matrix with no columns:

  > library(Matrix)
  > x <- rsparsematrix(10, 0, density=0.1)
  > x - rep(1, nrow(x)) # OK
  > x - rep(0, nrow(x)) # fails
  Error in .Ops.recycle.ind(e1, len = l2) :
    vector too long in Matrix - vector operation

This is presumably because Matrix recognizes that subtraction of zero preserves sparsity and thus uses a different method in the second case. However, I would have expected subtraction of a zero vector to work if subtraction of a general vector is permissible. This is accompanied by a host of related errors for sparsity-preserving arithmetic:

  > x / 1 # OK
  > x / rep(1, nrow(x)) # fails
  Error in .Ops.recycle.ind(e1, len = l2) :
    vector too long in Matrix - vector operation

  > x * 1 # OK
  > x * rep(1, nrow(x)) # fails
  Error in .Ops.recycle.ind(e1, len = l2) :
    vector too long in Matrix - vector operation

A different error is raised for a sparse matrix with no rows:

  > y <- rsparsematrix(0, 10, density=0.1)
  > y - numeric(1) # OK
  > y - numeric(0) # fails
  Error in y - numeric(0) : - numeric(0) is undefined

I would have expected to just get 'y' back, given that the same code works fine for other Matrix classes:

  > z <- as(y, "dgeMatrix")
  > z - numeric(0) # OK

Correct behaviour of zero-dimension sparse matrices is practically important to me; I develop a number of packages that rely on Matrix classes, and in those packages I do a lot of unit testing with zero-dimension inputs. This ensures that my functions return sensible results or fail gracefully in edge cases that might be encountered by users. The current behaviour of sparse Matrix arithmetic causes my unit tests to fail for no (obvious) good reason.
Best,
Aaron Lun
Research Associate
CRUK Cambridge Institute
University of Cambridge

> sessionInfo()
R Under development (unstable) (2019-01-14 r75992)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS:   /home/cri.camres.org/lun01/Software/R/trunk/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/trunk/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Matrix_1.2-15

loaded via a namespace (and not attached):
[1] compiler_3.6.0  grid_3.6.0      lattice_0.20-38
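Until this is fixed in Matrix itself, downstream code can route zero-extent operands through the dense class, where the report notes the arithmetic already behaves. A sketch (the helper name is hypothetical):

```r
library(Matrix)

# fall back to dense methods when either dimension has zero extent
safe_sub <- function(m, v) {
  if (any(dim(m) == 0L)) as(m, "dgeMatrix") - v else m - v
}

y <- rsparsematrix(0, 10, density = 0.1)
safe_sub(y, numeric(0))   # a 0 x 10 dense Matrix instead of an error
```

The cost is a dense copy, which is negligible precisely in these zero-size cases.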
[R] lmer syntax, matrix of (grouped) covariates?
I have a fairly large model:

  > length(Y)
  [1] 3051
  > dim(covariates)
  [1] 3051  211

All of these 211 covariates need to be nested hierarchically within a grouping "class", of which there are 8. I have an accessory vector, "cov2class", that specifies the mapping between covariates and the 8 classes. Now, I understand I can break all this information up into individual vectors (cov1, cov2, ..., cov211, class1, class2, ..., class8) and do something like this:

  model <- lmer(Y ~ 1 + cov1 + cov2 + ... + cov211 +
                (cov1 + cov2 + ... | class1) + (...) +
                (... + cov210 + cov211 | class8))

But I'd like to keep things syntactically simpler, and use the covariates and cov2class variables directly. I haven't been able to find the right syntactic sugar to get this done.

Thanks for any help,
-Aaron
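One way to get that syntactic sugar (a sketch, not from the original thread; `covariates` and `cov2class` are the objects named in the post, and each value of `cov2class` is assumed to name a grouping factor available in the model data) is to assemble the formula programmatically and hand it to lmer():

```r
covs    <- colnames(covariates)
classes <- unique(cov2class)

# one random-effects term per class, containing that class's covariates
ranef_terms <- vapply(classes, function(cl) {
  sprintf("(%s | %s)", paste(covs[cov2class == cl], collapse = " + "), cl)
}, character(1))

f <- reformulate(c("1", covs, ranef_terms), response = "Y")
# model <- lmer(f, data = dat)   # dat: Y, covariates and grouping factors
```

Whether a model with that many slopes per grouping factor is actually estimable is a separate question from the syntax.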
[R] hex2RGB back to hex not the same?
Witness this oddity (to me):

  > rainbow_hcl(10)[1]
  [1] "#E18E9E"
  > d <- attributes(hex2RGB(rainbow_hcl(10)))$coords[1,]
  > rgb(d[1], d[2], d[3])
  [1] "#C54D5F"

What happened? FYI, this came up as I'm trying to reuse the RGB values I get from rainbow_hcl in a call to rgb() where I can also set alpha transparency levels ...

-Aaron
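A likely explanation (my reading, not from the original thread) is a colour-space mismatch: the coordinates coming out of colorspace's hex2RGB() are not the gamma-corrected sRGB values that the hex string encodes, so feeding them straight into rgb() lands on a visibly darker colour. For the stated goal of adding alpha to the hex colours, base grDevices can do it without any round trip through coordinates:

```r
library(colorspace)  # for rainbow_hcl

cols <- rainbow_hcl(10)

# adjustcolor() appends the alpha channel directly to each hex colour
cols_alpha <- adjustcolor(cols, alpha.f = 0.5)
```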
[R] hmm.discnp or other?
(I think) I'd like to use the hmm.discnp package for a simple discrete, two-state HMM, but my training data is irregularly shaped (i.e. the observation chains are of varying length). Additionally, I do not see how to label the state of the observations given to the hmm() function. Ultimately, I'd like to 1) train the hmm on labeled data, and 2) use viterbi() to calculate the optimal labeling of unlabeled observations. More concretely, I have labeled data that looks something like:

  11212321221223121221112233222122112 ABA
  21221223121221112233222122112 ABAAA
  3121221112233222122112 BB

from which I'd like to build the two-hidden-state (A and B) hmm that emits observed 1, 2, or 3 at probabilities dictated by the hidden state, with transition probabilities between the two states. Given the trained HMM, I then wish to label new sequences via viterbi(). Am I missing the purpose of this package? I also read through the msm package docs, but my data doesn't really have a time coordinate on which the data should be "aligned".

Thanks for any pointers,
-Aaron
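For what it's worth, when every observation has a known state label, no EM machinery is needed for training: the maximum-likelihood transition and emission matrices are just normalized counts. A minimal sketch (hypothetical toy data, with a state label per observation rather than the shorter label strings shown above):

```r
obs    <- c(1, 2, 1, 3, 2, 2, 1, 3)
states <- c("A", "A", "B", "B", "A", "A", "B", "B")

# P(observation | state): row-normalized contingency table
emission <- prop.table(table(states, obs), margin = 1)

# P(next state | current state): counts of consecutive label pairs
transition <- prop.table(table(head(states, -1), tail(states, -1)), margin = 1)
```

Matrices estimated this way could then seed (or replace) an unsupervised fit before running viterbi() on new sequences.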
[R] importing S4 methods using a namespace
I want to call summary on a mer object (from lme4) within my package, but I can't seem to get the namespace to import the necessary method. I've simplified my package to this one function:

  ss <- function(m) {
      summary(m)
  }

And my NAMESPACE file looks like this, where I've attempted to follow the instructions in "Writing R Extensions" (http://cran.r-project.org/doc/manuals/R-exts.html#Name-spaces-with-S4-classes-and-methods):

  import(lme4)
  importMethodsFrom(lme4, "summary")
  export("ss")

But when I call my new function, I get the summary.default method instead of the mer method:

  > m <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
  > ss(m)
    Length  Class   Mode
         1    mer     S4

Thanks,
--
Aaron Rendahl, Ph.D.
Statistical Consulting Manager
School of Statistics, University of Minnesota
NEW OFFICE (as of June 2009): 48C McNeal Hall, St. Paul Campus
612-625-1062
www.stat.umn.edu/consulting
Re: [R] importing S4 methods using a namespace
Thanks very much! Importing from Matrix as you suggest fixes it.

--
Aaron Rendahl, Ph.D.
Statistical Consulting Manager
School of Statistics, University of Minnesota
NEW OFFICE (as of June 2009): 48C McNeal Hall, St. Paul Campus
612-625-1062
www.stat.umn.edu/consulting
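For the archives, the fix described here presumably amounts to a NAMESPACE along these lines (a sketch; at the time, the summary method applicable to mer objects came via the Matrix package rather than lme4 itself):

```
import(lme4)
importMethodsFrom(Matrix, summary)
export(ss)
```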
Re: [R] How to read a list into R??
Hi,

You should not use the 'sink' function to save these data files. If you want a readable format, you should look at 'dump' instead. If you simply want to save your data structures then 'save' might be best. 'sink' is not appropriate for data saving; it's simply a convenient way to log what you see in the terminal.

Aaron

On Mon, Jun 29, 2009 at 23:58, Li,Hua wrote:
> Dear R helpers:
> I have tried many times to find some way to read a list into R, but I
> failed. Here is an example: I have a file 'List.txt' which includes data
> as follows:
>
> [[1]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>
> [[2]]
> [1] 0.000 0.500 0.000 0.000 0.500 0.000 0.000
> [8] 0.000 0.000 0.000
>
> [[3]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0
>
> 'List.txt' was output by 'sink' from R, and I am trying to read it back
> into R. First I tried 'dget', and got:
>
> > dget('Vlist300.txt')
> Error in parse(file = file) : Vlist300.txt: unexpected '[[' at
> 1: [[
>
> Then I tried 'scan':
>
> > scan('List.txt', what='list')
> Read 86 items
>  [1] "[[1]]" "[1]"   "0.0"   "0.0"   "0.0"   "0.0"
>  [7] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [13] "0.0"   "0.0"   "0.0"   "0.0"   "0.5"   "0.0"
> [19] "0.0"   "0.0"   "[19]"  "0.0"   "0.0"   "0.0"
> [25] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [31] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [37] "0.0"   "0.0"   "[[2]]" "[1]"   "0.000" "0.500"
> [43] "0.000" "0.000" "0.500" "0.000" "0.000" "[8]"
> [49] "0.000" "0.000" "0.000" "[[3]]" "[1]"   "0.0"
> [55] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "0.0"
> [61] "0.0"   "0.5"   "0.0"   "0.0"   "0.0"   "0.0"
> [67] "0.0"   "0.0"   "0.0"   "0.0"   "0.0"   "[19]"
> [73] "0.0"   "0.0"   "0.0"   "0.0"   "0.5"   "0.0"
> [79] "0.0"   "0.5"   "0.0"   "0.5"   "0.0"   "0.0"
> [85] "0.0"   "0.0"
>
> Unfortunately I can't find any function to read 'List.txt' into R and
> give me the right format as in List.txt.
> Do you know if there's a function that can read 'List.txt' into R and
> keep the format as follows?
>
> [[1]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>
> [[2]]
> [1] 0.000 0.500 0.000 0.000 0.500 0.000 0.000
> [8] 0.000 0.000 0.000
>
> [[3]]
>  [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
> [19] 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.5 0.0 0.5 0.0 0.0 0.0 0.0
>
> I appreciate any help!!
> Best,
> Hua
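The round-trip alternatives mentioned at the top of this reply look like the following (a minimal sketch with toy data; temp files used for illustration):

```r
x <- list(c(0, 0.5, 0), c(0.5, 0, 0.5))

# 'dump' writes parseable R code that recreates the object
f1 <- tempfile(fileext = ".R")
dump("x", file = f1)
rm(x)
source(f1)            # evaluating the dumped code restores x
stopifnot(identical(x, list(c(0, 0.5, 0), c(0.5, 0, 0.5))))

# 'save'/'load' round-trip the object in R's binary format
f2 <- tempfile(fileext = ".rda")
save(x, file = f2)
rm(x)
load(f2)              # restores x
```

By contrast, sink() only captures what would have been printed to the terminal, which is why the [[1]]/[1]-style output cannot be parsed back in.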
[R] Specify CRAN repository from command line
Hi,

It feels like I should be able to do something like:

  R CMD INSTALL lib='/usr/lib64/R/library' repos='http://proxy.url/cran' package

We have a bunch of servers (compute nodes in a Rocks cluster) in an isolated subnet; there is a basic pass-through proxy set up on the firewall (the head node) which just passes HTTP requests through to our nearest CRAN mirror. When using install.packages() it's easy to make R install from the repository with the repos='address' option, but I can't figure out how to do this from the command line. Is there a command line option for this? Currently I'm doing it using an R script, but that's causing issues because it's not 'visible' to the installer. This would greatly streamline R installation with a standard package set.

Regards,
Aaron Hicks
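As far as I know, R CMD INSTALL only operates on local package files and has no repository option; a common workaround (a sketch, reusing the placeholder paths from the question) is to run install.packages() non-interactively through Rscript:

```sh
# one-shot install from a specific repository, no interactive R session needed
Rscript -e 'install.packages("somepackage", lib = "/usr/lib64/R/library", repos = "http://proxy.url/cran")'
```

The same line drops cleanly into a post-install script or cluster provisioning recipe.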
[R] Error using getBM() to query BioMart archives
I'm trying to identify the positions of all genes within a specific chromosomal region using biomaRt. When using the current biomart database I'm able to do this without issue. However, I need to use build 36 of the mouse genome, which was last included in ensembl mart 46. I selected this mart and the mouse dataset as follows:

  mart <- useMart(biomart="ensembl_mart_46", host="www.biomart.org",
                  path="/biomart/martservice", port=80, archive=TRUE)
  mart <- useDataset("mmusculus_gene_ensembl", mart=mart)

I'm able to list the available attributes and filters just fine, but when I attempt to actually retrieve data using getBM() I receive the following error:

  > genes <- getBM(attributes=c("ensembl_gene_id", "external_gene_id",
  +                             "description", "chromosome_name",
  +                             "start_position", "transcript_start"),
  +                filters=c("chromosome_name","start","end"),
  +                values=list(12,4000,7000),
  +                mart=mart)
  Error in listFilters(mart, what = "type") :
    The function argument 'what' contains an invalid value: type
    Valid are: name, description, options, fullDescription

The same error is returned if I check to see what value type is required for a particular filter:

  > filterType("chromosome_name", mart=mart)
  Error in listFilters(mart, what = "type") :
    The function argument 'what' contains an invalid value: type
    Valid are: name, description, options, fullDescription

I'd really appreciate some help with this issue.

Cheers,
Aaron
[R] deleting/removing previous warning message in loop
Hello R Users,

I am having difficulty deleting the last warning message in a loop so that the only warning that is produced is that from the most recent line of code. I have tried options(warn=1), rm(last.warning), and resetting last.warning using something like:

  > warning("Resetting warning message")

This problem has been addressed in a previous listserve thread, however I do not follow the advice given; see the web link below. Any help would be greatly appreciated. Thanks!

Aaron Wells

https://stat.ethz.ch/pipermail/r-help/2008-October/176765.html

A general example is first, followed by an example with the loop.

Example 1:

  > ### Generalized linear model run on the first column of my example data
  > demo.glm <- glm(test.data[,1] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  > warnings()  ### no warnings reported
  NULL
  > ### Generalized linear model run on the 9th column of my example data
  > demo.glm <- glm(test.data[,9] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  Warning messages:
  1: In glm.fit(x = X, y = Y, weights = weights, start = start,
     etastart = etastart, : algorithm did not converge
  2: In glm.fit(x = X, y = Y, weights = weights, start = start,
     etastart = etastart, : fitted probabilities numerically 0 or 1 occurred
  > warnings()  ### the model with column 9 as data produces warnings
  Warning messages:
  1: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
     algorithm did not converge
  2: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
     fitted probabilities numerically 0 or 1 occurred
  > ### Re-run the model with column 1 as data
  > demo.glm <- glm(test.data[,1] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  > ### warnings() reports the same warnings from the column 9 model; ideally
  > ### it would report the actual warning for the column 1 model ("NULL")
  > warnings()
  Warning messages:
  1: In glm.fit(x = X, y = Y, weights = weights, start = start, ... :
     algorithm did not converge
  2: In glm.fit(x = X, y = Y, weights = weights, start = start, ...
     : fitted probabilities numerically 0 or 1 occurred

Example 2: Loop

### In the below example I have reset warnings() before each iteration by
### using warning("Resetting warning message"). I would like the warnings
### to somehow be consolidated into a list that I could later examine to
### determine which model iterations ran with and without warnings. The
### below code doesn't work because the functions are being run in the
### loop environment, and not the base environment.

  > test.warn <- rep(0, ncol(test.data)); test.warn <- as.list(test.warn)
  > for (i in 1:ncol(test.data)) {
  +   warn.reset <- warning("Resetting warning message")
  +   demo.glm <- glm(test.data[,i] ~ c(1:38) + I(c(1:38)^2), family=binomial)
  +   warn.new <- warnings()
  +   cbind.warn <- cbind(warn.reset, warn.new)
  +   test.warn[[i]] <- cbind.warn
  +   test.warn
  + }
  There were 38 warnings (use warnings() to see them)
  > test.warn
  [[1]]
                            warn.reset                  warn.new
  Resetting warning message "Resetting warning message" NULL

  [[2]]
                            warn.reset                  warn.new
  Resetting warning message "Resetting warning message" NULL
  .
  .
  .

Aaron F. Wells, PhD
Senior Scientist
ABR, Inc.
2842 Goldstream Road
Fairbanks, AK 99709
Re: [R] deleting/removing previous warning message in loop
William,

The function keepWarnings that you wrote did the trick. Thanks for the help!

Aaron

> Subject: Re: [R] deleting/removing previous warning message in loop
> Date: Fri, 27 Mar 2009 13:33:51 -0700
> From: wdun...@tibco.com
> To: awell...@hotmail.com
>
> You could try using a function like the following (based
> on suppressWarnings):
>
>   keepWarnings <- function(expr) {
>       localWarnings <- list()
>       value <- withCallingHandlers(expr,
>           warning = function(w) {
>               localWarnings[[length(localWarnings)+1]] <<- w
>               invokeRestart("muffleWarning")
>           })
>       list(value=value, warnings=localWarnings)
>   }
>
> It returns a 2-element list, the first being the value
> of the expression given to it and the second being a
> list of all the warnings. Your code can look through
> the list of warnings and decide which to omit. E.g.,
>
> > d <- data.frame(x=1:10, y=rep(c(FALSE,TRUE), c(4,6)))
> > z <- keepWarnings(glm(y~x, data=d, family=binomial))
> > z$value
>
> Call:  glm(formula = y ~ x, family = binomial, data = d)
>
> Coefficients:
> (Intercept)            x
>     -200.37        44.52
>
> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
> Null Deviance:     13.46
> Residual Deviance: 8.604e-10   AIC: 4
>
> > z$warnings
> [[1]]
> <simpleWarning in glm.fit(x = X, y = Y, weights = weights, start = start,
>  etastart = etastart, mustart = mustart, offset = offset, family = family,
>  control = control, intercept = attr(mt, "intercept") > 0):
>  algorithm did not converge>
>
> [[2]]
> <simpleWarning in glm.fit(x = X, y = Y, weights = weights, start = start,
>  etastart = etastart, mustart = mustart, offset = offset, family = family,
>  control = control, intercept = attr(mt, "intercept") > 0):
>  fitted probabilities numerically 0 or 1 occurred>
>
> > str(z$warnings[[1]])
> List of 2
>  $ message: chr "algorithm did not converge"
>  $ call   : language glm.fit(x = X, y = Y, weights = weights, start =
>    start, etastart = etastart, mustart = mustart, offset = offset,
>    family = family, control = control, ...
>  - attr(*, "class")= chr [1:3] "simpleWarning" "warning" "condition"
>
> > sapply(z$warnings, function(w) w$message)
> [1] "algorithm did not converge"
> [2] "fitted probabilities numerically 0 or 1 occurred"
>
> You can filter out the ones you don't want to hear about
> and recall warning() with the interesting ones or present
> them in some other way.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
> ---
> I am having difficulty deleting the last warning message in a loop so
> that the only warning that is produced is that from the most recent line
> of code. I have tried options(warn=1), rm(last.warning), and resetting
> the last.warning using something like: ...
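Applied back to the original loop, keepWarnings could be used along these lines (a sketch; test.data and the glm formula are from the original question):

```r
results <- vector("list", ncol(test.data))
for (i in seq_len(ncol(test.data))) {
  z <- keepWarnings(glm(test.data[, i] ~ c(1:38) + I(c(1:38)^2),
                        family = binomial))
  # store the fit together with just this iteration's warning messages
  results[[i]] <- list(fit = z$value,
                       warnings = vapply(z$warnings, conditionMessage,
                                         character(1)))
}

# iterations that ran clean have zero-length warning vectors
clean <- which(lengths(lapply(results, `[[`, "warnings")) == 0)
```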
[R] Public R servers?
Hello,

Earlier I posted a question about memory usage, and the community's input was very helpful. However, I'm now extending my dataset (which I use when running a regression using lm). As a result, I am continuing to run into problems with memory usage, and I believe I need to shift to implementing the analysis on a different system. I know that R supports R servers through Rserve. Are there any public servers where I could upload my datasets (either as a text file, or through a connection to a SQL server), execute the analysis, then download the results? I identified Wessa.net (http://www.wessa.net/mrc.wasp?outtype=Browser%20Blue%20-%20Charts%20White), but it's not clear it will meet my needs. Can anyone suggest any other resources?

Thanks in advance,
Aaron Barzilai
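An alternative to moving the job to another machine (not part of the original question, but it targets the same memory limit) is to fit the regression incrementally with the biglm package, which keeps only the model's sufficient statistics in memory while the data are processed in chunks:

```r
library(biglm)

# hypothetical file names and formula; each chunk is read, folded into the
# fit, and then discarded
chunks <- c("data1.csv", "data2.csv", "data3.csv")
fit <- NULL
for (f in chunks) {
  d <- read.csv(f)
  fit <- if (is.null(fit)) biglm(y ~ x1 + x2, data = d) else update(fit, d)
}
summary(fit)   # coefficients computed over all chunks combined
```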
[R] function output with for loop and if statement
Hello all, turns out I'm having a bad R week. I am at my wits' end with a function that I am trying to write. When I run the lines of code outside of a function, I get the desired output. When I wrap the lines of code into a function it doesn't work as expected. Not sure what is going on here. I suspected that the syntax of the if statement with the for loop was the culprit, but when I only ran the part of the code with the for loop and no if statement I still had the above problem (works outside a function, fails when wrapped into a function). Below is the code and example output. Please help! Thanks,

Aaron

  concov.test <- function(vegetation, specieslist) {
    test.veg <- vegetation
    names(test.veg) <- specieslist$LifeForm
    tmp <- matrix(nrow=nrow(test.veg), ncol=length(unique(names(test.veg))))
    for (i in unique(names(test.veg)))
      {test.out <- apply(test.veg[, names(test.veg)==i], 1, sum)
      tmp.match <- unique(names(test.veg))[unique(names(test.veg))==i]
      tmp.col <- match(tmp.match, unique(names(test.veg)))
      tmp[1:nrow(test.veg), tmp.col] <- test.out
      tmp.out <- data.frame(row.names(test.veg), tmp, row.names=1)
      names(tmp.out) <- unique(names(test.veg))
      tmp.out
      tmp.out.sort <- tmp.out[, order(names(tmp.out))]
      }
    if (table(names(tmp.out))[i]==1)
      tmp.match2 <- names(tmp.out.sort)[names(tmp.out.sort)==i]
      tmp.col2 <- match(tmp.match2, names(tmp.out.sort))
      tmp.out.sort[1:nrow(test.veg), tmp.col2] <- test.veg[, names(test.veg)==i]
      return(tmp.out.sort)
    else return(tmp.out.sort)
  }

Incorrect output when run as a function:

  > test <- concov.test(ansveg_all, spplist.class)
  > test
                   Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
  ANSG_T01_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   95.0     NA
  ANSG_T01_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   16.0     NA
  ANSG_T01_03_2008          NA               NA             NA               NA             NA    NA      NA      NA   71.0     NA
  ANSG_T01_04_2008          NA               NA             NA               NA             NA    NA      NA      NA   10.0     NA
  ANSG_T02_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   92.2     NA
  ANSG_T02_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   14.0     NA
  .
  .
  .
Correct output when the code is run outside of a function:

  > test.veg <- ansveg_all
  > names(test.veg) <- spplist.class$LifeForm
  > tmp <- matrix(nrow=nrow(test.veg), ncol=length(unique(names(test.veg))))
  > for (i in unique(names(test.veg)))
  > {test.out <- apply(test.veg[, names(test.veg)==i], 1, sum)
  + tmp.match <- unique(names(test.veg))[unique(names(test.veg))==i]
  + tmp.col <- match(tmp.match, unique(names(test.veg)))
  + tmp[1:nrow(test.veg), tmp.col] <- test.out
  + tmp.out <- data.frame(row.names(test.veg), tmp, row.names=1); names(tmp.out) <- unique(names(test.veg))
  + tmp.out
  + tmp.out.sort <- tmp.out[, order(names(tmp.out))]
  + }
  > if (table(names(tmp.out))[i]==1)
  + tmp.match2 <- names(tmp.out.sort)[names(tmp.out.sort)==i]
  > tmp.col2 <- match(tmp.match2, names(tmp.out.sort))
  > tmp.out.sort[1:nrow(test.veg), tmp.col2] <- test.veg[, names(test.veg)==i]
  > return(tmp.out.sort)
  > else return(tmp.out.sort)
  >
  > tmp.out.sort
                   Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
  ANSG_T01_01_2008           0             57.0            1.0             40.0           35.0  22.0     5.0    35.0   95.0    1.1
  ANSG_T01_02_2008           0              0.0            0.0              0.0            0.0  34.0     0.0     0.0   16.0   24.0
  ANSG_T01_03_2008           0             31.0            0.0             47.0            1.0   9.1     3.0     3.0   71.0   14.0
  ANSG_T01_04_2008           0              0.0            0.0             12.0            0.0  13.2     0.0     0.0   10.0   16.0
  ANSG_T02_01_2008           0             15.0            1.0             22.0           36.0   9.2     2.0    38.0   92.2    0.1
  ANSG_T02_02_2008           0             33.0           66.0             23.0            2.0   5.0     0.0     3.0   14.0    0.0
  .
  .
  .
Re: [R] function output with for loop and if statement
Mark, thanks for the suggestions. Unfortunately that did not fix the problem. I have experimented (with no success) with placing braces in different locations around the if/else statements and removing them altogether. Thanks again,

Aaron

Date: Wed, 22 Apr 2009 15:24:24 -0500
From: markle...@verizon.net
To: awell...@hotmail.com
Subject: Re: [R] function output with for loop and if statement

Hi Aaron: I just looked quickly because I have to go, but try wrapping braces around the last if/else like below and see if that helps. If you have multiple statements in an if/else, I think you need them, so I'm actually a little surprised that your function didn't give messages when you tried to run it. Also, braces in R can have some strange behavior (because, if code is run at the prompt and a statement can complete and there's no brace on that line, then that statement is executed regardless of whether there's a brace later; that probably doesn't make much sense but it's kind of hard to explain), but I'm hoping that the below fixes the problem. Good luck.

  function() { # brace for beginning of function
  .
  .
  .
  if (table(names(tmp.out))[i]==1) {
    tmp.match2 <- names(tmp.out.sort)[names(tmp.out.sort)==i]
    tmp.col2 <- match(tmp.match2, names(tmp.out.sort))
    tmp.out.sort[1:nrow(test.veg), tmp.col2] <- test.veg[, names(test.veg)==i]
    return(tmp.out.sort)
  } else {
    return(tmp.out.sort)
  }
  } # brace for end of function

On Apr 22, 2009, aaron wells wrote:

Hello all, turns out I'm having a bad R week. I am at my wits' end with a function that I am trying to write. When I run the lines of code outside of a function, I get the desired output. When I wrap the lines of code into a function it doesn't work as expected. Not sure what is going on here. I suspected that the syntax of the if statement with the for loop was the culprit, but when I only ran the part of the code with the for loop and no if statement I still had the above problem (works outside a function, fails when wrapped into a function).
Below is the code and example output. Please help! Thanks, Aaron

concov.test<-function(vegetation,specieslist)
{
  test.veg<-vegetation
  names(test.veg)<-specieslist$LifeForm
  tmp<-matrix(nrow=nrow(test.veg),ncol=length(unique(names(test.veg))))
  for (i in unique(names(test.veg)))
  {test.out<-apply(test.veg[,names(test.veg)==i],1,sum)
   tmp.match<-unique(names(test.veg))[unique(names(test.veg))==i]
   tmp.col<-match(tmp.match,unique(names(test.veg)))
   tmp[1:nrow(test.veg),tmp.col]<-test.out
   tmp.out<-data.frame(row.names(test.veg),tmp,row.names=1);names(tmp.out)<-unique(names(test.veg))
   tmp.out
   tmp.out.sort<-tmp.out[,order(names(tmp.out))]
  }
  if(table(names(tmp.out))[i]==1)
    tmp.match2<-names(tmp.out.sort)[names(tmp.out.sort)==i]
    tmp.col2<-match(tmp.match2,names(tmp.out.sort))
    tmp.out.sort[1:nrow(test.veg),tmp.col2]<-test.veg[,names(test.veg)==i]
    return(tmp.out.sort)
  else return(tmp.out.sort)
}

Incorrect output when run as a function:

> test<-concov.test(ansveg_all,spplist.class)
> test
                 Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
ANSG_T01_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   95.0     NA
ANSG_T01_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   16.0     NA
ANSG_T01_03_2008          NA               NA             NA               NA             NA    NA      NA      NA   71.0     NA
ANSG_T01_04_2008          NA               NA             NA               NA             NA    NA      NA      NA   10.0     NA
ANSG_T02_01_2008          NA               NA             NA               NA             NA    NA      NA      NA   92.2     NA
ANSG_T02_02_2008          NA               NA             NA               NA             NA    NA      NA      NA   14.0     NA
. . .
Correct output when code is run outside of a function:

> test.veg<-ansveg_all
> names(test.veg)<-spplist.class$LifeForm
> tmp<-matrix(nrow=nrow(test.veg),ncol=length(unique(names(test.veg))))
>
> for (i in unique(names(test.veg)))
> {test.out<-apply(test.veg[,names(test.veg)==i],1,sum)
+ tmp.match<-unique(names(test.veg))[unique(names(test.veg))==i]
+ tmp.col<-match(tmp.match,unique(names(test.veg)))
+ tmp[1:nrow(test.veg),tmp.col]<-test.out
+ tmp.out<-data.frame(row.names(test.veg),tmp,row.names=1);names(tmp.out)<-unique(names(test.veg))
+ tmp.out
+ tmp.out.sort<-tmp.out[,order(names(tmp.out))]
+ }
> if(table(names(tmp.out))[i]==1)
+ tmp.match2<-names(tmp.out.sort)[names(tmp.out.sort)==i]
> tmp.col2<-match(tmp.match2,names(tmp.out.sort))
> tmp.out.sort[1:nrow(test.veg),tmp.col2]<-test.veg[,names(test.veg)==i]
> return(tmp.out.sort)
> else return(tmp.out.sort)
>
> tmp.out.sort
                 Bare_Ground Deciduous_Shrubs Deciduous_Tree Evergreen_Shrubs Evergreen_Tree Forbs Grasses Lichens Mosses Sedges
ANSG_T01_01_2008           0             57.0            1.0             40.0           35.0  22.0     5.0    35.0   95.0    1.1
ANSG_T01_02_2008           0              0.0            0.0              0.0            0.0  34.0     0.0     0.0   16.0   24.0
ANSG_T01_03_2008           0             31.0            0.0             47.0            1.0   9.1     3.0     3.0   71.0   14.0
ANSG_T01_04_2008           0              0.0            0.0             12.0            0.0  13.2     0.0     0.0   10.0   16.0
ANSG_T02_01_2008           0             15.0            1.0             22.0
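The parsing behaviour Mark describes can be demonstrated without a function: at top level an if whose body completes on its own line is a finished statement, so a following else is a syntax error, while inside braces the parser keeps reading. A minimal sketch using parse() on equivalent text:

```r
# Standalone if/else split across lines: the "if" completes, so "else" errors.
bad <- tryCatch({
  parse(text = "if (TRUE) 1\nelse 2")
  "parsed"
}, error = function(e) "syntax error")
bad  # "syntax error"

# Inside braces (as in a function body) the same code parses and evaluates:
ok <- eval(parse(text = "{ if (TRUE) 1\nelse 2 }"))
ok  # 1
```

This is why wrapping the whole if/else in braces (or keeping `else` on the same line as the closing brace of the if-branch) matters.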
Re: [R] function output with for loop and if statement
Gavin, thank you for the suggestions. Unfortunately the function is still not working correctly. Below are the dummy datasets that you requested. In the function, dummy.vegdata = vegetation and dummy.spplist = specieslist. A little clarification on why the if statement is in the function: I am using the apply function to sum columns of data that correspond to different lifeforms in order to derive a total cover value for each lifeform in each plot (plt). When only one species occurs in a lifeform, the apply function doesn't work, since there is only one column of data. So the if statement is an attempt to include the column of data from the dummy.vegdata in the output when there is only one species in a given lifeform. Examples of this condition in the dummy.vegdata include water (Bare_Ground) and popbal (Deciduous_Tree). Aaron

> dummy.vegdata
[25 plots (rows T1-T25) by 20 species columns (water, salarb, salpul, popbal, leddec, picgla, picmar, arcuva, zygele, epiang, calpur, poaarc, pelaph, flacuc, tomnit, hylspl, carvag, caraqu, calcan, carsax); the percent-cover values ran together when the fixed-width printout was archived, and their column alignment is no longer recoverable]

> dummy.spplist
  code   SciName                  LifeForm         Class
  water  Water                    Bare_Ground      Other
  salarb Salix_arbusculoides      Deciduous_Shrubs Vascular
  salpul Salix_planifolia_pulchra Deciduous_Shrubs Vascular
  popbal Populus_balsamifera      Deciduous_Tree   Vasc
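The single-species case that motivates the if statement can be avoided entirely: subset with drop = FALSE (so a one-column selection stays two-dimensional) and use rowSums(), which handles one or many columns uniformly. A sketch with illustrative data, not the function from the thread:

```r
# Illustrative cover data: two species in one lifeform, one in another.
veg <- data.frame(salarb = c(10, 0, 5),
                  salpul = c(0, 3, 2),
                  popbal = c(1, 0, 0))
lifeform <- c("Deciduous_Shrubs", "Deciduous_Shrubs", "Deciduous_Tree")

# drop = FALSE keeps a single-column selection as a data frame,
# so rowSums() works whether the lifeform has 1 species or many.
totals <- sapply(unique(lifeform), function(lf) {
  rowSums(veg[, lifeform == lf, drop = FALSE])
})
totals
#   Deciduous_Shrubs Deciduous_Tree
# 1               10              1
# 2                3              0
# 3                7              0
```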
[R] rmysql query help
R HELP, I am trying to use an R script to connect to a MySQL database. I am having a problem using a variable in the WHERE clause whose value contains a space. If I include the variable inside the quotes of the query, I think it searches for the name of the variable in the database rather than the value of the variable. If I put it outside the quotes, then it complains about the space. Are there special escape characters or something else I'm missing? This date format in a MySQL table is pretty standard. Any ideas? Thanks, Aaron

require(RMySQL)
startdatetime<-"2009-04-04 01:00:00"
connect <- dbConnect(MySQL(),user="x",password="xx",dbname="x",host="xxx.xxx.xxx.xxx")

forecast <- dbSendQuery(connect, statement=paste("SELECT ICE FROM table1 WHERE BEGTIME >= 'startdatetime'")) # doesn't read the variable

# or

forecast <- dbSendQuery(connect, statement=paste("SELECT ICE FROM table1 WHERE BEGTIME >=", startdatetime)) # space error

# but this seems to work

forecast <- dbSendQuery(connect, statement=paste("SELECT ICE FROM table1 WHERE BEGTIME >='2009-04-04 01:00:00'"))

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
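The underlying issue is that paste() never substitutes a variable that appears inside a quoted string; the value has to be concatenated in, with the SQL single quotes built into the string around it. A sketch of the string construction (table and column names as in the post):

```r
startdatetime <- "2009-04-04 01:00:00"

# Interpolate the value, keeping the SQL quotes around the timestamp:
sql <- sprintf("SELECT ICE FROM table1 WHERE BEGTIME >= '%s'", startdatetime)
sql
# "SELECT ICE FROM table1 WHERE BEGTIME >= '2009-04-04 01:00:00'"

# Equivalent with paste(); sep = "" so no stray space is inserted:
sql2 <- paste("SELECT ICE FROM table1 WHERE BEGTIME >= '",
              startdatetime, "'", sep = "")
identical(sql, sql2)  # TRUE

# forecast <- dbSendQuery(connect, statement = sql)
```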
Re: [R] optimization challenge
FYI, in bioinformatics, we use dynamic programming algorithms in similar ways to solve similar problems of finding guaranteed-optimal partitions in streams of data (usually DNA or protein sequence, but sometimes numerical data from chip-arrays). These "path optimization" algorithms are often called Viterbi algorithms, a web search for which should provide multiple references. The solutions are not necessarily unique (there may be multiple paths/partitions with identical integer maxima in some systems) and there is much research on whether the optimal solution is actually the one you want to work with (for example, there may be a fair amount of probability mass within an area/ensemble of suboptimal solutions that overall have greater posterior probabilities than does the optimal solution "singleton"). See Chip Lawrence's PNAS paper for more erudite discussion, and references therein: www.pnas.org/content/105/9/3209.abstract -Aaron P.S. Good to see you here Albyn -- I enjoyed your stat. methods course at Reed back in 1993, which started me down a somewhat windy road to statistical genomics! -- Aaron J. Mackey, PhD Assistant Professor Center for Public Health Genomics University of Virginia amac...@virginia.edu On Wed, Jan 13, 2010 at 5:23 PM, Ravi Varadhan wrote: > Greg - thanks for posting this interesting problem. > > Albyn - thanks for posting a solution. Now, I have some questions: (1) is > the algorithm guaranteed to find a "best" solution? (2) can there be > multiple solutions (it seems like there can be more than 1 solution > depending on the data)?, and (3) is there a good reference for this and > similar algorithms? > > Thanks & Best, > Ravi. > > > > --- > > Ravi Varadhan, Ph.D. 
> > Assistant Professor, The Center on Aging and Health
> > Division of Geriatric Medicine and Gerontology
> > Johns Hopkins University
> > Ph: (410) 502-2619
> > Fax: (410) 614-9625
> > Email: rvarad...@jhmi.edu
> > Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Albyn Jones
> Sent: Wednesday, January 13, 2010 1:19 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] optimization challenge
>
> The key idea is that you are building a matrix that contains the
> solutions to smaller problems which are sub-problems of the big
> problem. The first row of the matrix SSQ contains the solution for no
> splits, ie SSQ[1,j] is just the sum of squares about the overall mean
> for reading chapters 1 through j in one day. The iteration then uses
> row m-1 to construct row m, since if SSQ[m-1,j] (optimal reading of j
> chapters in m-1 days) is part of the overall optimal solution, you
> have already computed it, and so don't ever need to recompute it.
>
> TS = SSQ[m-1,j] + SSQ1[j+1]
>
> computes the vector of possible solutions for SSQ[m,n] (n chapters in m
> days), breaking it into two pieces: chapters 1 to j in m-1 days, and chapters
> j+1 to n in 1 day. j is a vector in the function, and min(TS) is the minimum
> over choices of j, ie SSQ[m,n].
>
> At the end, SSQ[128,239] is the optimal value for reading all 239
> chapters in 128 days. That's just the objective function, so the rest
> involves constructing the list of optimal cuts, ie which chapters are
> grouped together for each day's reading. That code uses the same
> idea... constructing a list of lists of cutpoints.
>
> statisticians should study a bit of data structures and algorithms!
> > albyn > > On Wed, Jan 13, 2010 at 10:45:11AM -0700, Greg Snow wrote: > > WOW, your results give about half the variance of my best optim run > (possibly due to my suboptimal use of optim). > > > > Can you describe a little what the algorithm is doing? > > > > -- > > Gregory (Greg) L. Snow Ph.D. > > Statistical Data Center > > Intermountain Healthcare > > greg.s...@imail.org > > 801.408.8111 > > > > > > > -Original Message- > > > From: Albyn Jones [mailto:jo...@reed.edu] > > > Sent: Tuesday, January 12, 2010 5:31 PM > > > To: Greg Snow > > > Cc: r-help@r-project.org > > > Subject: Re: [R] optimization challenge > > > > > > Greg > > > > > > Nice problem: I wa
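Albyn's recursion can be sketched as a self-contained function (illustrative names, not his actual code; x holds the per-chapter sizes, e.g. verse counts):

```r
# Minimal dynamic-programming sketch: split x into ndays contiguous groups,
# minimising the total within-group sum of squares about each group's mean.
best_partition <- function(x, ndays) {
  n <- length(x)
  # ssq1[i, j]: sum of squares for reading chapters i..j in a single day
  ssq1 <- matrix(Inf, n, n)
  for (i in 1:n) for (j in i:n) ssq1[i, j] <- sum((x[i:j] - mean(x[i:j]))^2)

  SSQ <- matrix(Inf, ndays, n)  # SSQ[m, j]: best value for chapters 1..j in m days
  cut <- matrix(NA, ndays, n)   # cut[m, j]: last split point achieving SSQ[m, j]
  SSQ[1, ] <- ssq1[1, ]
  if (ndays > 1) for (m in 2:ndays) {
    for (j in m:n) {
      # chapters 1..k in m-1 days plus chapters k+1..j in one day, k = m-1,...,j-1
      cand <- SSQ[m - 1, (m - 1):(j - 1)] + ssq1[m:j, j]
      SSQ[m, j] <- min(cand)
      cut[m, j] <- (m - 1) + which.min(cand) - 1
    }
  }
  SSQ[ndays, n]  # backtracking through `cut` would recover the grouping itself
}

best_partition(c(1, 1, 10, 10), 2)  # 0: the obvious split (1,1) | (10,10)
```

The double loop over (m, j) is what guarantees the optimum Ravi asked about: every sub-solution SSQ[m-1, k] is computed exactly once and reused.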
[R] prop.test CI depends on null hypothesis?
Why does prop.test use continuity correction "only if it does not exceed the difference between sample and null proportions in absolute value"? I'm referring here to the single group method, though I believe there is a similar issue with the two group method. What this means in practice is that the confidence interval changes depending on the null hypothesis; see examples below. This is unexpected, and I have been unable to find any documentation explaining why this is done (see links below examples). ## when the null proportion is equal to the sample proportion, it does not ## use the continuity correction, even when one is asked for > prop.test(30,60,p=0.5, correct=TRUE) 1-sample proportions test without continuity correction data: 30 out of 60, null probability 0.5 X-squared = 0, df = 1, p-value = 1 alternative hypothesis: true p is not equal to 0.5 95 percent confidence interval: 0.3773502 0.6226498 sample estimates: p 0.5 ## however, when the null proportion is not equal to the sample proportion, ## it does use the continuity correction when it is asked for. > prop.test(30,60,p=0.499, correct=TRUE) 1-sample proportions test with continuity correction data: 30 out of 60, null probability 0.499 X-squared = 0, df = 1, p-value = 1 alternative hypothesis: true p is not equal to 0.499 95 percent confidence interval: 0.3764106 0.6235894 sample estimates: p 0.5 The documentation refers to Newcombe's 1998 Statistics in Medicine article; I read through this and found nothing about not using the continuity correction in this situation. 
https://doi.org/10.1002/(SICI)1097-0258(19980430)17:8%3C857::AID-SIM777%3E3.0.CO;2-E On this mailing list, there was a 2013 post "prop.test correct true and false gives same answer", which was answered only with the quote from the help page: https://stat.ethz.ch/pipermail/r-help/2013-March/350386.html I also found several questions asking which Newcombe method is implemented, which didn't elicit specific answers; here's one from 2011: https://stat.ethz.ch/pipermail/r-help/2011-April/274086.html -- Aaron Rendahl, Ph.D. Assistant Professor of Statistics and Informatics College of Veterinary Medicine, University of Minnesota 295L AS/VM, 612-301-2161 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prop.test CI depends on null hypothesis?
I believe this is correct behavior for computing the p-value, though the wording is awkward in that it implies that R is not implementing the continuity correction in this situation, when in fact this behavior is part of how the continuity correction is defined. The correction simply treats the normal approximation as appropriately discrete, so (translating to a binomial variable) it computes P(X > 12) using P(X > 11.5). The case the documentation discusses is simply the case where the null hypothesis falls within the discrete band corresponding to the observed value; here only enough of the correction is used that the test statistic is appropriately zero and the p-value is 1.

However, this is not correct behavior for the confidence interval. There is nothing in any of the listed documentation that would support such behavior, and in any case it doesn't make sense for a confidence interval to depend on a null parameter. If continuity correction is desired, the edges of the confidence bound should still be fully adjusted even when the observed proportion is close to the null parameter. What currently happens is that the bound is not adjusted at all when the observed proportion equals the null proportion, and in cases where it is not equal but still close enough that the correction is reduced, the confidence intervals are neither "with" correction nor "without" correction but somewhere in between!

An additional confusing matter is how R reports whether the test was performed "with" or "without" continuity correction; this is determined in the code by whether or not the adjusted correction is zero. The correction is zero exactly when the observed proportion equals the null proportion, so in that case the result is reported "without" continuity correction, which "flips" on the user. Oddly (to the user), changing the null p by a tiny amount gives only tiny changes to the result, yet the output is then reported "with" correction.
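The dependence is easy to see directly. The first two intervals below are the ones shown in the original post; p = 0.4 is far enough from the observed proportion that the correction is applied in full, so all three intervals differ even though only the null hypothesis changed:

```r
ci1 <- prop.test(30, 60, p = 0.5,   correct = TRUE)$conf.int  # 0.3773502 0.6226498
ci2 <- prop.test(30, 60, p = 0.499, correct = TRUE)$conf.int  # 0.3764106 0.6235894
ci3 <- prop.test(30, 60, p = 0.4,   correct = TRUE)$conf.int  # full correction

identical(ci1, ci2)  # FALSE
identical(ci2, ci3)  # FALSE
```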
This behavior has presumably been in R for a long time (though I haven't checked the code history), so I would love to have feedback from the R-help community about: * does the current behavior really make sense, and I've just misunderstood something? * is there documentation or discussions about this behavior out there somewhere that I've missed? * if this really is a "new" discovery, how best to bring it to the attention of those who can decide what to do about it? Thanks! On Mon, Oct 21, 2019 at 11:33 AM Aaron Rendahl wrote: > Why does prop.test use continuity correction "only if it does not exceed > the difference between sample and null proportions in absolute value"? I'm > referring here to the single group method, though I believe there is a > similar issue with the two group method. > > What this means in practice is that the confidence interval changes > depending on the null hypothesis; see examples below. This is unexpected, > and I have been unable to find any documentation explaining why this is > done (see links below examples). > > ## when the null proportion is equal to the sample proportion, it does not > ## use the continuity correction, even when one is asked for > > > prop.test(30,60,p=0.5, correct=TRUE) > > 1-sample proportions test without continuity correction > > data: 30 out of 60, null probability 0.5 > X-squared = 0, df = 1, p-value = 1 > alternative hypothesis: true p is not equal to 0.5 > 95 percent confidence interval: > 0.3773502 0.6226498 > sample estimates: > p > 0.5 > > ## however, when the null proportion is not equal to the sample > proportion, > ## it does use the continuity correction when it is asked for. 
> > > prop.test(30,60,p=0.499, correct=TRUE) > > 1-sample proportions test with continuity correction > > data: 30 out of 60, null probability 0.499 > X-squared = 0, df = 1, p-value = 1 > alternative hypothesis: true p is not equal to 0.499 > 95 percent confidence interval: > 0.3764106 0.6235894 > sample estimates: > p > 0.5 > > > The documentation refers to Newcombe's 1998 Statistics in Medicine > article; I read through this and found nothing about not using the > continuity correction in this situation. > > https://doi.org/10.1002/(SICI)1097-0258(19980430)17:8%3C857::AID-SIM777%3E3.0.CO;2-E > > On this mailing list, there was a 2013 post "prop.test correct true and > false gives same answer", which was answered only with the quote from the > help page: https://stat.ethz.ch/pipermail/r-help/2013-March/350386.html > > I also found several questions asking which Newcombe method is > implemented, which didn't elicit specific answers; here's one from 2011: > https://stat.ethz.ch/p
Re: [R] XYZ data
for plotting purposes, I typically jitter() the x's and y's to see the otherwise overlapping data points -Aaron On Wed, Jun 26, 2013 at 12:29 PM, Shane Carey wrote: > Nope, neither work. :-( > > > On Wed, Jun 26, 2013 at 5:16 PM, Clint Bowman wrote: > > > John, > > > > That still leaves a string of identical numbers in the vector. > > > > Shane, > > > > ?jitter > > > > perhaps jitter(X,1,0.0001) > > > > Clint > > > > Clint BowmanINTERNET: cl...@ecy.wa.gov > > Air Quality Modeler INTERNET: cl...@math.utah.edu > > Department of Ecology VOICE: (360) 407-6815 > > PO Box 47600FAX:(360) 407-7534 > > Olympia, WA 98504-7600 > > > > USPS: PO Box 47600, Olympia, WA 98504-7600 > > Parcels:300 Desmond Drive, Lacey, WA 98503-1274 > > > > On Wed, 26 Jun 2013, John Kane wrote: > > > > mm <- 1:10 > >> nn <- mm + .001 > >> > >> John Kane > >> Kingston ON Canada > >> > >> > >> -Original Message- > >>> From: careys...@gmail.com > >>> Sent: Wed, 26 Jun 2013 16:48:34 +0100 > >>> To: r-help@r-project.org > >>> Subject: [R] XYZ data > >>> > >>> I have x, y, z data. The x, y fields dont change but Z does. How do I > add > >>> a > >>> very small number onto the end of each x, y data point. > >>> > >>> For example: > >>> > >>> Original (X) Original (Y) Original (Z) > >>> 15 20 30 > >>> 15 20 40 > >>> > >>> > >>> > >>> > >>> New (X) New (Y) New (Z) > >>> 15.1 20.01 30 > >>> 15.2 20.02 40 > >>> > >>> > >>> Thanks > >>> -- > >>> Shane > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> __** > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/**listinfo/r-help< > https://stat.ethz.ch/mailman/listinfo/r-help> > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/**posting-guide.html< > http://www.R-project.org/posting-guide.html> > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > >> > >> __**__ > >> FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop! 
> >> > >> __** > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/**listinfo/r-help< > https://stat.ethz.ch/mailman/listinfo/r-help> > >> PLEASE do read the posting guide http://www.R-project.org/** > >> posting-guide.html <http://www.R-project.org/posting-guide.html> > >> and provide commented, minimal, self-contained, reproducible code. > >> > >> > > > -- > Shane > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
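The jitter() suggestion in this thread can be sketched as follows (the amount 1e-4 follows Clint's example; pick something small relative to the coordinate scale):

```r
# Two points identical in x and y, differing only in z:
x <- c(15, 15)
y <- c(20, 20)
z <- c(30, 40)

set.seed(1)
xj <- jitter(x, amount = 1e-4)  # displace each x by up to +/- 0.0001
yj <- jitter(y, amount = 1e-4)

# plot(xj, yj)  # the formerly coincident points no longer overlap exactly
```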
[R] points3d and ordirgl
Hello all, I have been using the function ordirgl to plot dynamic 3D ordinations. The ordirgl function works just fine. In fact, I was even able to write a function that allows me to identify points in the 3D plot:

identify.rgl<-function(env_var,ord,dim1,dim2,dim3)
{
  tmp<-select3d(button="left")
  tmp.keep<-tmp(ord[,dim1],ord[,dim2],ord[,dim3])
  env_var[tmp.keep=="TRUE"]
}

where:
env_var = a variable to be identified (e.g., plot IDs as in > row.names(dataframe))
ord = ordination points or scores (created using a function such as metaMDS or nmds) that is recognized by points or scores
dim1 = dimension 1 (e.g., 1)
dim2 = dimension 2 (e.g., 2)
dim3 = dimension 3 (e.g., 3)

e.g., > identify.rgl(row.names(vegmat),veg_nmds$points,1,2,3)

My issue is that I would like to use the points3d function to add points of different colors and sizes to the dynamic 3D plot created using ordirgl. In my case the different colored and sized points represent different clusters from the results of the Partitioning Around Medoids (pam) clustering function (from library cluster). I have used this with success in the past (two years back), but can't get it to work properly now. An example of the code I have used in the past is:

> points3d(veg_nmds$points[,1],veg_nmds$points[,2],veg_nmds$points[,3],display = "sites",veg_pam12$clustering=="1",col=2,size=3)

The code above is intended to add the points from cluster 1 to the nmds plot in the color red and at size 3. Anyone have any ideas? Thanks, Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
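In rgl, points3d() takes plain coordinate vectors plus material properties such as col and size; display = "sites" belongs to vegan's plotting methods, and the clustering condition has to be applied as a row subset rather than passed as an extra positional argument. A hedged sketch of the intended call, untested without the poster's data (object names as in the post):

```r
# Rows of the NMDS score matrix belonging to cluster 1:
sel <- veg_pam12$clustering == 1

# Add those sites to the open rgl scene in red, at size 3:
points3d(veg_nmds$points[sel, 1],
         veg_nmds$points[sel, 2],
         veg_nmds$points[sel, 3],
         col = 2, size = 3)
```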
Re: [R] Renaming variables
On Fri, Sep 20, 2013 at 10:10 AM, Preetam Pal wrote:
> I have 25 variables in the data file (name: score), i.e. X1, X2, ..., X25.
>
> I don't want to use score$X1, score$X2 every time I use these variables.

attach(score)
plot(X1, X2)  # etc. etc.

-Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
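A sketch of the attach() approach alongside with(), which gives the same shorthand without leaving score on the search path (the data frame here is an illustrative stand-in for the real file):

```r
# Illustrative stand-in for the score data:
score <- data.frame(X1 = rnorm(10), X2 = rnorm(10))

attach(score)
plot(X1, X2)   # X1, X2 found via the search path
detach(score)  # detach when done to avoid masking surprises

# Same plot, scoped to a single call:
with(score, plot(X1, X2))
```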
Re: [R] combine glmnet and coxph (and survfit) with strata()
I'm also curious how to use glmnet with survfit -- specifically, for use with interval regression (which, under the hood, is implemented using survfit). Can you show how you converted your Surv object formula to a design matrix for use with glmnet? Thanks, -Aaron On Sun, Dec 8, 2013 at 12:45 AM, Jieyue Li wrote: > Dear All, > > I want to generate survival curve with cox model but I want to estimate the > coefficients using glmnet. However, I also want to include a strata() term > in the model. Could anyone please tell me how to have this strata() effect > in the model in glmnet? I tried converting a formula with strata() to a > design matrix and feeding to glmnet, but glmnet just treats the strata() > term with one independent variable... > > I know that if there is no such strata(), I can estimate coefficients from > glmnet and use "...init=selectedBeta,iter=0)" in the coxph. Please advise > me or also correct me if I'm wrong. > > Thank you very much! > > Best, > > Jieyue > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
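Not from the original posters, but a common way to build the x matrix for a Cox-family glmnet fit is model.matrix() on the right-hand side of the formula, dropping the intercept column; for family = "cox", glmnet's documented response is a two-column matrix with columns named time and status. All data and variable names below are illustrative:

```r
library(survival)
library(glmnet)

# Illustrative survival data:
set.seed(42)
d <- data.frame(time   = rexp(100) + 0.1,
                status = rbinom(100, 1, 0.7),
                age    = rnorm(100, 50, 10),
                sex    = factor(sample(c("F", "M"), 100, replace = TRUE)))

# Design matrix from the covariates only, intercept column dropped:
x <- model.matrix(~ age + sex, data = d)[, -1]
y <- cbind(time = d$time, status = d$status)

fit <- glmnet(x, y, family = "cox")
```

This also shows why a strata() term fed through model.matrix() is reduced to ordinary dummy columns, matching the behaviour Jieyue observed: glmnet itself has no strata mechanism in the formula sense.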
Re: [R] Generate random percentages and placing vectors
Thanks for the help! However, for the code in #2, it seems to just randomly split up the vectors. I would still like to keep the integrity of each vector. For example: if v1 = (1,2,3) v2 = (4,5,6) output = (0,0,0,1,2,3,0,0,0,0,4,5,6,0,0,0,0,0,0,0,0,0,0,0,0) - which has a specified length of 25 With v1 and v2 inserted in at random locations. On Wed, Oct 27, 2010 at 10:25 AM, Jonathan P Daily wrote: > > 1) > rands <- runif(5) > rands <- rands/sum(rands)*100 > > 2) > # assume vectors are v1, v2, etc. > v_all <- c(v1, v2, ...) > v_len <- length(v_all) > > output <- rep(0,25) > output[sample(1:25, v_len)] <- v_all > > -- > Jonathan P. Daily > Technician - USGS Leetown Science Center > 11649 Leetown Road > Kearneysville WV, 25430 > (304) 724-4480 > "Is the room still a room when its empty? Does the room, > the thing itself have purpose? Or do we, what's the word... imbue it." > - Jubal Early, Firefly > > > From: Aaron Lee To: r-help@r-project.org Date: > 10/27/2010 > 11:06 AM Subject: [R] Generate random percentages and placing vectors Sent > by: r-help-boun...@r-project.org > -- > > > > Hello everyone, > > I have two questions: > > 1.) I would like to generate random percentages that add up to 100. For > example, if I need 5 percentages, I would obtain something like: 20, 30, > 40, > 5, 5. Is there some way to do this in R? > > 2.) I would like to insert vectors of specified length into a larger vector > of specified length randomly, and fill the gaps with zeroes. For example, > if > I have 3 vectors of length 3, 2, and 2 with values and I would like to > randomly place them into a vector of length 25 made of 0's. > > Thank you in advance! > > -Aaron > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. 
> > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
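A sketch that keeps each inserted vector contiguous, as requested: randomly order the blocks, randomly split the leftover zeros into the gaps before, between, and after them, and concatenate. The function name and the non-overlap scheme are illustrative, not from the thread:

```r
place_vectors <- function(vecs, N) {
  # vecs: list of vectors to keep intact; N: length of the output vector
  L <- sum(lengths(vecs))
  stopifnot(L <= N)
  k <- length(vecs)
  vecs <- sample(vecs)                     # random order of the blocks
  # Split the N - L zeros at random into k + 1 gaps:
  cuts <- sort(sample(0:(N - L), k, replace = TRUE))
  gaps <- diff(c(0, cuts, N - L))          # k + 1 gap lengths summing to N - L
  out <- numeric(0)
  for (i in seq_len(k)) {
    out <- c(out, numeric(gaps[i]), vecs[[i]])
  }
  c(out, numeric(gaps[k + 1]))
}

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
set.seed(7)
res <- place_vectors(list(v1, v2), 25)
length(res)  # 25, with v1 and v2 each intact at a random position
```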
[R] Arrange elements on a matrix according to rowSums + short 'apply' Q
Greetings, My goal is to create a Markov transition matrix (probability of moving from one state to another) with the 'highest traffic' portion of the matrix occupying the top-left section. Consider the following sample:

inputData <- c(
  c(5, 3, 1, 6, 7),
  c(9, 7, 3, 10, 11),
  c(1, 2, 3, 4, 5),
  c(2, 4, 6, 8, 10),
  c(9, 5, 2, 1, 1)
)

MAT <- matrix(inputData, nrow = 5, ncol = 5, byrow = TRUE)
colnames(MAT) <- c("A", "B", "C", "D", "E")
rownames(MAT) <- c("A", "B", "C", "D", "E")

rowSums(MAT)

I want to re-arrange the elements of this matrix such that the elements with the largest row sums are placed to the top-left, in descending order. Does this make sense? In this case the order I'm looking for would be B, D, A, E, C. Any thoughts?

As an aside, here is the function I've written to construct the transition matrix. Is there a more elegant way to do this that doesn't involve a double transpose?

TMAT <- apply(t(MAT), 2, function(X) X/sum(X))
TMAT <- t(TMAT)

I tried the following:

TMAT <- apply(MAT, 1, function(X) X/sum(X))

But the custom function is still getting applied over the columns of the array, rather than the rows. For a check try:

rowSums(TMAT)
colSums(TMAT)

Row sums here should equal 1...

Many thanks in advance, Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Arrange elements on a matrix according to rowSums + short 'apply' Q
Ivan and Michael, Many thanks for the tips, those solved my queries. Still interested in how to force custom functions to work over rows rather than columns when using apply, but the MAT/rowSums(MAT) technique is definitely the most efficient way to go for this application. Cheers, Aaron 2010/12/2 Michael Bedward > Hi Aaron, > > Following up on Ivan's suggestion, if you want the column order to > mirror the row order... > > mo <- order(rowSums(MAT), decreasing=TRUE) > MAT2 <- MAT[mo, mo] > > Also, you don't need all those extra c() calls when creating > inputData, just the outermost one. > > Regarding your second question, your statements... > > TMAT <- apply(t(MAT), 2, function(X) X/sum(X)) > TMAT <- t(TMAT) > > is actually just a complicated way of doing this... > > TMAT <- MAT / rowSums(MAT) > > You can confirm that by doing it your way and then this... > > TMAT == MAT / rowSums(MAT) > > ...and you should see a matrix of TRUE values > > Michael > > > On 2 December 2010 20:43, Ivan Calandra > wrote: > > Hi, > > > > Here is a not so easy way to do your first step, but it works: > > MAT2 <- cbind(MAT, rowSums(MAT)) > > MAT[order(MAT2[,6], decreasing=TRUE),] > > > > For the second, I don't know! > > > > HTH, > > Ivan > > > > > > Le 12/2/2010 09:46, Aaron Polhamus a écrit : > >> > >> Greetings, > >> > >> My goal is to create a Markov transition matrix (probability of moving > >> from > >> one state to another) with the 'highest traffic' portion of the matrix > >> occupying the top-left section. 
Consider the following sample: > >> > >> inputData<- c( > >> c(5, 3, 1, 6, 7), > >> c(9, 7, 3, 10, 11), > >> c(1, 2, 3, 4, 5), > >> c(2, 4, 6, 8, 10), > >> c(9, 5, 2, 1, 1) > >> ) > >> > >> MAT<- matrix(inputData, nrow = 5, ncol = 5, byrow = TRUE) > >> colnames(MAT)<- c("A", "B", "C", "D", "E") > >> rownames(MAT)<- c("A", "B", "C", "D", "E") > >> > >> rowSums(MAT) > >> > >> I wan to re-arrange the elements of this matrix such that the elements > >> with > >> the largest row sums are placed to the top-left, in descending order. > Does > >> this make sense? In this case the order I'm looking for would be B, D, > A, > >> E, > >> C Any thoughts? > >> > >> As an aside, here is the function I've written to construct the > transition > >> matrix. Is there a more elegant way to do this that doesn't involve a > >> double > >> transpose? > >> > >> TMAT<- apply(t(MAT), 2, function(X) X/sum(X)) > >> TMAT<- t(TMAT) > >> > >> I tried the following: > >> > >> TMAT<- apply(MAT, 1, function(X) X/sum(X)) > >> > >> But my the custom function is still getting applied over the columns of > >> the > >> array, rather than the rows. For a check try: > >> > >> rowSums(TMAT) > >> colSums(TMAT) > >> > >> Row sums here should equal 1... > >> > >> Many thanks in advance, > >> Aaron > >> > >>[[alternative HTML version deleted]] > >> > >> __ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > -- > > Ivan CALANDRA > > PhD Student > > University of Hamburg > > Biozentrum Grindel und Zoologisches Museum > > Abt. 
Säugetiere > > Martin-Luther-King-Platz 3 > > D-20146 Hamburg, GERMANY > > +49(0)40 42838 6231 > > ivan.calan...@uni-hamburg.de > > > > ** > > http://www.for771.uni-bonn.de > > http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > -- Aaron Polhamus Statistical consultant, Revolution Analytics MSc Applied Statistics, The University of Oxford, 2009 838a NW 52nd St, Seattle, WA 98107 Cell: +1 (206) 380.3948 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
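Pulling the thread's suggestions together, a minimal self-contained sketch using the sample matrix from the original post:

```r
# Sample data from the original post
inputData <- c(5, 3, 1, 6, 7,
               9, 7, 3, 10, 11,
               1, 2, 3, 4, 5,
               2, 4, 6, 8, 10,
               9, 5, 2, 1, 1)
MAT <- matrix(inputData, nrow = 5, byrow = TRUE,
              dimnames = list(LETTERS[1:5], LETTERS[1:5]))

# Reorder rows and columns together by descending row sums -> B, D, A, E, C
mo <- order(rowSums(MAT), decreasing = TRUE)
MAT2 <- MAT[mo, mo]

# Row-normalise to get the transition matrix, no double transpose needed:
# the length-5 rowSums vector recycles down the columns, so each element
# is divided by its own row's sum
TMAT <- MAT2 / rowSums(MAT2)
stopifnot(isTRUE(all.equal(unname(rowSums(TMAT)), rep(1, 5))))
```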
[R] Writing out data from a list
Hello, I have a list of data, such that: [[1]] [1] 0.00 0.00 0.03 0.01 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00 0.01 0.00 0.00 0.03 0.01 0.00 0.01 0.00 0.03 0.16 0.14 0.02 0.17 0.01 0.01 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 [42] 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.04 0.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [[2]] [1] 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.07 0.00 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [[3]] [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 etc. I would like to write to a text file with this data, but would like each section of the file to be separated by some text. For example: "Event 1" "Random Text" 0 0 0.03 0.01 "Event 2" "Random Text" 0 0 0 0 0.01 etc. Is there some way to continually write text out using a loop and also attaching a string before each data segment? Thank you in advance! -Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
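One way to do this, sketched with a small stand-in list (the file name and the "Event"/"Random Text" labels are placeholders for whatever text is wanted):

```r
# Stand-in for the poster's list of numeric vectors
dat <- list(c(0, 0, 0.03, 0.01),
            c(0, 0, 0, 0, 0.01))

# Open one connection and write a header line before each data segment
con <- file("events.txt", "w")
for (i in seq_along(dat)) {
  writeLines(sprintf('"Event %d" "Random Text"', i), con)
  writeLines(paste(dat[[i]], collapse = " "), con)
}
close(con)
```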
[R] Difficult with round() function
Dear list, I'm writing a function to re-grid a data set from finer to coarser resolutions in R as follows (I use this function with sapply/apply): gridResize <- function(startVec = stop("What's your input vector"), to = stop("Missing 'to': How long do you want the final vector to be?")){ from <- length(startVec) shortVec<-numeric() tics <- from*to for(j in 1:to){ interval <- ((j/to)*tics - (1/to)*tics + 1):((j/to)*tics) benchmarks <- interval/to #FIRST RUN ASSUMES FINAL BENCHMARK/TO IS AN INTEGER... positions <- which(round(benchmarks) == benchmarks) indeces <- benchmarks[positions] fracs <- numeric() #SINCE MUCH OF THE TIME THIS WILL NOT BE THE CASE, THIS SCRIPT DEALS WITH THE REMAINDER... for(i in 1:length(positions)){ if(i == 1) fracs[i] <- positions[i]/length(benchmarks) else{ fracs[i] <- (positions[i] - sum(positions[1:(i-1)]))/length(benchmarks) } } #AND UPDATES STARTVEC INDECES AND FRACTION MULTIPLIERS if(max(positions) != length(benchmarks)) indeces <- c(indeces, max(indeces) + 1) if(sum(fracs) != 1) fracs <- c(fracs, 1 - sum(fracs)) fromVals <- startVec[indeces] if(any(is.na(fromVals))){ NAindex <- which(is.na(fromVals)) if(sum(fracs[-NAindex]) >= 0.5) shortVec[j] <- sum(fromVals*fracs, na.rm=TRUE) else shortVec[j] <- NA }else{shortVec[j] <- sum(fromVals*fracs)} } return(shortVec) } for the simple test case test <- gridResize(startVec = c(2,4,6,8,10,8,6,4,2), to = 7) the function works fine. For larger vectors, however, it breaks down. E.g.: test <- gridResize(startVec = rnorm(300, 9, 20), to = 200) This returns the error: Error in positions[1:(i - 1)] : only 0's may be mixed with negative subscripts and the problem seems to be in the line positions <- which(round(benchmarks) == benchmarks). In this particular example the code cracks up at j = 27. 
When I set j = 27 and run the calculation manually I discover the following: > benchmarks[200] [1] 40 > benchmarks[200] == 40 [1] FALSE > round(benchmarks[200]) == 40 [1] TRUE Even though my benchmark calculation seems to be returning clean integers to serve as inputs for the creation of the 'positions' variable, for whatever reason R doesn't read it that way. I would be very grateful for any advice on how I can either alter my approach entirely (I am sure there is a far more elegant way to regrid data in R) or a simple fix for this rounding error. Many thanks in advance, Aaron -- Aaron Polhamus Statistical consultant, Revolution Analytics MSc Applied Statistics, The University of Oxford, 2009 838a NW 52nd St, Seattle, WA 98107 Cell: +1 (206) 380.3948 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
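The usual fix for this (R FAQ 7.31: floating-point numbers rarely compare exactly equal) is to test equality up to a tolerance instead of with `==`. A sketch, with an illustrative vector:

```r
# A value that prints as 27 but is not exactly the double 27
benchmarks <- c(26.999999999999996, 27.3, 40)

# Exact comparison misses the first element
which(round(benchmarks) == benchmarks)

# Tolerance-based comparison is robust: finds elements 1 and 3
tol <- sqrt(.Machine$double.eps)
positions <- which(abs(benchmarks - round(benchmarks)) < tol)
```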
Re: [R] Shrink file size of pdf graphics
You can try something like this, at the command line: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf evidently, the new compactPDF() function in R 2.13 does something very similar. -Aaron On Thu, May 19, 2011 at 11:30 AM, Duncan Murdoch wrote: > > On 19/05/2011 11:14 AM, Layman123 wrote: >> >> Hi everyone, >> >> My data consists of a system of nearly 75000 roads, available as a >> shapefile. When I plot the road system, by adding the individual roads with >> 'lines' and store it as a pdf-file with 'pdf' I get a file of size 13 MB. >> This is way too large to add it in my LaTeX-document, because there will be >> some more graphics of this type. >> Now I'm curious to learn wheter there is a possibility in R to shrink the >> file size of this graphic? I merely need it in a resolution so that it looks >> "smooth" when printed out. I don't know much about the storage of R >> graphics, but maybe there is a way to change the way the file is stored >> perhaps as a pixel image? > > > There are several possibilities. You can use a bitmapped device (e.g. png()) > to save the image; pdflatex can include those. > > You can compress the .pdf file using an external tool like pdftk (or do it > internally in R 2.14.x, coming soon). > > There are probably others... > > Duncan Murdoch > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
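For reference, the R-side equivalent mentioned above (available in R >= 2.13; the file name is a placeholder, and Ghostscript must be installed):

```r
# Compress an existing PDF in place; gs_quality = "screen" mirrors
# the -dPDFSETTINGS=/screen flag in the gs command line above
tools::compactPDF("input.pdf", gs_quality = "screen")
```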
[R] Variable in file name png
Hi, I'm having trouble with getting the png function to properly produce multiple graphs. Right now I have: for (z in data) { png(file=z,bg="white") thisdf<-data[[z]] plot(thisdf$rc,thisdf$psi) dev.off() } Which should take the "data" object, a list of data sets and produce a graph of each with respect to the two variables rc and psi. I want the names to change for each graph, but am not sure how to do it, any help would be appreciated. Thanks, -Acoutino __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
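A sketch of one likely fix, assuming `data` is a named list of data frames: loop over the names rather than the elements, so `data[[z]]` indexes correctly and each file gets a distinct name:

```r
# Iterate over the names of the list, not its elements
for (z in names(data)) {
  png(file = paste0(z, ".png"), bg = "white")  # one file per list element
  thisdf <- data[[z]]
  plot(thisdf$rc, thisdf$psi)
  dev.off()
}
```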
Re: [R] Running gplots package with Windows 7
Thank-you for the replies. I believe I figured out what the problem was. When I installed the package on linux it ran smoothly, but I just need to install a lot of accessory packages to make gplots work with Windows. Thanks again, Aaron 2010/5/17 Uwe Ligges > Additionally, please give the full output that let you assume > "The package installs fine"... > > Uwe Ligges > > > > On 17.05.2010 10:18, Henrik Bengtsson wrote: > >> I won't have an answer but it will help others to help you if you also >> report what the following gives: >> >> library("gtools"); >> print(sessionInfo()); >> >> and >> >> print(packageDescription("gtools")); >> >> My $.02 >> >> Henrik >> >> >> On Mon, May 17, 2010 at 4:01 AM, agusdon wrote: >> >>> >>> Hello, >>> >>> I'm fairly new to R and am running version 2.11.0 with Windows 7. I need >>> to >>> run the package gplots. The package installs fine, but when I try to >>> load >>> it I receive the message: >>> >>> Loading required package: gtools >>> Error: package 'gtools' could not be loaded >>> In addition: Warning message: >>> In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = >>> lib.loc) : >>> there is no package called 'gtools' >>> >>> After that, the package has some functionality but I cannot run the >>> barplot2 >>> command, which is what I need to use the most. >>> >>> If anyone has suggests how to fix this problem, I would be very grateful. >>> >>> Thanks! >>> >>> Aaron >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/Running-gplots-package-with-Windows-7-tp2219020p2219020.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> >>> and provide commented, minimal, self-contained, reproducible code. 
>>> >>> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Complex sampling?
What I think you need is something along the lines of: matrix(c(sample(3:7), sample(3:7), sample(3:7), sample(3:7), ...), nrow=2) now, each column are your random pairs. -Aaron On Wed, Mar 9, 2011 at 1:01 PM, Hosack, Michael wrote: > > -Original Message- > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at > r-project.org] > > On Behalf Of Hosack, Michael > > Sent: Wednesday, March 09, 2011 7:34 AM > > To: r-help at R-project.org > > Subject: [R] Complex sampling? > > > > R users, > > > > I am trying to generate a randomized weekday survey schedule that ensures > > even coverage of weekdays in > > the sample, where the distribution of variable DOW is random with respect > > to WEEK. To accomplish this I need > > to randomly sample without replacement two weekdays per week for each of > > 27 weeks (only 5 are shown). > > This seems simple enough, sampling without replacement. > > However, > > I need to sample from a sequence (3:7) that needs to be completely > > depleted and replenished until the > > final selection is made. Here is an example of what I want to do, > > beginning at WEEK 1. I would prefer to do > > this without using a loop, if possible. > > > > sample frame: [3,4,5,6,7] --> [4,5,6] --> [4],[1,2,3,(4),5,6] --> > > [1,2,4,5,6] --> for each WEEK in dataframe > > OK, now you have me completely lost. Sorry, but I have no clue as to what > you just did here. I looks like you are trying to describe some > transformation/algorithm but I don't follow it. > > > > I could not reply to this email because it not been delivered to my inbox, > so I had to copy it from the forum. > I apologize for the confusion, this would take less than a minute to > explain in conversation but an hour > to explain well in print. Two DOW_NUMs will be selected randomly without > replacement from the vector 3:7 for each WEEK. 
When this vector is reduced > to a single integer that integer will be selected and the vector will be > restored and a single integer will then be selected that differs from the > prior selected integer (i.e. cannot sample the same day twice in the same > week). This process will be repeated until two DOW_NUM have been assigned > for each WEEK. That process is what I attempted to illustrate in my original > message. This is beyond my current coding capabilities. > > > > > > > Randomly sample 2 DOW_NUM without replacement from each WEEK ( () = no > two > > identical DOW_NUM can be sampled > > in the same WEEK) > > > > sample = {3,7}, {5,6}, {4,3}, {1,5}, --> for each WEEK in dataframe > > > > So, are you sampling from [3,4,5,6,7], or [1,2,4,5,6], or ...? Can you > show an 'example' of what you would like to end up given your data below? > > > > > Thanks you, > > > > Mike > > > > > > DATE DOW DOW_NUM WEEK > > 2 2011-05-02 Mon 31 > > 3 2011-05-03 Tue 41 > > 4 2011-05-04 Wed 51 > > 5 2011-05-05 Thu 61 > > 6 2011-05-06 Fri 71 > > 9 2011-05-09 Mon 32 > > 10 2011-05-10 Tue 42 > > 11 2011-05-11 Wed 52 > > 12 2011-05-12 Thu 62 > > 13 2011-05-13 Fri 72 > > 16 2011-05-16 Mon 33 > > 17 2011-05-17 Tue 43 > > 18 2011-05-18 Wed 53 > > 19 2011-05-19 Thu 63 > > 20 2011-05-20 Fri 73 > > 23 2011-05-23 Mon 34 > > 24 2011-05-24 Tue 44 > > 25 2011-05-25 Wed 54 > > 26 2011-05-26 Thu 64 > > 27 2011-05-27 Fri 74 > > 30 2011-05-30 Mon 35 > > 31 2011-05-31 Tue 45 > > 32 2011-06-01 Wed 55 > > 33 2011-06-02 Thu 65 > > 34 2011-06-03 Fri 75 > > > > DF <- > > structure(list(DATE = structure(c(15096, 15097, 15098, 15099, > > 15100, 15103, 15104, 15105, 15106, 15107, 15110, 15111, 15112, > > 15113, 15114, 15117, 15118, 15119, 15120, 15121, 15124, 15125, > > 15126, 15127, 15128), class = "Date"), DOW = c("Mon", "Tue", > > "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", "Mon", > > "Tue", "Wed", "Thu", "Fri", "Mon", "Tue", "Wed", "Thu", "Fri", > > "Mon", "Tue", "Wed", "Thu", "Fri"), DOW_NUM = 
c(3, 4, 5, 6, 7, > > 3, 4, 5, 6, 7, 3, 4
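A sketch of one way to implement the deplete-and-replenish scheme without a per-week loop: shuffle the pool of five weekday codes enough times to cover all weeks, then reject and reshuffle if any week would draw the same day twice (which can only happen where a week's pair straddles two shuffles). The seed and `n_weeks` are illustrative:

```r
set.seed(42)  # illustrative seed
n_weeks <- 27
draw_days <- function(n) {
  # Concatenate repeated shuffles of the pool 3:7 so every code is
  # depleted before the pool is replenished
  as.vector(replicate(ceiling(2 * n / 5), sample(3:7)))
}

days <- draw_days(n_weeks)
odd  <- seq(1, 2 * n_weeks, by = 2)
# Reject draws where a week would repeat a day, then redraw
while (any(days[odd] == days[odd + 1])) days <- draw_days(n_weeks)

pairs <- matrix(days[1:(2 * n_weeks)], nrow = 2)  # each column = one week
```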
[R] Easy 'apply' question
Dear list, I couldn't find a solution for this problem online, as simple as it seems. Here's the problem: #Construct test dataframe tf <- data.frame(1:3,4:6,c("A","A","A")) #Try the apply function I'm trying to use test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else unique(x)[1]) #Look at the output--all columns treated as character columns... test #Look at the format of the original data--the first two columns are integers. str(tf) In general terms, I want to differentiate what function I apply over a row/column based on what type of data that row/column contains. Here I want a simple mean if the column is numeric, and the first unique value if the column is a character column. As you can see, 'apply' treats all columns as characters the way I've written this function. Any thoughts? Many thanks in advance, Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Easy 'apply' question
Perfect, thanks Josh! Cheers, A 2011/3/10 Joshua Wiley > Dear Aaron, > > The problem is not with your function, but using apply(). Look at the > "Details" section of ?apply You will see that if the data is not an > array or matrix, apply will coerce it to one (or try). Now go over to > the "Details" section of ?matrix and you will see that matrices can > only contain a single class of data and that this follows a hierarchy. > In short, your data frame is coerced to a matrix and the classes > are all coerced to the highest---character. You can use lapply() > instead to get your desired results. Here is an example: > > ## Construct (named) test dataframe > tf <- data.frame(x = 1:3, y = 4:6, z = c("A","A","A")) > > ## Show why what you tried did not work > (test <- apply(tf, 2, class)) > > ## using lapply() > (test <- lapply(tf, function(x) { > if(is.numeric(x)) mean(x) else unique(x)[1]})) > > > Hope this helps, > > Josh > > On Thu, Mar 10, 2011 at 5:11 PM, Aaron Polhamus > wrote: > > Dear list, > > > > I couldn't find a solution for this problem online, as simple as it > seems. > > Here's the problem: > > > > > > #Construct test dataframe > > tf <- data.frame(1:3,4:6,c("A","A","A")) > > > > #Try the apply function I'm trying to use > > test <- apply(tf,2,function(x) if(is.numeric(x)) mean(x) else > unique(x)[1]) > > > > #Look at the output--all columns treated as character columns... > > test > > > > #Look at the format of the original data--the first two columns are > > integers. > > str(tf) > > > > > > In general terms, I want to differentiate what function I apply over a > > row/column based on what type of data that row/column contains. Here I > want > > a simple mean if the column is numeric, and the first unique value if the > > column is a character column. As you can see, 'apply' treats all columns > as > > characters the way I've written this function. > > > > Any thoughts? 
Many thanks in advance, > > Aaron > > > >[[alternative HTML version deleted]] > > > > ______ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > University of California, Los Angeles > http://www.joshuawiley.com/ > -- Aaron Polhamus NASA Jet Propulsion Lab Statistical consultant, Revolution Analytics 160 E Corson Street Apt 207, Pasadena, CA 91103 Cell: +1 (206) 380.3948 Email: [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reporting odds ratios or risk ratios from GLM
OR <- exp(coef(GLM.2)[-1]) OR.ci <- exp(confint(GLM.2)[-1,]) -Aaron On Tue, Mar 15, 2011 at 1:25 PM, lafadnes wrote: > I am a new R user (am using it through the Rcmdr package) and have > struggled > to find out how to report OR and RR directly when running GLM models (not > only reporting coefficients.) > > Example of the syntax that I have used: > > GLM.2 <- glm(diarsev ~ treatmentarm +childage +breastfed, > family=binomial(logit), data=fieldtrials2) > summary(GLM.2) > > This works well except that I manually have to calculate the OR based on > the > coefficients. Can I get these directly (with confidence intervals) by just > amending the syntax? > > Will be grateful for advice! > > -- > View this message in context: > http://r.789695.n4.nabble.com/Reporting-odds-ratios-or-risk-ratios-from-GLM-tp3357209p3357209.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
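Fleshing that out into a self-contained example with simulated data (variable names are illustrative; older versions of R may need `library(MASS)` for the profile-likelihood confidence intervals):

```r
set.seed(1)
dat <- data.frame(y  = rbinom(200, 1, 0.4),
                  x1 = rnorm(200),
                  x2 = rnorm(200))

fit <- glm(y ~ x1 + x2, family = binomial(logit), data = dat)

OR    <- exp(coef(fit)[-1])        # odds ratios, dropping the intercept
OR.ci <- exp(confint(fit)[-1, ])   # profile-likelihood 95% CIs on the OR scale
cbind(OR, OR.ci)
```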
[R] Determining frequency and period of a wave
Hello! I'm collecting data on a refrigerator that I'm using to cure meat. Specifically I am collecting humidity and temperature readings. The temperature readings look sinusoidal (due to the refrigerator turning on and off). I'd like to calculate the frequency and period of the wave so that I can determine if modifications I make to the equipment are increasing or decreasing efficiency. Unfortunately, I'm pretty new to R, so I'm not sure how to figure this out. I *suspect* I should be doing an fft on the temperature data, but I'm not sure where to go from there. Here is a graph I'm producing: http://i.imgur.com/WpsDi.png Here is the program I have so far: https://github.com/tenderlove/rsausage/blob/master/graphing.r I have posted a repository with a SQLite database that has the data I've collected here: https://github.com/tenderlove/rsausage Any help would be greatly appreciated! -- Aaron Patterson http://tenderlovemaking.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
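A minimal sketch of the spectral approach using a synthetic sinusoid; with the real data, `temp` would be the temperature series and `dt` the sampling interval in seconds (both assumed here):

```r
dt <- 60                                  # assumed: one reading per minute
tm <- seq(0, 6 * 3600, by = dt)           # six hours of readings
temp <- 4 + 2 * sin(2 * pi * tm / 1800)   # illustrative 30-minute cycle

# Periodogram of the series; frequencies are in cycles per sample
sp <- spectrum(temp, plot = FALSE)
peak <- sp$freq[which.max(sp$spec)]       # dominant frequency (cycles/sample)

period_sec <- dt / peak                   # cycle length in seconds
freq_hz    <- 1 / period_sec              # frequency in Hz
```

Detrending the series first (or passing `detrend = TRUE`, the default in `spec.pgram`) helps keep the low-frequency end from swamping the on/off cycle.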
Re: [R] Hardy Weinberg
H-W only gives you the expected frequency of AA, AB, and BB genotypes (i.e. a 1x3 table): minor <- runif(1, 0.05, 0.25) major <- 1-minor AA <- minor^2 AB <- 2*minor*major BB <- major^2 df <- cbind(AA, AB, BB) -Aaron On Tue, Jun 21, 2011 at 9:30 PM, Jim Silverton wrote: > Hello all, > I am interested in simulating 10,000 2 x 3 tables for SNPs data with the > Hardy Weinberg formulation. Is there a quick way to do this? I am assuming > that the minor allelle frequency is uniform in (0.05, 0.25). > > -- > Thanks, > Jim. > >[[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
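Extending that to the 10,000 requested draws, one row of expected genotype frequencies per table:

```r
# Each replicate draws a minor allele frequency uniformly on (0.05, 0.25)
# and returns the Hardy-Weinberg expected frequencies
sim <- t(replicate(10000, {
  minor <- runif(1, 0.05, 0.25)
  major <- 1 - minor
  c(AA = minor^2, AB = 2 * minor * major, BB = major^2)
}))
dim(sim)  # 10000 x 3
```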
Re: [R] Very slow optim()
Why use a hammer when you need a wrench? ADMB seems to be the best tool for the job. It has several slick interfaces with R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HWEBayes, swapping the homozygotes genotype frequencies
Without really knowing this code, I can guess that it may be the "triangular" prior at work. Bayes Factors are notorious for being sensitive to the prior. Presumably, the prior somehow prefers to see the rarer allele as the "BB", and not the "AA" homozygous genotype (this is a common assumption: that AA is the reference, and thus the major, more frequent, allele). -Aaron On Sat, Oct 8, 2011 at 7:52 PM, stat999 wrote: > I evaluated the Bayes factor in the k=2 allele case with a "triangular" > prior under the null as in the example in the help file: > > > HWETriangBF2(nvec=c(88,10,2)) > [1] 0.4580336 > > When I swap the n11 entry and n22 entry of nvec, I received totally > different Bayes factor: > > > > > HWETriangBF2(nvec=c(2,10,88)) > [1] 5.710153 > > > > In my understanding, defining the genotype frequency as n11 or n22 are > arbitrary. > So I was expecting the same value of Bayes factor. > > This is the case for conjugate Dirichlet prior: > >DirichNormHWE(nvec=c(88,10,2), c(1,1))/DirichNormSat(nvec=c(88,10,2), > c(1,1,1)) > [1] 1.542047 > >DirichNormHWE(nvec=c(2,10,88), c(1,1))/DirichNormSat(nvec=c(2,10,88), > c(1,1,1)) > [1] 1.542047 > > Could you explain why the HWETriangBF2 is returining completely different > values of Bayes Factor?? > > > -- > View this message in context: > http://r.789695.n4.nabble.com/HWEBayes-swapping-the-homozygotes-genotype-frequencies-tp3886313p3886313.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to selectively sum rows [Beginner question]
Sorry, I attempted to paste the sample data but it must have been stripped out when I posted. It is hopefully now listed below. tapply looks useful. I will check it out further. Here's the sample data: > flights[1:10,] PASSENGERS DISTANCE ORIGIN ORIGIN_CITY_NAME ORIGIN_WAC DEST DEST_CITY_NAME DEST_WAC YEAR 1 17266 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 2 16934 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 3 15470 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 4 13997 5995ICN Seoul, South Korea778 LAXLos Angeles, CA 91 2010 5 13738 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 6 13682 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 7 13187 5995ICN Seoul, South Korea778 LAXLos Angeles, CA 91 2010 8 13051 5995LAXLos Angeles, CA 91 ICN Seoul, South Korea 778 2010 9 12761 1940SPN Saipan, TT 5 ICN Seoul, South Korea 778 2010 10 12419 5995ICN Seoul, South Korea778 LAXLos Angeles, CA 91 2010 Thanks, Aaron -Original Message- From: jim holtman [mailto:jholt...@gmail.com] Sent: Monday, October 24, 2011 11:58 AM To: asindc Cc: r-help@r-project.org Subject: Re: [R] How to selectively sum rows [Beginner question] It would be good to follow the posting guide and at least supply a sample of the data. Most likely 'tapply' is one way of doing it: tapply(df$passenger, list(df$orig, df$dest), sum) On Mon, Oct 24, 2011 at 11:27 AM, asindc wrote: > Hi, I am new to R so I would appreciate any help. I have some data that has > passenger flight data between city pairs. The way I got the data, there are > multiple rows of data for each city pair; the number of passengers needs to > be summed to get a TOTAL annual passenger count for each city pair. > > So my question is: how do I create a new table (or data frame) that > selectively sums > > My initial thought would be to iterate through each row with the following > logic: > > 1. If the ORIGIN_WAC and DEST_WAC pair are not in the new table, then add > them to the table > 2. 
If the ORIGIN_WAC and DEST_WAC pair already exist, then sum the > passengers (and do not add a new row) > > Is this logical? If so, I think I just need some help on syntax (or do I use > a script?). Thanks. > > The first few rows of data look like this: > > > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-selectively-sum-rows-Beginner-question- tp3933512p3933512.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
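For completeness, the same summation with `aggregate()`, which returns a data frame with one row per city pair (using the column names from the sample above):

```r
# Total annual passengers for each ORIGIN_WAC / DEST_WAC pair
totals <- aggregate(PASSENGERS ~ ORIGIN_WAC + DEST_WAC,
                    data = flights, FUN = sum)
```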
Re: [R] How to selectively sum rows [Beginner question]
The count() function in the plyr package works beautifully. Thanks to Jim, Rainer and Dennis for your help. Best. -Original Message- From: Dennis Murphy [mailto:djmu...@gmail.com] Sent: Monday, October 24, 2011 12:05 PM To: asindc Cc: r-help@r-project.org Subject: Re: [R] How to selectively sum rows [Beginner question] See the count() function in the plyr package; it does fast summation. Something like library('plyr') count(passengerData, c('ORIGIN_WAC', 'DEST_WAC'), 'npassengers') HTH, Dennis On Mon, Oct 24, 2011 at 8:27 AM, asindc wrote: > Hi, I am new to R so I would appreciate any help. I have some data that has > passenger flight data between city pairs. The way I got the data, there are > multiple rows of data for each city pair; the number of passengers needs to > be summed to get a TOTAL annual passenger count for each city pair. > > So my question is: how do I create a new table (or data frame) that > selectively sums > > My initial thought would be to iterate through each row with the following > logic: > > 1. If the ORIGIN_WAC and DEST_WAC pair are not in the new table, then add > them to the table > 2. If the ORIGIN_WAC and DEST_WAC pair already exist, then sum the > passengers (and do not add a new row) > > Is this logical? If so, I think I just need some help on syntax (or do I use > a script?). Thanks. > > The first few rows of data look like this: > > > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-selectively-sum-rows-Beginner-question- tp3933512p3933512.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dependency-aware scripting tools for R
shameless self-plug: we break out of R to do this, and after many painful years developing and maintaining idiosyncratic Makefiles, we are now using Taverna to (visually) glue together UNIX commands (including R scripts) -- the benefits of which (over make and brethren) is that you can actually *see* the dependencies and overall workflow (nesting workflows also makes it easier to manage complexity). see TavernaPBS: http://cphg.virginia.edu/mackey/projects/sequencing-pipelines/tavernapbs/ while designed to automate job submission to a PBS queuing system, you can also use it to simply execute non-PBS jobs. -- Aaron J. Mackey, PhD Assistant Professor Center for Public Health Genomics University of Virginia amac...@virginia.edu http://www.cphg.virginia.edu/mackey On Thu, Apr 19, 2012 at 3:27 PM, Sean Davis wrote: > There are numerous tools like scons, make, ruffus, ant, rake, etc. > that can be used to build complex pipelines based on task > dependencies. These tools are written in a variety of languages, but > I have not seen such a thing for R. Is anyone aware of a package > available? The goal is to be able to develop robust bioinformatic > pipelines driven by scripts written in R. > > Thanks, > Sean > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'
Greetings, all I am having difficulty getting the fitdistr() function to return without an error on my data. Specifically, what I'm trying to do is get a parameter estimation for fracture intensity data in a well / borehole. Lower bound is 0 (no fractures in the selected data interval), and upper bound is ~ 10 - 50, depending on what scale you are conducting the analysis on. I read in the data from a text file, convert it to numerics, and then calculate initial estimates of the shape and scale parameters for the gamma distribution from moments. I then feed this back into the fitdistr() function. R code (to this point): data.raw=c(readLines("FSM_C_9m_ENE.inp")) data.num <- as.numeric(data.raw) data.num library(MASS) shape.mom = ((mean(data.num))/ (sd(data.num))^2 shape.mom med.data = mean(data.num) sd.data = sd(data.num) med.data sd.data shape.mom = (med.data/sd.data)^2 shape.mom scale.mom = (sd.data^2)/med.data scale.mom fitdistr(data.num,"gamma",list(shape=shape.mom, scale=scale.mom),lower=0) fitdistr() returns the following error: " Error in optim(x = c(0.402707037, 0.40348, 0.404383704, 2.432626667, : L-BFGS-B needs finite values of 'fn'" Next thing I tried was to manually specify the negative log-likelihood function and pass it straight to mle() (the method specified in Ricci's tutorial on fitting distributions with R). Basically, I got the same result as using fitdistr(). 
Finally I tried using some R code I found from someone with a similar problem back in 2003 from the archives of this mailing list: R code gamma.param1 <- shape.mom gamma.param2 <- scale.mom log.gamma.param1 <- log(gamma.param1) log.gamma.param2 <- log(gamma.param2) gammaLoglik <- function(params, negative=TRUE){ lglk <- sum(dgamma(data, shape=exp(params[1]), scale=exp(params[2]), log=TRUE)) if(negative) return(-lglk) else return(lglk) } optim.list <- optim(c(log.gamma.param1, log.gamma.param2), gammaLoglik) gamma.param1 <- exp(optim.list$par[1]) gamma.param2 <- exp(optim.list$par[2]) # If I test this function using my sample data and the estimates of shape and scale derived from the method of moments, gammaLogLike returns as INF. I suspect the problem is that the zeros in the data are causing the optim solver problems when it attempts to minimize the negative log-likelihood function. Can anyone suggest some advice on a work-around? I have seen suggestions online that a 'censoring' algorithm can allow one to use MLE methods to estimate the gamma distribution for data with zero values (Wilkes, 1990, Journal of Climate). I have not, however, found R code to implement this, and, frankly, am not smart enough to do it myself... :-) Any suggestions? Has anyone else run up against this and written code to solve the problem? Thanks in advance! Aaron Fox Senior Project Geologist, Golder Associates +1 425 882 5484 || +1 425 736 3958 (mobile) [EMAIL PROTECTED] || www.fracturedreservoirs.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
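One workable answer for the archive: with zeros in the data, dgamma() contributes a non-finite log-density term (log(0) = -Inf when shape > 1, +Inf when shape < 1), so any likelihood built on the raw density blows up in optim(). The censoring idea from Wilks (1990) treats each zero as "observed below a small detection limit c" and uses pgamma() for those observations instead of the density. A rough base-R sketch — the limit `cens`, the simulated data, and the starting values are all invented for illustration:

```r
set.seed(1)
x <- rgamma(300, shape = 0.5, scale = 1.5)  # toy stand-in for fracture-intensity data
cens <- 0.05                                # assumed detection limit
x[x < cens] <- 0                            # zeros play the role of censored values

# negative log-likelihood: censored term for the zeros, density for the rest
negloglik <- function(logpar, data, limit) {
  shape <- exp(logpar[1]); scale <- exp(logpar[2])
  zero <- data == 0
  -(sum(zero) * pgamma(limit, shape, scale = scale, log.p = TRUE) +
      sum(dgamma(data[!zero], shape, scale = scale, log = TRUE)))
}

# method-of-moments start from the positive part of the data
pos   <- x[x > 0]
start <- log(c(mean(pos)^2 / var(pos), var(pos) / mean(pos)))

fit <- optim(start, negloglik, data = x, limit = cens)
exp(fit$par)  # shape and scale estimates on the original parameter scale
```

Working on the log scale for the parameters (as in the 2003 code above) keeps shape and scale positive without box constraints, so plain Nelder-Mead suffices.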
[R] Types in grouped multi-panel (lattice) xyplot
Apologetic prologue: I've looked through the mailing list for an answer to this (since I'm sure it's trivial) but I have not been able to find a fix. The problem is that I want each group to have a different type of plot: "Probes" should be points and "Segments" should be lines (preferably using the segment plot command, but I've just been trying -- unsuccessfully -- to get lines to work). To be exact, the data look like:

 loc        val  valtype mouse
1428  0.1812367   Probes     2
1439 -0.4534155   Probes     2
1499 -0.4957303   Probes     2
1559  0.2448838   Probes     2
1611 -0.2030937   Probes     2
1788 -0.2235331   Probes     2
1428  0.5        Segment     2
1439  0.5        Segment     2
1499  0.5        Segment     2
1559  0.5        Segment     2
1611  0.5        Segment     2
1788  0.5        Segment     2
1428  0.1812367   Probes     1
1439 -0.4534155   Probes     1
1499 -0.4957303   Probes     1
1559  0.2448838   Probes     1
1611 -0.2030937   Probes     1
1788 -0.2235331   Probes     1
1428  0.5        Segment     1
1439  0.5        Segment     1
1499  0.5        Segment     1
1559  0.1        Segment     1
1611  0.1        Segment     1
1788  0.1        Segment     1

* loc is the x-axis location
* val is the y-axis value
* valtype is equal to "which" had I been smart and used make.groups
* mouse is the 'cond' variable

The plot command I'm currently using is

xyplot(val ~ loc | mouse, data = df,
       groups = valtype,
       aspect = 0.5, layout = c(3,3),
       lty = 0, lwd = 3, type = "p",
       col = c("black", "blue"),
       as.table = TRUE)

which gives me black and blue points for the probes/segments (I've inferred alphabetical order for the group colors). When I change the type to c("p", "l"), as in

xyplot(val ~ loc | mouse, data = df,
       groups = valtype,
       aspect = 0.5, layout = c(3,3),
       lty = 0, lwd = 3, type = c("p","l"),
       col = c("black", "blue"),
       as.table = TRUE)

I get the exact same plot. I've tried using a few of the panel functions I found on the list (I was particularly hopeful about http://tolstoy.newcastle.edu.au/R/help/06/07/30363.html) but I've either been misusing them, or they are not right for what I want to do.
If anyone knows how to get points and lines in the same panel for the two different groups (probes/segments), I would love to hear about it. If you further know how to use the 'segment' plot in panels for the segments, I would really love to hear about it. Thanks in advance! Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Types in grouped multi-panel (lattice) xyplot
On Thu, 10 Apr 2008, Deepayan Sarkar wrote: > On 4/10/08, Deepayan Sarkar <[EMAIL PROTECTED]> wrote: >> On 4/10/08, Aaron Arvey <[EMAIL PROTECTED]> wrote: >> > Apologetic prologue: I've looked through the mailing list for an answer to >> > this (since I'm sure it's trivial) but I have not been able to find a fix. >> > >> > So the problem is that I want each group to have a different type of plot. >> > "Probes" should be points and "Segments" should be lines (preferably using >> > the segment plot command, but I've just been trying -- unsuccessfully -- >> > to get lines to work). >> > >> > To be exact, the data looks like: >> > >> > loc val valtype mouse >> > 1428 0.1812367 Probes 2 >> > 1439 -0.4534155 Probes 2 >> > 1499 -0.4957303 Probes 2 >> > 1559 0.2448838 Probes 2 >> > 1611 -0.2030937 Probes 2 >> > 1788 -0.2235331 Probes 2 >> > 1428 0.5Segment 2 >> > 1439 0.5Segment 2 >> > 1499 0.5Segment 2 >> > 1559 0.5Segment 2 >> > 1611 0.5Segment 2 >> > 1788 0.5Segment 2 >> > 1428 0.1812367 Probes 1 >> > 1439 -0.4534155 Probes 1 >> > 1499 -0.4957303 Probes 1 >> > 1559 0.2448838 Probes 1 >> > 1611 -0.2030937 Probes 1 >> > 1788 -0.2235331 Probes 1 >> > 1428 0.5Segment 1 >> > 1439 0.5Segment 1 >> > 1499 0.5Segment 1 >> > 1559 0.1Segment 1 >> > 1611 0.1Segment 1 >> > 1788 0.1Segment 1 >> > >> > >> >* loc is the x-axis location >> >* val is the y-axis value >> >* valtype is equal to "which" had I been smart and used make.groups >> >* mouse is the 'cond' variable >> > >> > >> > The plot command I'm currently using is, >> > >> > xyplot(val ~ loc | mouse, data = df, >> > groups=valtype >> > aspect=0.5, layout=c(3,3), >> > lty=0, lwd=3, type="p", >> > col=c("black", "blue"), >> > as.table = TRUE) >> > >> > which gives me black and blue points for the probes/segments (I've infered >> > alphabetical order for the groups colors). 
When I change the type to >> > c("p", "l"), I get >> > >> > xyplot(val ~ loc | mouse, data = df, >> > groups=valtype >> > aspect=0.5, layout=c(3,3), >> > lty=0, lwd=3, type=c("p","l"), >> > col=c("black", "blue"), >> > as.table = TRUE) >> >> >> Try >> >> >> xyplot(val ~ loc | mouse, data = df, >> >>groups=valtype, >>type=c("p","l"), >>## distribute.type = TRUE, > > Sorry, that should be > > distribute.type = TRUE, > >>col=c("black", "blue")) That did exactly what I was looking for! I now have a very nice lattice plot with points and lines! >> > I get the exact same plot. I've tried using a few of the panel functions >> > I found on the list (I was particularly hopeful for >> > http://tolstoy.newcastle.edu.au/R/help/06/07/30363.html) but I've either >> > been misusing them, or they are not right for what I want to do. >> > >> > If anyone knows how to get points and lines in the same panel for the two >> > different groups (probes/segments), I would love to hear about it. >> > >> > If you further know how to use the 'segment' plot in panels for the >> > segments, I would really love to hear about it. >> >> >> Well, panel.segments() draws segments, but you need your data in the >> form (x1, y1, x2, y2) for that. With your setup, it's probably easier >> to have lines with some NA-s inserted wherever you want line breaks. That works perfectly! I was just planning on reformating the data, but this makes life even easier! Thanks! Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
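The fix in the reply can be reduced to a small self-contained sketch (invented toy data standing in for the probe/segment frame): `distribute.type = TRUE` tells `panel.superpose` to interpret `type` as one entry per group rather than applying every type to every group, so the first group gets points and the second gets lines.

```r
library(lattice)  # ships with R

set.seed(1)
# toy stand-in for the probes/segments data frame
df <- data.frame(
  loc     = rep(c(1428, 1439, 1499, 1559, 1611, 1788), 4),
  val     = c(rnorm(6), rep(0.5, 6), rnorm(6), rep(0.1, 6)),
  valtype = rep(rep(c("Probes", "Segment"), each = 6), 2),
  mouse   = factor(rep(c(2, 1), each = 12))
)

p <- xyplot(val ~ loc | mouse, data = df,
            groups = valtype,
            type = c("p", "l"),        # one entry per group ...
            distribute.type = TRUE,    # ... because of this flag
            col = c("black", "blue"),
            as.table = TRUE)
print(p)  # points for Probes, lines for Segment
```

Inserting NA rows into the Segment values, as suggested at the end of the thread, then breaks the line wherever a gap is wanted.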
[R] variable names when using S3 methods
I'm seeing some funny behavior when using methods (the older S3 type) and having variables that start with the same letter. I have a vague recollection of reading something about this once but now can't seem to find anything in the documentation. Any explanation, or a link to the proper documentation, if it does exist, would be appreciated. Thanks, Aaron Rendahl University of Minnesota School of Statistics # set up two function that both use method "foo" but with different variable names fooA<-function(model,...) UseMethod("foo") fooB<-function(Bmodel,...) UseMethod("foo") # now set up two methods (default and character) that have an additional variable foo.character <- function(model, m=5,...) cat("foo.character: m is", m, "\n") foo.default <- function(model, m=5,...) cat("foo.default: m is", m, "\n") # both of these use foo.character, as expected fooA("hi") fooB("hi") # but here, fooA uses foo.default instead fooA("hi",m=1) fooB("hi",m=1) # additionally, these use foo.character, as expected fooA("hi",1) fooA(model="hi",m=1) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
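For the archive, the likely explanation is ordinary partial argument matching, not anything special about S3 dispatch (see "Argument matching" in the R Language Definition): in `fooA(model, ...)` the named argument `m = 1` partially matches the formal `model`, so `UseMethod("foo")` dispatches on the numeric `1` while `"hi"` falls into `...`; in `fooB(Bmodel, ...)` the name `m` cannot match `Bmodel`, so `"hi"` remains the dispatch object. A minimal sketch of the diagnosis (methods return strings instead of printing, to make the dispatch visible):

```r
fooA <- function(model, ...)  UseMethod("foo")
fooB <- function(Bmodel, ...) UseMethod("foo")
foo.character <- function(model, m = 5, ...) "character method"
foo.default   <- function(model, m = 5, ...) "default method"

fooA("hi")         # dispatches on "hi"                            -> character method
fooA("hi", m = 1)  # m=1 partially matches `model`; dispatch on 1  -> default method
fooB("hi", m = 1)  # `m` cannot match `Bmodel`; dispatch on "hi"   -> character method
```

This is also why `fooA("hi", 1)` and `fooA(model = "hi", m = 1)` behave "as expected": in both, `model` is unambiguously `"hi"`.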
[R] R crash using rpanel on mac os x
Hello, I've recently discovered a persistent issue with rpanel when running R.app (2.6.1) on Mac OS X 10.4.11. tcltk and rpanel load without any apparent error, and the interactive panels appear to work as expected, however upon closing the panels rpanel has created I get catastrophic errors and R crashes completely. For the most part R manages to crash with dignity and work can be saved, but sometimes it will crash straight out. Below is an example of an entire work session (only base packages loaded) with the crash at the end typical of those encountered: > library(tcltk) Loading Tcl/Tk interface ... done > library(rpanel) Package `rpanel', version 1.0-4 type help(rpanel) for summary information > density.draw <- function(panel) { + plot(density(panel$x, bw = panel$h)) + panel + } > panel <- rp.control(x = rnorm(50)) > rp.slider(panel, h, 0.5, 5, log = TRUE, action = density.draw) *** caught bus error *** address 0x0, cause 'non-existent physical address' Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace All packages that are required are up to date, and I can find no evidence of similar issues from searching the mailing lists. Any suggestions would be appreciated. Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with lm and multiple linear regression?
Hello, I'm new to R, but I've read the intro to R and successfully connected it to an instance of mysql. I'm trying to perform multiple linear regression, but I'm having trouble using the lm function. To start, I have read in a simple y matrix of values (dependent variable) and x matrix of independent variables. It says both are data frames, but lm is giving me an error that my y variable is a list. Any suggestions on how to do this? It's not clear to me what the problem is, as they're both data frames. My actual problem will use a much wider matrix of coefficients; I've only included two for illustration. Additionally, I'd actually like to weight the observations. How would I go about doing that? I also have that as a separate column vector. Thanks, Aaron

Here's my session:

> margin
    margin
1   166.67
2   -58.33
3   100.00
4   -33.33
5   200.00
6   -83.33
7  -100.00
8     0.00
9   100.00
10  -18.18
11  -55.36
12 -125.00
13  -33.33
14 -200.00
15    0.00
16 -100.00
17   75.00
18    0.00
19 -200.00
20   35.71
21  100.00
22   50.00
23  -86.67
24  165.00
> personcoeff
   Person1 Person2
1       -1       1
2       -1       1
3       -1       1
4       -1       1
5       -1       1
6       -1       1
7        0       0
8        0       0
9        0       1
10      -1       1
11      -1       1
12      -1       1
13      -1       1
14      -1       0
15       0       0
16       0       0
17       0       1
18      -1       1
19      -1       1
20      -1       1
21      -1       1
22      -1       1
23      -1       1
24      -1       1
> class(margin)
[1] "data.frame"
> class(personcoeff)
[1] "data.frame"
> lm(margin~personcoeff)
Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : invalid type (list) for variable 'margin'

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
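The error arises because a one-column data frame is a list, and each variable in an lm() formula must be a vector (or matrix). One fix, sketched here with a few made-up rows in place of the full 24, is to combine everything into one data frame and pull the response out as a column; `w` is a hypothetical vector of observation weights, passed through lm()'s `weights` argument:

```r
margin      <- data.frame(margin  = c(166.67, -100, 100, -200))
personcoeff <- data.frame(Person1 = c(-1, 0, 0, -1),
                          Person2 = c( 1, 0, 1,  0))

reg.data <- cbind(margin, personcoeff)

# y as a column of the data frame, all remaining columns as predictors:
fit <- lm(margin ~ ., data = reg.data)

# weighted version; w is an invented weight vector for illustration
w    <- c(1, 2, 1, 2)
fitw <- lm(margin ~ ., data = reg.data, weights = w)
```

`lm(margin$margin ~ Person1 + Person2, data = personcoeff)` would also work, but keeping one data frame is less error-prone.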
Re: [R] Help with lm and multiple linear regression? (Plain Text version)
Tim (and others who responded privately), Thanks for the help, this approach did work. I have also reread ?lm a little more closely, I do see the weights functionality. I have one last question: Now that I understand how to call this function and review the results, I want to extend it to my much larger real problem, with 100s of columns. Is there a way to call the function in more of a matrix algebra syntax, where I would list the matrix(e.g. personcoeff) rather than the individual column names? It seems like I might need to use lm.wfit, but per the help I'd rather use lm. Thanks, Aaron - Original Message From: Tim Calkins <[EMAIL PROTECTED]> To: Aaron Barzilai <[EMAIL PROTECTED]> Cc: r-help@r-project.org Sent: Thursday, December 27, 2007 6:55:57 PM Subject: Re: [R] Help with lm and multiple linear regression? (Plain Text version) consider merging everything into a singe dataframe. i haven't tried it, but something like the following could work: > reg.data <- cbind(margin, personcoeff) > names(reg.data) <- c('margin', 'p1', 'p2') > lm(margin~p1+p2, data = reg.data) the idea here is that by specifying the data frame with the data argument in lm, R looks for the columns of the names specified in the formula. for weights, see ?lm and look for the weights argument. cheers, tc On Dec 28, 2007 10:22 AM, Aaron Barzilai <[EMAIL PROTECTED]> wrote: > (Apologies the previous version was sent as rich text) > > Hello, > I'm new to R, but I've read the intro to R and successfully connected it to > an instance of mysql. I'm trying to perform multiple linear regression, but > I'm having trouble using the lm function. To start, I have read in a simply > y matrix of values(dependent variable) and x matrix of independent variables. > It says both are data frames, but lm is giving me an error that my y > variable is a list. > > Any suggestions on how to do this? It's not clear to me what the problem is > as they're both data frames. 
My actual problem will use a much wider matrix > of coefficients, I've only included two for illustration. > > Additionally, I'd actually like to weight the observations. How would I go > about doing that? I also have that as a separate column vector. > > Thanks, > Aaron > > Here's my session: > > margin >margin > 166.67 > 2 -58.33 > 3 100.00 > 4 -33.33 > 5 200.00 > 6 -83.33 > 7 -100.00 > 80.00 > 9 100.00 > 10 -18.18 > 11 -55.36 > 12 -125.00 > 13 -33.33 > 14 -200.00 > 150.00 > 16 -100.00 > 17 75.00 > 180.00 > 19 -200.00 > 20 35.71 > 21 100.00 > 22 50.00 > 23 -86.67 > 24 165.00 > > personcoeff >Person1 Person2 > 1 -1 1 > 2 -1 1 > 3 -1 1 > 4 -1 1 > 5 -1 1 > 6 -1 1 > 70 0 > 80 0 > 90 1 > 10 -1 1 > 11 -1 1 > 12 -1 1 > 13 -1 1 > 14 -1 0 > 15 0 0 > 16 0 0 > 17 0 1 > 18 -1 1 > 19 -1 1 > 20 -1 1 > 21 -1 1 > 22 -1 1 > 23 -1 1 > 24 -1 1 > > class(margin) > [1] "data.frame" > > class(personcoeff) > [1] "data.frame" > > lm(margin~personcoeff) > Error in model.frame(formula, rownames, variables, varnames, extras, > extranames, : >invalid type (list) for variable 'margin' > > > > > Be a better friend, newshound, and > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Tim Calkins 0406 753 997 Be a better friend, newshound, and __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
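The follow-up question (matrix-algebra syntax without listing hundreds of column names) has a standard answer that keeps lm() rather than dropping to lm.wfit(): a formula term may itself be a matrix, in which case every column enters the model as a separate term; `~ .` achieves the same thing with a data frame. A sketch with made-up numbers:

```r
set.seed(42)
y <- rnorm(24)
X <- matrix(rnorm(24 * 5), nrow = 24,
            dimnames = list(NULL, paste0("Person", 1:5)))

# a matrix on the right-hand side expands to one coefficient per column
fit <- lm(y ~ X)
length(coef(fit))  # 6: intercept + 5 columns

# equivalent via a data frame and "."
d    <- data.frame(y = y, X)
fit2 <- lm(y ~ ., data = d)
```

Weights extend unchanged: `lm(y ~ X, weights = w)` for a numeric vector `w` of length 24.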
Re: [R] Gumbell distribution - minimum case
If you mean you want an EVD with a fat left tail (instead of a fat right tail), then can;t you just multiply all the values by -1 to "reverse" the distribution? A new location parameter could then shift the distribution wherever you want along the number line ... -Aaron On Mon, Sep 8, 2008 at 5:22 PM, Richard Gwozdz <[EMAIL PROTECTED]> wrote: > Hello, > > I would like to sample from a Gumbell (minimum) distribution. I have > installed package {evd} but the Gumbell functions there appear to refer to > the maximum case. Unfortunately, setting the scale parameter negative does > not appear to work. > > Is there a separate package for the Gumbell minimum? > > > -- > _ > Rich Gwozdz > Fire and Mountain Ecology Lab > College of Forest Resources > University of Washington > cell: 206-769-6808 office: 206-543-9138 > [EMAIL PROTECTED] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
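The negation trick in the reply can be made concrete without any extra package: if X follows a Gumbel *maximum* distribution with location -loc, then -X follows the Gumbel *minimum* distribution with location loc. A base-R sketch via inverse-CDF sampling (the function names are my own, not from {evd}):

```r
# Gumbel (maximum) sampler via the inverse CDF: F^-1(u) = loc - scale * log(-log(u))
rgumbel_max <- function(n, loc = 0, scale = 1)
  loc - scale * log(-log(runif(n)))

# Gumbel (minimum) sampler: negate a maximum-Gumbel sample with mirrored location
rgumbel_min <- function(n, loc = 0, scale = 1)
  -rgumbel_max(n, loc = -loc, scale = scale)

set.seed(1)
x <- rgumbel_min(1e5, loc = 10, scale = 2)
mean(x)  # ~ loc - 0.5772 * scale (Euler-Mascheroni constant), i.e. about 8.85
```

The same negation applies to density, distribution, and quantile functions, which is why a separate "Gumbel minimum" package is rarely needed.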
Re: [R] database table merging tips with R
I would load your set of userid's into a temporary table in oracle, then join that table with the rest of your SQL query to get only the matching rows out. -Aaron On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: > > Dear R list, > > What is the best way to efficiently marry an R dataset with a very large > (Oracle) database table? > > The goal is to only return Oracle table rows that match IDs present in the R > dataset. > I have an R data frame with 2000 user IDs analogous to: r = > data.frame(userid=round(runif(2000)*10,0)) > > ...and I need to pull data from an Oracle table only for these 2000 IDs. The > Oracle table is quite large. Additionally, the sql query may need to join to > other tables to bring in ancillary fields. > > I currently connect to Oracle via odbc: > > library(RODBC) > connection <- odbcConnect("", uid="", pwd="") > d = sqlQuery(connection, "select userid, x, y, z from largetable where > timestamp > sysdate -7") > > ...allowing me to pull data from the database table into the R object "d" and > then use the R merge function. The problem however is that if "d" is too > large it may fail due to memory limitations or be inefficient. I would like > to push the merge portion to the database and it would be very convenient if > it were possible to request that the query look to the R object for the ID's > to which it should restrict the output. > > Is there a way to do this? > Something like the following fictional code: > d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where > r$userid=t.userid") > > Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? > This would be convenient and help me avoid needing to create a temporary > table to store the R data, join via sql, then return the data back to R. > > I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 . > Thanks for your comments, ideas, recommendations. 
> > > -Avram > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
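Short of a temporary table (see the follow-ups in this thread), a common lightweight alternative is to paste the R ids directly into the query's WHERE clause. A sketch with invented ids and column names — note that Oracle caps an IN list at 1000 literals (ORA-01795), so very large id sets still favor the temp-table route:

```r
ids <- c(101, 205, 307)  # hypothetical user ids from the R data frame

sql <- sprintf(
  "select userid, x, y, z from largetable where userid in (%s)",
  paste(ids, collapse = ", ")
)
sql
# "select userid, x, y, z from largetable where userid in (101, 205, 307)"

# the string would then be passed on as: d <- sqlQuery(connection, sql)
```

This only works safely for numeric ids generated by your own code; ids coming from user input would need quoting/escaping.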
Re: [R] database table merging tips with R
Sorry, I see now you want to avoid this, but you did ask what was the "best way to efficiently ...", and the temp. table solution certainly matches your description. What's wrong with using a temporary table? -Aaron On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <[EMAIL PROTECTED]> wrote: > I would load your set of userid's into a temporary table in oracle, > then join that table with the rest of your SQL query to get only the > matching rows out. > > -Aaron > > On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: >> >> Dear R list, >> >> What is the best way to efficiently marry an R dataset with a very large >> (Oracle) database table? >> >> The goal is to only return Oracle table rows that match IDs present in the R >> dataset. >> I have an R data frame with 2000 user IDs analogous to: r = >> data.frame(userid=round(runif(2000)*10,0)) >> >> ...and I need to pull data from an Oracle table only for these 2000 IDs. >> The Oracle table is quite large. Additionally, the sql query may need to >> join to other tables to bring in ancillary fields. >> >> I currently connect to Oracle via odbc: >> >> library(RODBC) >> connection <- odbcConnect("", uid="", pwd="") >> d = sqlQuery(connection, "select userid, x, y, z from largetable where >> timestamp > sysdate -7") >> >> ...allowing me to pull data from the database table into the R object "d" >> and then use the R merge function. The problem however is that if "d" is >> too large it may fail due to memory limitations or be inefficient. I would >> like to push the merge portion to the database and it would be very >> convenient if it were possible to request that the query look to the R >> object for the ID's to which it should restrict the output. >> >> Is there a way to do this? >> Something like the following fictional code: >> d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where >> r$userid=t.userid") >> >> Would sqldf (http://code.google.com/p/sqldf/) help me out here? 
If so, how? >> This would be convenient and help me avoid needing to create a temporary >> table to store the R data, join via sql, then return the data back to R. >> >> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 . >> Thanks for your comments, ideas, recommendations. >> >> >> -Avram >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] database table merging tips with R
I guess I'd do it something like this: dbGetQuery(con, "CREATE TEMPORARY TABLE foo ( etc etc)") sapply(@userids, function (x) { dbGetQuery(con, paste("INSERT INTO foo (userid) VALUES (", x, ")")) }) then later: dbGetQuery(con, "DROP TABLE foo"); -Aaron On Thu, Sep 11, 2008 at 3:21 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: > > Perhaps I will need to create a temp table, but I am asking if there is a way > to avoid it. It would be great if there were a way to tie the R data frame > temporarily to the query in a transparent fashion. If not, I will see if I > can create/drop the temp table directly from sqlQuery. > -Avram > > > > On Thursday, September 11, 2008, at 12:07PM, "Aaron Mackey" <[EMAIL > PROTECTED]> wrote: >>Sorry, I see now you want to avoid this, but you did ask what was the >>"best way to efficiently ...", and the temp. table solution certainly >>matches your description. What's wrong with using a temporary table? >> >>-Aaron >> >>On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <[EMAIL PROTECTED]> wrote: >>> I would load your set of userid's into a temporary table in oracle, >>> then join that table with the rest of your SQL query to get only the >>> matching rows out. >>> >>> -Aaron >>> >>> On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: >>>> >>>> Dear R list, >>>> >>>> What is the best way to efficiently marry an R dataset with a very large >>>> (Oracle) database table? >>>> >>>> The goal is to only return Oracle table rows that match IDs present in the >>>> R dataset. >>>> I have an R data frame with 2000 user IDs analogous to: r = >>>> data.frame(userid=round(runif(2000)*10,0)) >>>> >>>> ...and I need to pull data from an Oracle table only for these 2000 IDs. >>>> The Oracle table is quite large. Additionally, the sql query may need to >>>> join to other tables to bring in ancillary fields. 
>>>> >>>> I currently connect to Oracle via odbc: >>>> >>>> library(RODBC) >>>> connection <- odbcConnect("", uid="", pwd="") >>>> d = sqlQuery(connection, "select userid, x, y, z from largetable where >>>> timestamp > sysdate -7") >>>> >>>> ...allowing me to pull data from the database table into the R object "d" >>>> and then use the R merge function. The problem however is that if "d" is >>>> too large it may fail due to memory limitations or be inefficient. I >>>> would like to push the merge portion to the database and it would be very >>>> convenient if it were possible to request that the query look to the R >>>> object for the ID's to which it should restrict the output. >>>> >>>> Is there a way to do this? >>>> Something like the following fictional code: >>>> d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where >>>> r$userid=t.userid") >>>> >>>> Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, >>>> how? This would be convenient and help me avoid needing to create a >>>> temporary table to store the R data, join via sql, then return the data >>>> back to R. >>>> >>>> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 . >>>> Thanks for your comments, ideas, recommendations. >>>> >>>> >>>> -Avram >>>> >>>> __ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >> >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
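A corrected version of the sketch above, for the archive: `@userids` is Perl syntax, and in R the ids are a plain vector over which sprintf()/vapply() can generate one INSERT per id. Only the SQL strings are built here (the table and column names are invented); each element would be handed to sqlQuery() or dbGetQuery() in turn against a live connection:

```r
userids <- c(101L, 205L, 307L)  # hypothetical ids from the R data frame

stmts <- c(
  "CREATE GLOBAL TEMPORARY TABLE r_ids (userid NUMBER)",
  vapply(userids,
         function(id) sprintf("INSERT INTO r_ids (userid) VALUES (%d)", id),
         character(1)),
  "SELECT t.userid, x, y, z FROM largetable t JOIN r_ids r ON r.userid = t.userid",
  "DROP TABLE r_ids"
)
stmts[2]
# "INSERT INTO r_ids (userid) VALUES (101)"
```

For thousands of ids, a bulk loader (e.g. RODBC's sqlSave into the temp table) would be far faster than row-by-row INSERTs, but the statement-per-id version is the most transparent sketch.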
[R] XML package help
Please consider this: http://www.w3.org/2001/XMLSchema-instance"; > ./XYZ 10 ./XYZ/ I am attempting to use the XML package and xpathSApply() to extract, say, the eValue attribute for eName=='one' for all nodes that have ==10. I try the following, among several things: doc <- xmlInternalTreeParse(Manifest) Root = xmlRoot(doc) xpathSApply(Root, "//File[FileTypeId=10]/PatientCharacteristics/[...@ename='one']", xmlAttrs) and it does not work. Might somebody help me with the syntax here? Thanks a lot!! Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] XML package help
Thanks! Works like a charm. -Aaron From: Duncan Temple Lang [dun...@wald.ucdavis.edu] Sent: Friday, January 23, 2009 6:48 PM To: Skewes,Aaron Cc: r-help@r-project.org Subject: Re: [R] XML package help Skewes,Aaron wrote: > Please consider this: > > http://www.w3.org/2001/XMLSchema-instance"; > > > > ./XYZ > > > 10 > ./XYZ/ > > > > > > > > I am attempting to use XML package and xpathSApply() to extract, say, the > eValue attribute for eName=='0ne' for all nodes that have > ==10. I try the following, amoung several things: > getNodeSet(doc, "//File[FileTypeId/text()='10']/patientcharacteristi...@ename='one']/@eValue") should do it. You need to compare the text() of the FileTypeId node. And the / after the PatientCharacterstics and before the [] will cause trouble. HTH, D. > doc<-xmlInternalTreeParse(Manifest) > Root = xmlRoot(doc) > xpathSApply(Root, > "//File[FileTypeId=10]/PatientCharacteristics/[...@ename='one']", xmlAttrs) > > and it does not work. > > Might somebody help me with the syntax here? > > Thanks a lot!! > Aaron > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] XML package- accessing nodes based on attributes
Hi, I have a rather complex xml document that I am attempting to parse based on attributes: http://www.w3.org/2001/XMLSchema-instance";> D:\CN_data\Agilent\Results\ File> My requirement is to access eValues at each node based on FileTypeId. For example: How can I get the eValue of eName="PatientReference" for all Type="Patient" ,where the ? i.e. "TCGA-06-0875-01A" and "TCGA-06-0875-02A" For the life of me, I can not get this to work! Thanks, -Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] an S idiom for ordering matrix by columns?
There's got to be a better way to use order() on a matrix than this:

> y
    2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 3L-173-2
398        1         1         2        2        1         1        2        2
857        1         1         2        2        1         2        2        2
911        1         1         2        2        1         2        2        2
383        1         1         2        2        1         1        2        2
639        1         2         2        1        2         2        1        2
756        1         2         2        1        2         2        1        2
    3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087
398        1        2        2         2     1     2
857        1        2        2         2     1     2
911        1        2        2         2     1     2
383        1        2        2         2     1     2
639        2        2        1         2     1     2
756        2        2        1         2     1     2

> y[order(y[,1],y[,2],y[,3],y[,4],y[,5],y[,6],y[,7],y[,8],y[,9],y[,10],y[,11],y[,12],y[,13],y[,14]),]
    2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 3L-173-2
398        1         1         2        2        1         1        2        2
383        1         1         2        2        1         1        2        2
857        1         1         2        2        1         2        2        2
911        1         1         2        2        1         2        2        2
639        1         2         2        1        2         2        1        2
756        1         2         2        1        2         2        1        2
    3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087
398        1        2        2         2     1     2
383        1        2        2         2     1     2
857        1        2        2         2     1     2
911        1        2        2         2     1     2
639        2        2        1         2     1     2
756        2        2        1         2     1     2

Thanks for any suggestions!

-Aaron

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] an S idiom for ordering matrix by columns?
Thanks to all, "do.call(order, as.data.frame(y))" was the idiom I was missing! -Aaron On Thu, Feb 19, 2009 at 11:52 AM, Gustaf Rydevik wrote: > On Thu, Feb 19, 2009 at 5:40 PM, Aaron Mackey wrote: > > There's got to be a better way to use order() on a matrix than this: > > > >> y > >2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 > > 3L-173-2 > > 3981 1 221 12 > > 2 > > 8571 1 221 22 > > 2 > > 9111 1 221 22 > > 2 > > 3831 1 221 12 > > 2 > > 6391 2 212 21 > > 2 > > 7561 2 212 21 > > 2 > >3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087 > > 398122 2 1 2 > > 857122 2 1 2 > > 911122 2 1 2 > > 383122 2 1 2 > > 639221 2 1 2 > > 756221 2 1 2 > > > >> > > > y[order(y[,1],y[,2],y[,3],y[,4],y[,5],y[,6],y[,7],y[,8],y[,9],y[,10],y[,11],y[,12],y[,13],y[,14]),] > >2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 > > 3L-173-2 > > 3981 1 221 12 > > 2 > > 3831 1 221 12 > > 2 > > 8571 1 221 22 > > 2 > > 9111 1 221 22 > > 2 > > 6391 2 212 21 > > 2 > > 7561 2 212 21 > > 2 > >3L-186-1 3R-013-7 3R-032-1 3R-169-10 X-002 X-087 > > 398122 2 1 2 > > 383122 2 1 2 > > 857122 2 1 2 > > 911122 2 1 2 > > 639221 2 1 2 > > 756221 2 1 2 > > > > Thanks for any suggestions! > > > > -Aaron > > > > > You mean something like this: > > test<-matrix(sample(1:4,100,replace=T),ncol=10) > > test[do.call(order,data.frame(test)),] > > ? > > Regards, > > Gustaf > > > -- > Gustaf Rydevik, M.Sci. > tel: +46(0)703 051 451 > address:Essingetorget 40,112 66 Stockholm, SE > skype:gustaf_rydevik > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
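The idiom in action, on a small made-up matrix (not Aaron's data):

```r
set.seed(1)
y <- matrix(sample(1:2, 60, replace = TRUE), ncol = 6)

# order() takes each sort key as a separate argument, and a data frame
# is a list of columns, so do.call() supplies every column at once:
y[do.call(order, as.data.frame(y)), ]
```

This sorts the rows lexicographically by column 1, then column 2, and so on, without writing out `y[,1], y[,2], ...` by hand.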
[R] Help with R and MySQL
Hello, This forum has been very helpful to me in the past, and I've run out of ideas on how to solve my problem. I had been using R and MySQL (and Perl) together for quite some time successfully on my Windows XP machine. However, I recently had some problems with MySQL (the ibdata file had become 35GB on my hard drive, turns out it's a known bug with InnoDB), and ultimately the way I fixed my problem with MySQL was to upgrade it. It's working fine now, I can use MySQL however I'd like. I'm sticking to MyISAM tables for now, though. However, I had set up my system so I did a linear regression in R. Originally, this was done in R 2.5.0, I would load in the tables from MySQL to R and then conduct the regression in R. However, after solving my MySQL problem, I ran into a strange error in R (and DBI/RMySQL). R connected to the database just fine, and I could even show the tables in the database and load two of them into R. However, the tables I loaded successfully were only a single column. Every time I tried to load in a recordset that was multiple columns, I got a relatively nondescript Windows error("R for Windows terminal front-end has encountered a problem and needs to close. We are sorry for the inconvenience."). To verify that it wasn't a memory issue, I even tried "rs <- dbSendQuery(con, "select 'a', 'b'")". This statement causes the error as well. I tried upgrading the packages, and upgrading R from 2.5.0 to 2.8.1. However, I still get the same errors. Has anyone run into this problem before? Any suggestions on how to solve it? Thanks in advance, Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with R and MySQL
Thanks Jeff, that was exactly the problem. When I unzipped the version at the page below for my version of MySQL (5.1), it worked fine. The version I downloaded through install.packages() must have been for 5.0. Thanks so much for the help and quick response, Aaron From: Jeffrey Horner Cc: R-help@r-project.org Sent: Monday, February 23, 2009 10:10:02 AM Subject: Re: [R] Help with R and MySQL Aaron Barzilai wrote: > Hello, > > This forum has been very helpful to me in the past, and I've run out of ideas > on how to solve my problem. > > I had been using R and MySQL (and Perl) together for quite some time > successfully on my Windows XP machine. However, I recently had some problems > with MySQL (the ibdata file had become 35GB on my hard drive, turns out it's > a known bug with InnoDB), and ultimately the way I fixed my problem with > MySQL was to upgrade it. It's working fine now, I can use MySQL however I'd > like. I'm sticking to MyISAM tables for now, though. > > However, I had set up my system so I did a linear regression in R. > Originally, this was done in R 2.5.0, I would load in the tables from MySQL > to R and then conduct the regression in R. However, after solving my MySQL > problem, I ran into a strange error in R (and DBI/RMySQL). R connected to > the database just fine, and I could even show the tables in the database and > load two of them into R. However, the tables I loaded successfully were only > a single column. Every time I tried to load in a recordset that was multiple > columns, I got a relatively nondescript Windows error("R for Windows terminal > front-end has encountered a problem and needs to close. We are sorry for the > inconvenience."). To verify that it wasn't a memory issue, I even tried "rs > <- dbSendQuery(con, "select 'a', 'b'")". This statement causes the error as > well. > > I tried upgrading the packages, and upgrading R from 2.5.0 to 2.8.1. > However, I still get the same errors. Has anyone run into this problem > before? 
Any suggestions on how to solve it? Hi Aaron, Be sure to read the details of the RMySQL web page: http://biostat.mc.vanderbilt.edu/RMySQL You need to make sure and match the version of your MySQL client library (not the running MySQL server) with the RMySQL binary that you choose from the web page above. Best, Jeff -- http://biostat.mc.vanderbilt.edu/JeffreyHorner [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
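Once the client library and the RMySQL binary match, a quick way to confirm that multi-column result sets come back intact is to repeat Aaron's two-column probe (connection details below are placeholders, not from the thread):

```r
library(RMySQL)

con <- dbConnect(MySQL(), dbname = "mydb",
                 user = "user", password = "pass", host = "localhost")

# The query that previously crashed the front-end:
rs <- dbSendQuery(con, "select 'a', 'b'")
fetch(rs, n = -1)   # should now return one row with two columns

dbClearResult(rs)
dbDisconnect(con)
```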
[R] ordering
Hello, I would like to order a matrix by a specific column. For instance: > test [,1] [,2] [,3] [1,]1 100 21 [2,]23 22 [3,]3 100 23 [4,]4 60 24 [5,]5 55 25 [6,]6 45 26 [7,]7 75 27 [8,]8 12 28 [9,]9 10 29 [10,] 10 22 30 > test[order(test[,2]),] [,1] [,2] [,3] [1,]23 22 [2,]9 10 29 [3,]8 12 28 [4,] 10 22 30 [5,]6 45 26 [6,]5 55 25 [7,]4 60 24 [8,]7 75 27 [9,]1 100 21 [10,]3 100 23 This works well and good in the above example matrix. However in the matrix that I actually want to sort (derived from a function that I wrote) I get something like this: > test[order(as.numeric(test[,2])),] ### First column is row.names f con f.1 cov f.2 minimum f.3 maximum f.4 cl asahi* 100 * 1 * 0.1 * 2 * test castet * 100 * 2 * 0.1 * 5 * test clado* 100 * 1 * 0.7 * 2 * test aulac* 33 * 0 * 0.1 * 0.1 * test buell* 33 * 0 * 0.1 * 0.1 * test camlas * 33 * 0 * 0.1 * 0.1 * test carbig * 33 * 1 * 1 * 1 * test poaarc * 67 * 0 * 0.1 * 0.1 * test polviv * 67 * 0 * 0.1 * 0.1 * test where R interprets 100 to be the lowest value and orders increasing from there. > is.numeric(test[,2]) [1] FALSE > is.double(test[,2]) [1] FALSE > is.integer(test[,2]) [1] FALSE > is.real(test[,2]) [1] FALSE My questions are: Why is this happening? and How do I fix it? Thanks in advance! Aaron Wells _ cns!503D1D86EBB2B53C!2285.entry?ocid=TXT_TAGLM_WL_UGC_Contacts_032009 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ordering
Thanks Peter, that did the trick. I'll modify my function so that the numeric conversion is done automatically thus saving me the extra step of converting later on. Aaron Wells > Subject: RE: [R] ordering > Date: Wed, 11 Mar 2009 08:41:50 +1300 > From: palsp...@hortresearch.co.nz > To: awell...@hotmail.com; r-help@r-project.org > > Kia ora Aaron > > As you have identified, test[,2] is not numeric - it is probably factor. > Your function must have made the conversion, so you may want to modify > that. Alternative, try: > > test[order(as.numeric(as.character(test[,2]))),] > > BTW, str(test) is a good way to find out more about the structure of > your object. > > HTH > > Peter Alspach > > > > > > -Original Message- > > From: r-help-boun...@r-project.org > > [mailto:r-help-boun...@r-project.org] On Behalf Of aaron wells > > Sent: Wednesday, 11 March 2009 8:30 a.m. > > To: r-help@r-project.org > > Subject: [R] ordering > > > > > > Hello, I would like to order a matrix by a specific column. > > For instance: > > > > > > > > > test > > [,1] [,2] [,3] > > [1,] 1 100 21 > > [2,] 2 3 22 > > [3,] 3 100 23 > > [4,] 4 60 24 > > [5,] 5 55 25 > > [6,] 6 45 26 > > [7,] 7 75 27 > > [8,] 8 12 28 > > [9,] 9 10 29 > > [10,] 10 22 30 > > > > > > > test[order(test[,2]),] > > [,1] [,2] [,3] > > [1,] 2 3 22 > > [2,] 9 10 29 > > [3,] 8 12 28 > > [4,] 10 22 30 > > [5,] 6 45 26 > > [6,] 5 55 25 > > [7,] 4 60 24 > > [8,] 7 75 27 > > [9,] 1 100 21 > > [10,] 3 100 23 > > > > > > This works well and good in the above example matrix. 
> > However in the matrix that I actually want to sort (derived > > from a function that I wrote) I get something like this: > > > > > > > > > test[order(as.numeric(test[,2])),] ### First column is row.names > > > > > > f con f.1 cov f.2 minimum f.3 maximum f.4 cl > > asahi * 100 * 1 * 0.1 * 2 * test > > castet * 100 * 2 * 0.1 * 5 * test > > clado * 100 * 1 * 0.7 * 2 * test > > aulac * 33 * 0 * 0.1 * 0.1 * test > > buell * 33 * 0 * 0.1 * 0.1 * test > > camlas * 33 * 0 * 0.1 * 0.1 * test > > carbig * 33 * 1 * 1 * 1 * test > > poaarc * 67 * 0 * 0.1 * 0.1 * test > > polviv * 67 * 0 * 0.1 * 0.1 * test > > > > > > > > > > where R interprets 100 to be the lowest value and orders > > increasing from there. > > > > > > > > > is.numeric(test[,2]) > > [1] FALSE > > > is.double(test[,2]) > > [1] FALSE > > > is.integer(test[,2]) > > [1] FALSE > > > is.real(test[,2]) > > [1] FALSE > > > > > > > > > > My questions are: Why is this happening? and How do I fix it? > > > > > > > > Thanks in advance! > > > > > > > > Aaron Wells > > > > _ > > > > > > cns!503D1D86EBB2B53C!2285.entry?ocid=TXT_TAGLM_WL_UGC_Contacts_032009 > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > The contents of this e-mail are confidential and may be subject to legal > privilege. > If you are not the intended recipient you must not use, disseminate, > distribute or > reproduce all or any part of this e-mail or attachments. If you have received > this > e-mail in error, please notify the sender and delete all material pertaining > to this > e-mail. Any opinion or views expressed in this e-mail are those of the > individual > sender and may not represent those of The New Zealand Institute for Plant and > Food Research Limited. 
_ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
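The symptom in miniature: a numeric-looking column stored as a factor sorts by its character levels, and `as.numeric()` applied directly to a factor returns the level codes rather than the values, which is why the `as.character()` step in Peter's fix matters:

```r
f <- factor(c("100", "33", "67"))

sort(f)                      # 100 33 67 -- lexicographic level order
as.numeric(f)                # level codes, not the printed values
as.numeric(as.character(f))  # 100 33 67 -- the actual numbers

# Order rows by the recovered numeric values:
order(as.numeric(as.character(f)))
```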
[R] geometric mean of probability density functions
Hi, This is my first time posting to the mailing list, so if I'm doing something wrong, just let me know. I've taken ~1000 samples from 8 biological replicates, and I want to somehow combine the density functions of the replicates. Currently, I can plot the density function for each biological replicate, and I'd like to see how pool of replicates compares to a simulation I conducted earlier. I can compare each replicate to the simulation, but there's a fair amount of variability between replicates. I'd like to take the geometric mean of the density functions at each point along the x-axis, but when I compute: > a<-density(A[,1][A[,1]>=0], n=2^15) > b<-density(A[,3][A[,3]>=0], n=2^15) > a$x[1] [1] -70.47504 > b$x[1] [1] -69.28902 So I can't simply compute the mean across y-values, because the x-values don't match. Is there a way to set the x-values to be the same for multiple density plots? Also, there are no negative values in the dataset, so I'd like to bound the x-axis at 0 if at all possible? Is there a standard way to combine density functions? Thanks for the advice. -Aaron Spivak ps. I thought about just pooling all measurements, but I don't think that's appropriate because they are from different replicates and the smoothing kernel depends on the variance in the sample to calculate the distribution. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
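One way around the mismatched x-grids is to give `density()` a common set of evaluation points via its `from`, `to`, and `n` arguments, which also lets you bound the axis at 0. A sketch on simulated data (stand-ins for two replicates, not the poster's measurements):

```r
set.seed(42)
r1 <- rgamma(1000, shape = 2)
r2 <- rgamma(1000, shape = 3)
hi <- max(r1, r2)

# Same grid for both curves, bounded below at 0:
d1 <- density(r1, from = 0, to = hi, n = 2^15)
d2 <- density(r2, from = 0, to = hi, n = 2^15)
identical(d1$x, d2$x)   # the grids now line up

# Pointwise geometric mean of the two density curves:
gm <- exp((log(d1$y) + log(d2$y)) / 2)
```

Note the result is no longer guaranteed to integrate to 1, so renormalize if a proper density is needed.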
[R] Automatic command line options
Hello users, Does anyone know how to turn off the automatic double quoting and bracketing on the command line that appeared in R 2.6.x (OS X). It's driving me nuts! Many thanks, Aaron. ---- M. Aaron MacNeil __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read.table not clearing colClasses
I am attempting to created colClasses for several tables, then read only specific columns. There are two different table layouts that I am working with. If I use exclusively one layout, the script works perfectly, but if I mix the layouts, it fails to grab the correct columns form layout that is read in second. It appears that colClasses fails to adopt the new structure after the first iteration. Is there some way to clear colClasses of flush the write buffer between iterations? Thanks, Aaron for(i in 1:length(fullnames.in)) { cnames<- read.table(fullnames.in[i], header=FALSE, sep="\t", na.strings="", nrows=1, row.names = NULL , skip=9, fill=TRUE, quote="") #initialize col.classes to NULL vector seq(1,length(cnames))->column.classes column.classes[1:length(cnames)]="NULL" #find where the desired columns are idx<-which(cnames=="Row") column.classes[idx]="integer" idx<-which(cnames=="Col") column.classes[idx]="integer" idx<-which(cnames=="ControlType") column.classes[idx]="integer" idx<-which(cnames=="ProbeName") column.classes[idx]="character" idx<-which(cnames=="GeneName") column.classes[idx]="character" idx<-which(cnames=="SystematicName") column.classes[idx]="character" idx<-which(cnames=="LogRatio") column.classes[idx]="numeric" idx<-which(cnames=="gMeanSignal") column.classes[idx]="numeric" idx<-which(cnames=="rMeanSignal") column.classes[idx]="numeric" idx<-which(cnames=="gBGMeanSignal") column.classes[idx]="numeric" idx<-which(cnames=="rBGMeanSignal") column.classes[idx]="numeric" print(fullnames.in[i]) print("Reading file, this could take a few minutes") #read all rows of selected columns into data.frame d <- read.table(fullnames.in[1], header=TRUE, sep="\t", na.strings="", nrows=number.rows, colClasses=column.classes, row.names = NULL , skip=9, fill=TRUE, quote="") print("Writing file, this could take a few minutes") #write all rows of selected columns into file write.table(d, fullnames.out[i], sep="\t", quote=FALSE, row.names=FALSE) rm(cnames, 
column.classes, d, idx) } [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
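One thing worth checking before suspecting colClasses: the full read inside the loop above indexes `fullnames.in[1]`, not `fullnames.in[i]`, so every iteration re-reads the first file with the freshly built classes — which would produce exactly the reported symptom. A condensed sketch of the loop with that fixed (column names and read.table arguments as in the original; `number.rows` and the file-name vectors are assumed defined as in the post):

```r
wanted <- c(Row = "integer", Col = "integer", ControlType = "integer",
            ProbeName = "character", GeneName = "character",
            SystematicName = "character", LogRatio = "numeric",
            gMeanSignal = "numeric", rMeanSignal = "numeric",
            gBGMeanSignal = "numeric", rBGMeanSignal = "numeric")

for (i in seq_along(fullnames.in)) {
    cnames <- read.table(fullnames.in[i], header = FALSE, sep = "\t",
                         na.strings = "", nrows = 1, skip = 9,
                         fill = TRUE, quote = "")
    column.classes <- rep("NULL", length(cnames))
    hit <- match(names(wanted), as.character(unlist(cnames)))
    column.classes[hit[!is.na(hit)]] <- wanted[!is.na(hit)]

    d <- read.table(fullnames.in[i],   # note: [i], not [1]
                    header = TRUE, sep = "\t", na.strings = "",
                    nrows = number.rows, colClasses = column.classes,
                    skip = 9, fill = TRUE, quote = "")
    write.table(d, fullnames.out[i], sep = "\t",
                quote = FALSE, row.names = FALSE)
}
```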
[R] 2D density tophat
Hello R users, I have successfully created a square (or more generally, rectangular) tophat smoothing routine based on altering the already available KDE2D. I would be keen to implement a circular tophat routine also, however this appears to be much more difficult to write efficiently (I have a routine, but it's very slow). I tried to create one based on using crossdist to create a distance matrix between my data and the sampling grid, but it doesn't take a particularly large amount of data (or hi res grid) for memory to be a big problem. The 2D density routines I have been able to find either don't support a simple tophat, or don't use the absolute distances between the sampling grid and the data. Should anyone know of more general 2D density routines that might support circular tophats, or know of a simple and efficient method for creating them, I would be very grateful. Aaron [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2D density tophat
Hello R users, I have successfully created a square (or more generally, rectangular) tophat smoothing routine based on altering the already available KDE2D. I would be keen to implement a circular tophat routine also, however this appears to be much more difficult to write efficiently (I have a routine, but it's very slow). I tried to create one based on using crossdist (in spatstat) to create a distance matrix between my data and the sampling grid, but it doesn't take a particularly large amount of data (or hi-res grid) for memory to be a big problem. The 2D density routines I have been able to find either don't support a simple tophat, or don't use the absolute distances between the sampling grid and the data. Should anyone know of more general 2D density routines that might support circular tophats, or know of a simple and efficient method for creating them, I would be very grateful. Thanks for your time, Aaron PS: I tried sending this on Friday originally, but as far as I know that didn't work, so should another post appear from me asking the same thing I apologise in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] making sense of posterior statistics in the deal package
Hello, I'm doing bayesian network analyses with the deal package. I am at a loss for how to interpret output from the analysis (i.e. what is a good score, what is a bad score, which stats tell me what about the network edges/nodes). Here is an example node with its posterior scores for all parent nodes. Conditional Posterior: Yp1| 3 4 5 6 9 11 12 15 18 [[1]] [[1]]$tau [,1][,2] [,3][,4] [,5][,6] [1,] 138.000 -201.944190 -61.827901 -29.5419149 11.7780877 -56.1691436 [2,] -201.9441898 379.014299 101.336606 49.2886631 -9.5976678 99.0119458 [3,] -61.8279013 101.336606 55.301879 18.3175413 0.4718180 31.7741275 [4,] -29.5419149 49.288663 18.317541 18.5074653 0.7297184 14.7963722 [5,] 11.7780877 -9.597668 0.471818 0.7297184 11.9705940 -0.1152971 [6,] -56.1691436 99.011946 31.774127 14.7963722 -0.1152971 33.0750507 [7,] 11.8398168 -11.819652 2.372613 2.4241871 8.3525307 -0.5909911 [8,] -15.8233513 27.136706 13.261521 10.3380918 5.2238205 10.7721059 [9,] -63.0844071 112.477658 36.867027 18.7342207 1.8345119 32.6573681 [10,] -0.91256760.892410 3.995155 3.3759532 5.2495044 4.8010982 [,7] [,8] [,9] [,10] [1,] 11.8398168 -15.823351 -63.084407 -0.9125676 [2,] -11.8196521 27.136706 112.477658 0.8924099 [3,] 2.3726129 13.261521 36.867027 3.9951552 [4,] 2.4241871 10.338092 18.734221 3.3759532 [5,] 8.3525307 5.223821 1.834512 5.2495044 [6,] -0.5909911 10.772106 32.657368 4.8010982 [7,] 11.7576987 5.339882 1.364748 4.5801216 [8,] 5.3398823 17.269931 14.659995 6.8871204 [9,] 1.3647480 14.659995 43.586099 4.5549556 [10,] 4.5801216 6.887120 4.554956 11.1188844 [[1]]$phi [1] 5.395758 [[1]]$mu [1] -0.151400686 0.459786917 -0.091988847 -0.009952914 0.074523419 [6] 0.215198198 -0.010968581 -0.026347501 0.423837846 -0.018999184 [[1]]$rho [1] 147 Any help you can give me is greatly appreciated. 
Aaron Tarone __ Aaron Tarone Postdoctoral Research Associate Molecular and Computational Biology Program University of Southern California [EMAIL PROTECTED] (213) 740-3063 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2D density tophat
In case anyone other than me was interested, a pretty efficient circular tophat can be made using the fields function fields.rdist.near:

CircHat = function(x, y, h = 1,
                   gridres = c((max(x) - min(x))/25, (max(y) - min(y))/25),
                   lims = c(range(x), range(y)), density = FALSE)
{
    require(fields)
    nx <- length(x)
    ny <- length(y)
    n = c(1 + (lims[2] - lims[1])/gridres[1], 1 + (lims[4] - lims[3])/gridres[2])
    if (length(y) != nx)
        stop("data vectors must be the same length")
    if (any(!is.finite(x)) || any(!is.finite(y)))
        stop("missing or infinite values in the data are not allowed")
    if (any(!is.finite(lims)))
        stop("only finite values are allowed in 'lims'")
    gx <- seq(lims[1], lims[2], by = gridres[1])
    gy <- seq(lims[3], lims[4], by = gridres[2])
    fullgrid = expand.grid(gx, gy)
    if (missing(h))
        h <- c(bandwidth.nrd(x), bandwidth.nrd(y))
    temp = table(fields.rdist.near(as.matrix(fullgrid), as.matrix(cbind(x, y)),
        mean.neighbor = ceiling(length(x)*pi*h^2/((lims[2] - lims[1])*(lims[4] - lims[3]))),
        delta = h)$ind[, 1])
    pad = rep(0, length(gx)*length(gy))
    pad[as.numeric(names(temp))] = as.numeric(temp)
    z <- matrix(pad, length(gx), length(gy))
    if (density) { z = z/(nx*pi*h^2) }
    list(x = gx, y = gy, z = z)
}

It works in more or less the same way as kde2d, but by default it returns counts, not densities. Aaron On 1 Dec 2008, at 11:46, Aaron Robotham wrote: Hello R users, I have successfully created a square (or more generally, rectangular) tophat smoothing routine based on altering the already available KDE2D. I would be keen to implement a circular tophat routine also, however this appears to be much more difficult to write efficiently (I have a routine, but it's very slow). I tried to create one based on using crossdist (in spatstat) to create a distance matrix between my data and the sampling grid, but it doesn't take a particularly large amount of data (or hi-res grid) for memory to be a big problem. 
The 2D density routines I have been able to find either don't support a simple tophat, or don't use the absolute distances between the sampling grid and the data. Should anyone know of more general 2D density routines that might support circular tophats, or know of a simple and efficient method for creating them, I would be very grateful. Thanks for your time, Aaron PS: I tried sending this on Friday originally, but as far as I know that didn't work, so should another post appear from me asking the same thing I apologise in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
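A quick usage sketch for the CircHat function above on simulated points (assuming the fields package is installed; the bandwidth and sample size here are arbitrary):

```r
set.seed(7)
x <- rnorm(500)
y <- rnorm(500)

out <- CircHat(x, y, h = 0.5)
# out$z holds the count of data points within radius h of each grid
# centre; visualize it like a kde2d result:
image(out$x, out$y, out$z)
```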
[R] very long integers
A quick question really: I have a database with extremely long integer IDs (eg 588848900971299297), which is too big for R to cope with internally (it appears to store as a double), and when I do any frequency tables erroneous results appear. Does anyone know of a package that extends internal storage up to LONG, or is the only solution to read it in as a character from the original data? In case anyone is curious, I didn't create the IDs, and in some form I must conserve all of the ID information for later use. Thanks, Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
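Reading the IDs in as character keeps them exact, since doubles only carry about 15-16 significant digits. Alternatively, the bit64 package (which appeared after this thread) supplies a true 64-bit integer class; the file and column names below are made up for illustration:

```r
# Option 1: never let the IDs become doubles in the first place
d <- read.csv("mydata.csv", colClasses = c(id = "character"))

# Option 2: a genuine 64-bit integer type (assumes the bit64 package)
library(bit64)
x <- as.integer64("588848900971299297")
x + 1L   # exact arithmetic, no double rounding
```

With either approach, frequency tables distinguish IDs that differ only in their last digits, which is where the double representation fails.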
[R] Coxph frailty model counting process error X matrix deemed singular
Hello, I am currently trying to simulate data and analyze it using the frailty option in the coxph function. I am working with recurrent event data, using counting process notation. Occasionally, (about 1 in every 100 simulations) I get the following warning: Error in coxph(Surv(start, end, censorind) ~ binary + uniform + frailty(subject, : X matrix deemed to be singular; variable 2 My data is structured as follows: I have a Bernoulli random variable (parameter=0.5) (labeled "binary") and a second variable which was generated as seq(0.02, 1, 0.02), which is labeled as "uniform". There are 50 individual subjects. Recurrent events are then generated as rexp(1, 0.2*frailparm[j]*exp(mydata[j,1]*alpha[1]+mydata[j,2]*alpha[2])) where mydata is the cbind of the data just mentioned, alpha are the parameters for the recurrent events (here I am using c(1,1)) and frailparm is the frailty term for subject {j}. I generate recurrent events until the sum of the times is greater than the terminal time or censoring time, and keep the previous highest time as my final recurrent time, with one additional time which is censored at the minimum of the terminal event time and the censoring time. I then repeat for each subject. I then try to analyze the data like this: coxph(Surv(start,end,censorind)~binary+uniform+frailty(subject,distribution="gauss", method="reml"), method="breslow", singular.ok=FALSE, data=fulldata) Where start is the previous recurrent time, end is the current recurrent time, censorind is the censoring indicator for the current recurrent time, and subject is the current observation. There does not appear to be an issue with the binary variable taking a particular value for every observed event time, nor does there appear to be perfect correlation between the variable "uniform" and the survival time. Any help would be much appreciated. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] which rows are duplicates?
I would like to know which rows are duplicates of each other, not simply that a row is duplicate of another row. In the following example rows 1 and 3 are duplicates. > x <- c(1,3,1) > y <- c(2,4,2) > z <- c(3,4,3) > data <- data.frame(x,y,z) x y z 1 1 2 3 2 3 4 4 3 1 2 3 I can't figure out how to get R to tell me that observation 1 and 3 are the same. It seems like the "duplicated" and "unique" functions should be able to help me out, but I am stumped. For instance, if I use "duplicated" ... > duplicated(data) [1] FALSE FALSE TRUE it tells me that row 3 is a duplicate, but not which row it matches. How do I figure out WHICH row it matches? And If I use "unique"... > unique(data) x y z 1 1 2 3 2 3 4 4 I see that rows 1 and 2 are unique, leaving me to infer that row 3 was a duplicate, but again it doesn't tell me which row it was a duplicate of (as far as I can tell). Am I missing something? How can I determine that row 3 is a duplicate OF ROW 1? Thanks, Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
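Two ways to recover which row a duplicate matches, using the example data from the question:

```r
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# 1. Flag every member of a duplicated group, not just the later copies:
dup <- duplicated(data) | duplicated(data, fromLast = TRUE)
which(dup)            # rows 1 and 3

# 2. Map each row to the first row with identical values:
key <- do.call(paste, data)   # one string per row
match(key, key)       # 1 2 1 -> row 3 matches row 1
```

The second approach gives the explicit pairing: any row whose `match()` result differs from its own index is a duplicate of the row it points to.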
[R] confusion over "names" of lm.influence()$hat
76994995 0.04149530 0.04125143 0.06158475 I only noticed this problem because several times the observation in question wasn't even a part of the hat matrix output... Am I incorrect in assuming that the output from print(which(housedata$w>0)) should be the same as the "names" from print(lm.influence(result.b)$hat). Both have the same length (in this case 88 observations, but they don't appear to be the same observations. Thanks for anyone who can help me clear this up, Aaron __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Save a graph file use jpeg(file=file)
The simple solution for Windows is to use (windows icon) + shift + s. You then select a portion of your screen and it gets copied to your clipboard. You can then paste that into your document. Of course this will not work if it is important that the reader is able to rotate the graphic. Tim -Original Message- From: R-help On Behalf Of Sorkin, John Sent: Wednesday, January 5, 2022 2:46 PM To: r-help@r-project.org (r-help@r-project.org) Subject: [R] Save a graph file use jpeg(file=file) [External Email] I am trying to create a 3-D graph (using scatter3d) and save the graph to a file so I can insert the graph into a manuscript. I am able to create the graph. When I run the code below an RGL window opens that has the graph. The file is saved to disk after dev.off() runs. Unfortunately, when I open the saved file, all I see is a white window. Can someone tell me how to save the file so I can subsequently read it and place it in a paper? The problem occurs regardless of the format in which I try to save the file, e.g. png, tiff. 
x <- 1:10
y <- 2:11
z <- y + rnorm(10)
ForGraph <- data.frame(x = x, y = y, z = z)
ForGraph
gpathj <- file.path("C:", "LAL", "test.jpeg")
gpathj
jpeg(file = gpathj)
par(mai = c(0.5, 0.5, 0.5, 0.5))
scatter3d(z = ForGraph$x, y = ForGraph$y, x = ForGraph$z,
          surface = FALSE, grid = TRUE, sphere.size = 4,
          xlab = "Categories", ylab = "ScoreRange",
          zlab = "VTE Rate (%)", axis.ticks = TRUE)
dev.off()

Thank you, John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
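A likely explanation for the blank file: `scatter3d()` (from the car package) draws into an rgl window, which bitmap devices like `jpeg()` never see, so the device captures only an empty white page. Capturing the rgl scene itself should work; this is a sketch, with the output path chosen as an example:

```r
library(car)   # scatter3d
library(rgl)

ForGraph <- data.frame(x = 1:10, y = 2:11, z = 2:11 + rnorm(10))
scatter3d(z = ForGraph$x, y = ForGraph$y, x = ForGraph$z,
          surface = FALSE, grid = TRUE, sphere.size = 4,
          xlab = "Categories", ylab = "ScoreRange",
          zlab = "VTE Rate (%)", axis.ticks = TRUE)

# Save a bitmap of the current rgl window:
rgl.snapshot("test.png", fmt = "png")
# or a vector version:
rgl.postscript("test.pdf", fmt = "pdf")
```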
Re: [R] NAs are removed
Hi Neha, You used a variable named "fraction" so we took a guess. However, as another poster pointed out, 1/0 does not give NA in R: a nonzero number divided by 0 returns Inf, while 0/0 returns NaN. So 1/0 <= 1 returns FALSE and 0/0 <= 1 returns NA. A great deal of the behavior of your program hinges on what "fraction" is in your program. Tim -Original Message- From: R-help On Behalf Of Neha gupta Sent: Friday, January 14, 2022 4:50 PM To: Jim Lemon Cc: r-help mailing list Subject: Re: [R] NAs are removed [External Email] Hi Jim and Ebert How am I using divide by zero? I did not understand. I am using caret and the AUC metric. If I am, what is the solution? On Fri, Jan 14, 2022 at 9:41 PM Jim Lemon wrote: > Hi Neha, > You're using the argument "na.omit" in what function? My blind guess > is that there's a divide by zero shooting you from behind. > > Jim > > On Sat, Jan 15, 2022 at 6:32 AM Neha gupta > wrote: > > > > Hi everyone > > > > I use na.omit to remove NAs but it still gives me an error: > > > > Error in if (fraction <= 1) { : missing value where TRUE/FALSE > > needed > > > > My data is: > > > > 'data.frame': 340 obs. of 15 variables: > > $ DepthTree : num 1 1 1 1 1 1 1 1 1 1 ... > > $ NumSubclass : num 0 0 0 0 0 0 0 0 0 0 ... > > $ McCabe : num 1 1 1 1 1 1 3 3 3 3 ... > > $ LOC : num 3 4 3 3 4 4 10 10 10 10 ... > > $ DepthNested : num 1 1 1 1 1 1 2 2 2 2 ... > > $ CA : num 1 1 1 1 1 1 1 1 1 1 ... > > $ CE : num 2 2 2 2 2 2 2 2 2 2 ... > > $ Instability : num 0.667 0.667 0.667 0.667 0.667 0.667 0.667 > > 0.667 > > 0.667 0.667 ... > > $ numCovered : num 0 0 0 0 0 0 0 0 0 0 ... > > $ operator : Factor w/ 16 levels "T0","T1","T2",..: 2 2 4 13 13 13 > 1 3 > > 4 7 ... > > $ methodReturn : Factor w/ 22 levels "I","V","Z","method",..: 2 2 2 > > 2 2 > 2 > > 2 2 2 2 ...
> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
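Tim's point about division by zero can be checked directly at the console; the error in the original post ("missing value where TRUE/FALSE needed") is exactly what if() raises when its condition evaluates to NA:

```r
1/0        # Inf  (not NA)
0/0        # NaN
1/0 <= 1   # FALSE
0/0 <= 1   # NA -- comparisons with NaN propagate missingness
# if (0/0 <= 1) 1 else 2   # errors: missing value where TRUE/FALSE needed
```

So na.omit() on the data does not help if the NA is produced later, inside a computed quantity such as `fraction`.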
Re: [R] NAs are removed
I don't see any problem there. To support this claim I tried it (though without a data frame): CA = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2) prot <- ifelse(CA == '2', 0, 1) print(prot) R responds: [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [37] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [73] 0 0 0 0 0 0 You can check other statements in the same way. That said, in a huge dataset you might want to ask whether the data provided match what you assume is there. If you type unique(ts$CA), do you get anything other than 1 and 2? This is the common task of figuring out whether the problem is in the code or in the data. Tim From: Neha gupta Sent: Friday, January 14, 2022 5:11 PM To: Ebert,Timothy Aaron Cc: Jim Lemon ; r-help mailing list Subject: Re: [R] NAs are removed [External Email] I have a variable "CA" in the dataset, which has the following values: [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [40] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 then I used this statement: prot <- ifelse(ts$CA == '2', 0, 1) Does the problem exist here? On Fri, Jan 14, 2022 at 11:02 PM Ebert, Timothy Aaron wrote: Hi Neha, You used a variable named "fraction" so we took a guess. However, as another poster pointed out, 1/0 does not give NA in R: a nonzero number divided by 0 returns Inf, while 0/0 returns NaN. So 1/0 <= 1 returns FALSE and 0/0 <= 1 returns NA. A great deal of the behavior of your program hinges on what "fraction" is in your program.
Tim
> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
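A small illustration of why the recoding step, rather than na.omit(), is the likely culprit: ifelse() propagates NA, so any NA left in (or introduced into) CA survives into prot and can later reach an if() test. This sketch uses made-up data:

```r
CA <- c(1, 2, NA, 2)
prot <- ifelse(CA == '2', 0, 1)
prot                         # 1 0 NA 0 -- the NA passes straight through
unique(CA)                   # quick sanity check, as suggested above:
table(CA, useNA = "ifany")   # look for values beyond 1 and 2, including NA
```

If table() shows NAs in the column being recoded, that is where the "missing value where TRUE/FALSE needed" error originates.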
Re: [R] [External] Weird behaviour of order() when having multiple ties
Dat1 <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1, 0.2) print(order(Dat1)) print(sort(Dat1)) Compare the output. -Original Message- From: R-help On Behalf Of Martin Maechler Sent: Monday, January 31, 2022 9:04 AM To: Stefan Fleck Cc: r-help@r-project.org Subject: Re: [R] [External] Weird behaviour of order() when having multiple ties [External Email] > Stefan Fleck > on Sun, 30 Jan 2022 21:07:19 +0100 writes: > it's not about the sort order of the ties, shouldn't all the 1s in > order(c(2,3,4,1,1,1,1,1)) come before 2,3,4? because that's not what's > happening aaah.. now we are getting somewhere: it looks like you have been confusing order() with sort() ... have you? > On Sun, Jan 30, 2022 at 9:00 PM Richard M. Heiberger wrote: >> when there are ties it doesn't matter which is first. >> in a situation where it does matter, you will need a tiebreaker column. >> -- >> *From:* R-help on behalf of Stefan Fleck < >> stefan.b.fl...@gmail.com> >> *Sent:* Sunday, January 30, 2022 4:16:44 AM >> *To:* r-help@r-project.org >> *Subject:* [External] [R] Weird behaviour of order() when having multiple >> ties >> >> I am experiencing weird behavior of `order()` for numeric vectors. I >> tested on 3.6.2 and 4.1.2 for Windows and R 4.0.2 on Ubuntu. Can anyone >> confirm? >> >> order( >> c( >> 0.6, >> 0.5, >> 0.3, >> 0.2, >> 0.1, >> 0.1 >> ) >> ) >> ## Result [should be in order] >> [1] 5 6 4 3 2 1 >> >> The sort order is obviously wrong. This only occurs if I have multiple >> ties. The problem does _not_ occur for decreasing = TRUE.
>> >> [[alternative HTML version deleted]] >> >> __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.
> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
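The confusion resolves once you see that order() returns the permutation of indices that would sort the vector, not the sorted values themselves:

```r
x <- c(0.6, 0.5, 0.3, 0.2, 0.1, 0.1)
order(x)      # 5 6 4 3 2 1 -- positions of values from smallest to largest
sort(x)       # 0.1 0.1 0.2 0.3 0.5 0.6
x[order(x)]   # indexing by the permutation reproduces sort(x)
```

So "5 6" first simply means the two smallest values sit at positions 5 and 6 of the original vector; nothing is wrong with the ties.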
Re: [R] Convert a character string to variable names
"A variable in R can refer to many things, ..." I agree. "It absolutely _can_ refer to a list, ..." I partly agree. In R as a programming language I agree. In R as a statistical analysis tool then only partly. Typically one would need to limit the list so each variable would be of the same length and all values within the variable be of the same data type (integer, real, factor, character). As a programmer yes, as a statistician not really unless you always qualify the type of list considered and that gets tiresome. R does name individual elements using numeric place names: hence df[row, column]. Each element must have a unique address, and that is true in all computer languages. A dataframe is a list of columns of the same length containing the same data type within a column. mtcars$disp does not have a value (a value is one number). With 32 elements I can calculate a mean and the mean is a value. 32 numbers is not a value. I suppose a single value could be the starting memory address of the name, but I don't see how that distinction helps unless one is doing Assembly or Machine language programming. I have never used get(), so I will keep that in mind. I agree that it makes life much easier to enter the data in the way it will be analyzed. -Original Message- From: Jeff Newmiller Sent: Tuesday, February 8, 2022 10:10 PM To: r-help@r-project.org; Ebert,Timothy Aaron ; Richard O'Keefe ; Erin Hodgess Cc: r-help@r-project.org Subject: Re: [R] Convert a character string to variable names [External Email] A variable in R can refer to many things, but it cannot be an element of a vector. It absolutely _can_ refer to a list, a list of lists, a function, an environment, and any of the various kinds of atomic vectors that you seem to think of as variables. (R does _not_ name individual elements of vectors, unlike many other languages.) 
The things you can do with the mtcars object may be different than the things you can do with the object identified by the expression mtcars$disp, but the former has a variable name in an environment while the latter is embedded within the former. mtcars$disp is shorthand for the expression mtcars[[ "disp" ]] which searches the names attribute of the mtcars list (a data frame is a list of columns) to refer to that object. R allows non-standard evaluation to make elements of lists accessible as though they were variables in an environment, such as with( mtcars, disp ) or various tidyverse evaluation conventions. But while the expression mtcars$disp DOES have a value( it is an atomic vector of 32 integer elements) it is not a variable so get("mtcars$disp") cannot be expected to work (as it does not). You may be confusing "variable" with "object" ... lots of objects have no variable names. I have done all sorts of complicated data manipulations in R, but I have never found a situation where a use of get() could not be replaced with a clearer way to get the job done. Using lists is central to this... avoid making distinct variables in the first place if you plan to be retrieving them later indirectly like this. On February 8, 2022 5:45:39 PM PST, "Ebert,Timothy Aaron" wrote: > >I had thought that mtcars in "mtcars$disp" was the name of a dataframe and >that "disp" was the name of a column in the dataframe. If I would make a model >like horse power = displacement then "disp" would be a variable in the model >and I can find values for this variable in the "disp" column in the "mtcars" >dataframe. I am not sure how I would use "mtcars" as a variable. >"mtcars$disp" has no specific value, though it will have a specific value for >any given row of data (assuming rows are observations). 
> >Tim > > >-Original Message- >From: R-help On Behalf Of Richard >O'Keefe >Sent: Tuesday, February 8, 2022 8:17 PM >To: Erin Hodgess >Cc: r-help@r-project.org >Subject: Re: [R] Convert a character string to variable names > >[External Email] > >"mtcars$disp" is not a variable name. >"mtcars" is a variable name, and >get("mtcars") will get the value of that variable; assign("mtcars", >~~whatever~~) will set it. >mtcars$disp is an *expression*, >where $ is an indexing operator >https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing >so what you want is >> mtcars <- list(cyl=4, disp=1.8) >> eval(parse(text="mtcars$disp")) >[1] 1.8
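To make the distinction concrete: get() accepts a variable name, not an arbitrary expression, so the string has to be split into a name lookup plus ordinary indexing (this sketch uses the built-in mtcars data):

```r
# get("mtcars$disp")          # error: object 'mtcars$disp' not found
v1 <- get("mtcars")[["disp"]] # look up the variable, then index the list
v2 <- eval(parse(text = "mtcars$disp"))  # works, but harder to read/debug
identical(v1, v2)             # TRUE
```

The `[[ ]]` form is usually preferred over eval(parse(...)) because it fails loudly on a bad column name and cannot execute arbitrary code hidden in the string.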
Re: [R] Convert a character string to variable names
How does "a value" differ from "an object"? From: Richard O'Keefe Sent: Friday, February 11, 2022 12:25 AM To: Ebert,Timothy Aaron Cc: Jeff Newmiller ; r-help@r-project.org; Erin Hodgess Subject: Re: [R] Convert a character string to variable names [External Email] You wrote "32 numbers is not a value". It is, it really is. When you have a vector like x <- 1:32 you have a simple variable (x) referring to an immutable value (1, 2, ..., 32). A vector in R is NOT a collection of mutable boxes, it is a collection of *numbers* (or strings). The vector itself is as good a value as ever twanged. You cannot change it. A statement like x[i] <- 77 is just shorthand for x <- "[<-"(x, i, 77) which constructs a whole new 32-number value and assigns that to x. (The actual implementation is cleverer when it can be, but often it cannot be clever.) Pure values like vectors can be shared: if x is a vector, then y <- x is a constant-time operation. If you then change y, you only change y, not the vector. x is unchanged. On Wed, 9 Feb 2022 at 17:06, Ebert, Timothy Aaron wrote: "A variable in R can refer to many things, ..." I agree. "It absolutely _can_ refer to a list, ..." I partly agree. In R as a programming language I agree. In R as a statistical analysis tool then only partly. Typically one would need to limit the list so each variable would be of the same length and all values within the variable be of the same data type (integer, real, factor, character). As a programmer yes, as a statistician not really unless you always qualify the type of list considered and that gets tiresome. R does name individual elements using numeric place names: hence df[row, column]. Each element must have a unique address, and that is true in all computer languages. A dataframe is a list of columns of the same length containing the same data type within a column. mtcars$disp does not have a value (a value is one number).
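O'Keefe's point that a vector is itself a value shows up in R's copy-on-modify behaviour: assignment shares the value, and modifying one binding never changes the other:

```r
x <- 1:32
y <- x        # constant-time: both names now refer to the same value
y[1] <- 99L   # conceptually builds a new vector and rebinds y
x[1]          # still 1 -- the value x refers to is unchanged
y[1]          # 99
```

This is why `y <- x` is cheap but safe: the copy only materializes at the moment one of the two bindings is modified.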
Re: [R] confusion matrix like detail with continuous data?
In your prediction you will have a target level of accuracy. Something like "I need to predict the slope of the regression to within 1%." You break your data into a training and testing data sets, then for the testing data set you ask is the prediction within 1% of the observed value. That is about as close as I can come as I have trouble thinking how to get a false positive out of a regression with a continuous dependent variable. Of course, you have to have enough data that splitting the data set into two pieces leaves enough observations to make a reasonable model. Tim -Original Message- From: R-help On Behalf Of Ivan Krylov Sent: Wednesday, February 16, 2022 5:00 AM To: r-help@r-project.org Subject: Re: [R] confusion matrix like detail with continuous data? [External Email] On Tue, 15 Feb 2022 22:17:42 +0100 Neha gupta wrote: > (1) Can we get the details like the confusion matrix with continuous > data? I think the closest you can get is a predicted-reference plot. That is, plot true values on the X axis and the corresponding predicted values on the Y axis. Unsatisfying option: use cut() to transform a continuous variable into a categorical variable and make a confusion matrix out of that. > (2) How can we get the mean absolute error for an individual instance? > For example, if the ground truth is 4 and our model predicted as 6, > how to find the mean absolute error for this instance? Mathematically speaking, mean absolute error of an individual instance would be just the absolute value of the error in that instance, but that's probably not what you're looking for. If you need some kind of confidence bands for the predictions, it's the model's responsibility to provide them. There's lots of options, ranging from the use of the loss function derivative around the optimum to Monte-Carlo simulations. For examples, see the confint() method. 
-- Best regards, Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
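Ivan's "unsatisfying option" can be sketched in a few lines; the toy model and the breaks below are illustrative choices, not a recommendation:

```r
set.seed(1)
truth <- runif(100, 0, 10)
pred  <- truth + rnorm(100)    # stand-in for a model's predictions
br    <- c(-Inf, 3, 6, Inf)    # arbitrary illustrative breaks
# Discretize both vectors with cut(), then tabulate like a confusion matrix:
table(predicted = cut(pred, br), reference = cut(truth, br))
# The "mean absolute error" of a single instance is just its absolute residual:
abs(pred - truth)[1]           # absolute error of instance 1
plot(truth, pred); abline(0, 1)  # the predicted-reference plot Ivan suggests
```

The off-diagonal cells of the table play the role of misclassifications, but note that the result depends entirely on the chosen breaks.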
Re: [R] Problem with data distribution
You pipe the filter but do not save the result. A reproducible example might help. Tim -Original Message- From: R-help On Behalf Of Neha gupta Sent: Thursday, February 17, 2022 1:55 PM To: r-help mailing list Subject: [R] Problem with data distribution [External Email] Hello everyone I have a dataset with the output variable "bug" having the following values (at the bottom of this email). My advisor asked me to provide the data distribution of bugs with 0 values and bugs with more than 0 values. data = readARFF("synapse.arff") data2 = readARFF("synapse.arff") data$bug library(tidyverse) data %>% filter(bug == 0) data2 %>% filter(bug >= 1) boxplot(data2$bug, data$bug, range=0) But both boxplots are exactly the same; how is that possible? Where am I going wrong? data$bug [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0 [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1 [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0 [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1 [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
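The immediate bug: dplyr's filter() returns a new data frame and leaves its input untouched, so the piped results must be assigned. A sketch assuming `data` was read with readARFF() as in the post:

```r
library(dplyr)
zero_bugs <- data %>% filter(bug == 0)   # assign the filtered result
some_bugs <- data %>% filter(bug >= 1)   # (no second copy of the data needed)
boxplot(some_bugs$bug, zero_bugs$bug, range = 0)
```

Without the assignments, `data` and `data2` remain the full, identical data sets, which is why the two boxplots looked the same.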
Re: [R] Problem with data distribution
Maybe what you want is to recode your data differently. One data set has bug versus no bug. What is the probability of having one or more bugs? The other data set has bugs only. Given that I have bugs, how many will I get? Tim -Original Message- From: R-help On Behalf Of Neha gupta Sent: Thursday, February 17, 2022 4:54 PM To: Bert Gunter Cc: r-help mailing list Subject: Re: [R] Problem with data distribution [External Email] :) :) On Thu, Feb 17, 2022 at 10:37 PM Bert Gunter wrote: > imo, with such simple data, a plot is mere chartjunk. A simple table (= > the distribution) would suffice and be more informative: > > > table(bug) ## bug is a vector. No data frame is needed > > 0 1 2 3 4 5 7 ## bug count > 162 40 9 7 2 1 1 ## number of cases with the given count > > You or others may disagree, of course. > > Bert Gunter > > > > On Thu, Feb 17, 2022 at 11:56 AM Neha gupta > wrote: > > > > Ebert and Rui, thank you for providing the tips (in fact, for > > providing > the > > answer I needed). > > > > Yes, you are right that a boxplot of all zero values will not make sense. > > Maybe a histogram will work. > > > > I am providing a few details of my data here and the context of the > > question I asked. > > > > My data is about bugs/defects in different classes of a large > > software system. I have to predict which class will contain bugs and > > which will be free of bugs (bug=0). I trained ML models and predicted, > > but my advisor > asked > > me to first provide the data distribution of bugs, e.g. details of > > how many classes have bugs (bug > 0) and how many are free of bugs (bug=0). > > > > That is why I need to provide the data distribution of both types of > values > > (i.e. bug=0 and bug >0) > > > > Thank you again. > > > > On Thu, Feb 17, 2022 at 8:28 PM Rui Barradas > wrote: > > > Hello, > > > > > > In your original post you read the same file "synapse.arff" twice, > > > apparently to filter each of them by its own criterion.
You don't > > > need to do that, read once and filter that one by different criteria. > > > > > > As for the data as posted, I have read it in with the following code: > > > > > > > > > x <- " > > > 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 > > > 0 0 0 > > > 4 1 0 > > > 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 > > > 0 0 0 > > > 0 0 0 > > > 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 > > > 0 0 7 > > > 0 0 1 > > > 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 > > > 0 0 0 > > > 1 0 0 > > > 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 > > > 1 0 0 > > > 0 0 1 > > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0 " > > > bug <- scan(text = x) > > > data <- data.frame(bug) > > > > > > > > > This is not the right way to post data, the posting guide asks to > > > post the output of > > > > > > > > > dput(data) > > > structure(list(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, > > > 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, > > > 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0, 3, 0, 0, > > > 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, > > > 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, > > > 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0, 1, 1, 0, 2, 0, 3, > > > 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 3, 2, 1, 1, > > > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, > > > 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 4, 1, 1, > > > 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > > > 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)), class = "data.frame", row.names = > > > c(NA, -222L)) > > > > > > > > > > > > This can be copied into an R session and the data set recreated > > > with > > > > > > data <- structure(etc) > > > > > > > > > Now the boxplots. > > > > > > (Why would you want to plot a vector of all zeros, btw?) 
> > > > > > > > > > > > library(dplyr) > > > > > > boxplot(filter(data, bug == 0))# nonsense > > > boxplot
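Bert's table() suggestion, plus the zero/non-zero split the advisor asked for, in runnable form (assuming `data` holds the bug counts as read in the earlier post):

```r
table(data$bug)                # full distribution of bug counts per class
table(buggy = data$bug > 0)    # FALSE = bug-free classes, TRUE = buggy classes
hist(data$bug[data$bug > 0])   # distribution among the buggy classes only
```

This recodes nothing permanently: the logical comparison `data$bug > 0` gives the bug/no-bug split directly, and the histogram answers "given bugs, how many?"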
Re: [R] conditional filling of data.frame - improve code
You could try some of the "join" commands from dplyr. https://dplyr.tidyverse.org/reference/mutate-joins.html https://statisticsglobe.com/r-dplyr-join-inner-left-right-full-semi-anti Regards, Tim -Original Message- From: R-help On Behalf Of Jeff Newmiller Sent: Thursday, March 10, 2022 11:25 AM To: r-help@r-project.org; Ivan Calandra ; R-help Subject: Re: [R] conditional filling of data.frame - improve code [External Email] Use merge. expts <- read.csv( text = "expt,sample ex1,sample1-1 ex1,sample1-2 ex2,sample2-1 ex2,sample2-2 ex2,sample2-3 ", header=TRUE, as.is=TRUE ) mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", "sample1-1", "sample1-1", "sample2-1")) merge( mydata, expts, by="sample", all.x=TRUE ) On March 10, 2022 7:50:23 AM PST, Ivan Calandra wrote: >Dear useRs, > >I would like to improve my ugly (though working) code, but I think I >need a completely different approach and I just can't think out of my box! > >I have some external information about which sample(s) belong to which >experiment. I need to get that manually into R (either typing directly >in a script or read a CSV file, but that makes no difference): >exp <- list(ex1 = c("sample1-1", "sample1-2"), ex2 = c("sample2-1", >"sample2-2" , "sample2-3")) > >Then I have my data, only with the sample IDs: >mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", >"sample1-1", "sample1-1", "sample2-1")) > >Now I want to add a column to mydata with the experiment ID. The best I >could find is that: >for (i in names(exp)) mydata[mydata[["sample"]] %in% exp[[i]], >"experiment"] <- i > >In this example, the experiment ID could be extracted from the sample >IDs, but this is not the case with my real data so it really is a >matter of matching. Of course I also have other columns with my real data. > >I'm pretty sure the last line (with the loop) can be improved in terms >of readability (speed is not an issue here). 
I have close to no
>constraints on 'exp' (here I chose a list, but anything could do); the
>only thing that cannot change is the format of 'mydata'.
>
>Thank you in advance!
>Ivan

--
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
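For completeness, here is a small base-R sketch of the matching idea without a loop. It flattens Ivan's named list with utils::stack() and uses match(); the column names chosen for the lookup table are illustrative, and dplyr::left_join(mydata, lookup, by = "sample") would do the same job via the joins Tim links to.

```r
# Flatten the named list into a two-column lookup table.
exp <- list(ex1 = c("sample1-1", "sample1-2"),
            ex2 = c("sample2-1", "sample2-2", "sample2-3"))
lookup <- setNames(stack(exp), c("sample", "experiment"))

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))

# match() keeps mydata's row order, unlike merge(), which sorts by key.
mydata$experiment <- as.character(lookup$experiment)[match(mydata$sample,
                                                           lookup$sample)]
mydata
```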
Re: [R] How important is set.seed
If you are using the program for data analysis then set.seed() is not necessary unless you are developing a reproducible example. In a standard analysis it is mostly counter-productive, because one should then ask whether your presented results are an artifact of a specific seed that you selected to get a particular result. However, when you need a reproducible example, are debugging a program, or otherwise need the same result with every run of the program, set.seed() is an essential tool.

Tim

-Original Message-
From: R-help On Behalf Of Jeff Newmiller
Sent: Monday, March 21, 2022 8:41 PM
To: r-help@r-project.org; Neha gupta ; r-help mailing list
Subject: Re: [R] How important is set.seed

[External Email]

First off, "ML models" do not all use random numbers (for prediction I would guess very few of them do). Learn and pay attention to what the functions you are using do.

Second, if you use random numbers properly and understand the precision that your specific use case offers, then you don't need to use set.seed. However, in practice, using set.seed can allow you to temporarily avoid chasing precision gremlins, or set up specific test cases for testing code, not results. It is your responsibility to not let this become a crutch... a randomized simulation that is actually sensitive to the seed is unlikely to offer an accurate result.

Where to put set.seed depends a lot on how you are performing your simulations. In general each process should set it once uniquely at the beginning, and if you use parallel processing then use the features of your parallel processing framework to ensure that this happens. Beware of setting all worker processes to use the same seed.

On March 21, 2022 5:03:30 PM PDT, Neha gupta wrote:
>Hello everyone
>
>I want to know
>
>(1) In which cases, we need to use set.seed while building ML models?
>
>(2) Which is the exact location we need to put the set.seed function i.e.
>when we split data into train/test sets, or just before we train a model?
>
>Thank you

--
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
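To make the points in this thread concrete, here is a tiny sketch of what set.seed() does and does not guarantee: the same seed reproduces the same stream, but the RNG state advances with every draw.

```r
# Same seed before the procedure => identical stream of random numbers.
set.seed(1)
a <- sample(1:100, 5)

set.seed(1)
b <- sample(1:100, 5)
identical(a, b)   # TRUE: the whole sequence is reproduced

# Without resetting, R's RNG state has moved on:
d <- sample(1:100, 5)
identical(a, d)   # FALSE: a different stretch of the stream
```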
Re: [R] How important is set.seed
That approach would start the trainControl method at set.seed(123) and it would start ran_search at set.seed(123). I am not sure whether that would be good or not, especially in this context. I am not clear on how the results are being compared, but I could get some differences if one method had a few extra calls to an RNG (random number generator).

I would think it makes more sense to ask how approach 1 differs from approach 2 over a wide range of seeds. You are not testing the RNG, and I am not sure using the same seed for each model makes a difference unless the analysis is a paired-samples approach. Might it be more effective to remove the initial set.seed() and then replace the second set.seed() with set.seed(NULL)? Otherwise wrap this in a loop:

N1 <- 100
set.seed(123)
seed1 <- round(runif(N1, min = 20, max = 345689))
for (i in 1:N1) {
  set.seed(seed1[i])
  # ... model-fitting code ...
}

Or use set.seed(NULL) between the models. You will need some variable to store the relevant results from each model, and some code to display the results. In the former case I suggest setting up a matrix or two that can be indexed using the for-loop index.

Tim

From: Neha gupta
Sent: Tuesday, March 22, 2022 12:03 PM
To: Ebert,Timothy Aaron
Cc: Jeff Newmiller ; r-help@r-project.org
Subject: Re: How important is set.seed

[External Email]

Thank you again Tim

d <- readARFF("my data")
set.seed(123)
tr <- d[index, ]
ts <- d[-index, ]
ctrl <- trainControl(method = "repeatedcv", number = 10)
set.seed(123)
ran_search <- train(lneff ~ ., data = tr, method = "mlp",
                    tuneLength = 30, metric = "MAE",
                    preProc = c("center", "scale", "nzv"),
                    trControl = ctrl)
getTrainPerf(ran_search)

Would it be good?

On Tue, Mar 22, 2022 at 4:34 PM Ebert,Timothy Aaron wrote: My inclination is to follow Jeff’s advice and put it at the beginning of the program.
You can always experiment: set.seed(42) rnorm(5,5,5) rnorm(5,5,5) runif(5,0,3) As long as the commands are executed in the order they are written, then the outcome is the same every time. Set seed is giving you reproducible outcomes. However, the second rnorm() does not give you the same outcome as the first. So set seed starts at the same point but if you want the first and second rnorm() call to give the same results you will need another set.seed(42). Note also, that it does not matter if you pause: run the above code as a chunk, or run each command individually you get the same result (as long as you do it in the sequence written). So, if you set seed, run some code, take a break, come back write some more code you might get in trouble because R is still using the original set.seed() command. To solve this issue use set.seed(Sys.time()) Or set.seed(NULL) Some of this is just good programming style workflow: Import data Declare variables and constants (set.seed() typically goes here) Define functions Body of code Generate output Clean up ( set.seed(NULL) would go here, along with removing unused variables and such) Regards, Tim From: Neha gupta mailto:neha.bologn...@gmail.com>> Sent: Tuesday, March 22, 2022 10:48 AM To: Ebert,Timothy Aaron mailto:teb...@ufl.edu>> Cc: Jeff Newmiller mailto:jdnew...@dcn.davis.ca.us>>; r-help@r-project.org<mailto:r-help@r-project.org> Subject: Re: How important is set.seed [External Email] Hello Tim In some of the examples I see in the tutorials, they put the random seed just before the model training e.g train function in case of caret library. Should I follow this? Best regards On Tuesday, March 22, 2022, Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: Ah, so maybe what you need is to think of “set.seed()” as a treatment in an experiment. You could use a random number generator to select an appropriate number of seeds, then use those seeds repeatedly in the different models to see how seed selection influences outcomes. 
I am not quite sure how many seeds would constitute a good sample. For me that would depend on what I find and how long a run takes. In parallel processing you set seed in master and then use a random number generator to set seeds in each worker. Tim From: Neha gupta mailto:neha.bologn...@gmail.com>> Sent: Tuesday, March 22, 2022 6:33 AM To: Ebert,Timothy Aaron mailto:teb...@ufl.edu>> Cc: Jeff Newmiller mailto:jdnew...@dcn.davis.ca.us>>; r-help@r-project.org<mailto:r-help@r-project.org> Subject: Re: How important is set.seed [External Email] Thank you all. Actually I need set.seed because I have to evaluate the consistency of features selection generated by different models, so I think for this, it's recommended to use the seed. Warm regards On Tuesday, March 22, 2022, Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: If you are
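Tim's note about seeding workers deserves a concrete sketch. Rather than hand-rolling one seed per worker, the parallel package's clusterSetRNGStream() gives every worker an independent, reproducible L'Ecuyer-CMRG substream from a single master seed. This is a minimal illustration of the mechanism, not the OP's caret workflow:

```r
library(parallel)

cl <- makeCluster(2)

# One master seed; each worker gets its own independent substream.
clusterSetRNGStream(cl, iseed = 123)
run1 <- parSapply(cl, 1:2, function(i) runif(1))

# Resetting the master seed reproduces every worker's draws.
clusterSetRNGStream(cl, iseed = 123)
run2 <- parSapply(cl, 1:2, function(i) runif(1))

stopCluster(cl)

identical(run1, run2)   # TRUE: reproducible across runs
```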
Re: [R] How important is set.seed
Not wrong, just mostly different words. 1) I think of reproducible code as something for teaching or sharing. It can be useful in debugging if I want help (one reason for sharing). In solo debugging my code, I have not used set.seed() -- at least not yet. However, my programs are all small, mostly less than 100 lines of code. 2) Agreed. 3) Agreed -- one needs to be very clear on why one is using set seed(). In many situations it is undoing the purpose of using a random number generator. 4) Agreed -- this is why it is so important to publish the version of R and the package used when presenting results. A great deal of effort has gone into building and selecting a good RNG. Depending on how the RNG is used, a basic understanding of what defines "good" is valuable. If there are huge numbers of calls to the RNG then periodicity in the RNG may start making a difference. Random.org might be another place for the OP to explore. Tim -Original Message- From: Bert Gunter Sent: Tuesday, March 22, 2022 12:12 PM To: Neha gupta Cc: Ebert,Timothy Aaron ; r-help@r-project.org Subject: Re: [R] How important is set.seed [External Email] OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But... 1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the rng between invocations (but see the notes in ?set.seed for subtle qualifications of this claim). 2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop. 3. 
Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed() 4. The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the rng) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file. *** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error.*** Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Mar 22, 2022 at 7:48 AM Neha gupta wrote: > > Hello Tim > > In some of the examples I see in the tutorials, they put the random > seed just before the model training e.g train function in case of caret > library. > Should I follow this? > > Best regards > On Tuesday, March 22, 2022, Ebert,Timothy Aaron wrote: > > > Ah, so maybe what you need is to think of “set.seed()” as a > > treatment in an experiment. You could use a random number generator > > to select an appropriate number of seeds, then use those seeds > > repeatedly in the different models to see how seed selection > > influences outcomes. I am not quite sure how many seeds would > > constitute a good sample. For me that would depend on what I find and how > > long a run takes. > > > > In parallel processing you set seed in master and then use a > > random number generator to set seeds in each worker. > > > > Tim > > > > > > > > *From:* Neha gupta > > *Sent:* Tuesday, March 22, 2022 6:33 AM > > *To:* Ebert,Timothy Aaron > > *Cc:* Jeff Newmiller ; > > r-help@r-project.org > > *Subject:* Re: How important is set.seed > > > > > > > > *[External Email]* > > > > Thank you all. 
> > > > > > > > Actually I need set.seed because I have to evaluate the consistency > > of features selection generated by different models, so I think for > > this, it's recommended to use the seed. > > > > > > > > Warm regards > > > > On Tuesday, March 22, 2022, Ebert,Timothy Aaron wrote: > > > > If you are using the program for data analysis then set.seed() is > > not necessary unless you are developing a reproducible example. In a > > standard analysis it is mostly counter-productive because one should > > then ask if your presented results are an artifact of a specific > > seed that you selected to get a particular result. However, in cases >
Re: [R] How important is set.seed
I would also disagree with your rephrasing. What is the point in characterizing if there is no understanding? What one wants is to understand the variability in outcome caused by including a random element in the model if the focus is on the random numbers. It may also be that one wants to understand the variability in outcome if one were to repeat an experiment. One approach is to split a dataset into testing and training sets, and use the RNG to decide which observation goes into which set. However, every run will give a slightly different answer. The random number generator is then used in place of a permutation test where the number of permutations is too large for current computational effort. I assume what the OP was asking is whether the conclusion(s) of two (or more) models were the same given the range in outcomes produced by the random number generator(s). The only way to address this is to characterize the distribution of model outcomes from different runs with different random seeds. Examine that characterization and hope for understanding. Tim From: Bert Gunter Sent: Tuesday, March 22, 2022 2:03 PM To: Ebert,Timothy Aaron Cc: Neha gupta ; r-help@r-project.org Subject: Re: [R] How important is set.seed [External Email] "rather to understand how the choice of seed influences final model output." No! Different seeds just produce different streams of (pseudo)-random numbers. Hence there cannot be any "understanding" of how "choice of seed" influences results. Presumably, what you meant is to characterize the variability in results from the procedure due to its incorporation of randomness in what it does. Re-read Jeff's last post. This does *not* require set.seed() at all. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." 
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Mar 22, 2022 at 9:55 AM Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: So step 1 is not to compare models, rather to understand how the choice of seed influences final model output. Once you have a handle on this issue, then work at comparing models. Tim From: Neha gupta mailto:neha.bologn...@gmail.com>> Sent: Tuesday, March 22, 2022 12:19 PM To: Bert Gunter mailto:bgunter.4...@gmail.com>> Cc: Ebert,Timothy Aaron mailto:teb...@ufl.edu>>; r-help@r-project.org<mailto:r-help@r-project.org> Subject: Re: [R] How important is set.seed [External Email] I read a paper two days ago (and that's why I then posted here about set.seed) which used interpretable machine learning. According to the authors, different explanations (of the black-box models) will be produced by the ML models if different seeds are used or never used. On Tue, Mar 22, 2022 at 5:12 PM Bert Gunter mailto:bgunter.4...@gmail.com>> wrote: OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless. But... 1. set.seed() is used to make any procedure that uses R's pseudo-random number generator -- including, for example, sampling from a distribution, random data splitting, etc. -- "reproducible". That is, if the procedure is repeated *exactly,* by invoking set.seed() with its original argument values (once!) *before* the procedure begins, exactly the same results should be produced by the procedure. Full stop. It does not matter how many times random number generation occurs within the procedure thereafter -- R preserves the state of the rng between invocations (but see the notes in ?set.seed for subtle qualifications of this claim). 2. Hence, if no (pseudo-) random number generation is used, set.seed() is irrelevant. Full stop. 3. Hence, if you don't care about reproducibility (you should! -- if for no other reason than debugging), you don't need set.seed() 4. 
The "randomness" of any sequence of results from any particular set.seed() arguments (including further calls to the rng) is a complex issue. ?set.seed has some discussion of this, but one needs considerable expertise to make informed choices here. As usual, we untutored users should be guided by the expert recommendations of the Help file. *** If anything I have said above is wrong, I would greatly appreciate a public response here showing my error.*** Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Mar 22, 2022 at 7:48 AM Neha gupta mailto:neha.bologn...@gmail.com>> wrote: > > Hello Tim > > In some of the examples I see in the tutorials, they put the random seed > just before the model training e.g train function in case of caret library. > Should I follow this? > > Best regards > On Tuesday, March 22, 2022, Ebert,Timothy Aaron >
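Bert's point 1 is easy to verify: a single set.seed() call before a procedure reproduces every RNG call inside it, no matter how many there are. A minimal sketch (the function and its contents are illustrative, not the OP's model):

```r
# Hypothetical procedure with two separate RNG calls inside.
run_procedure <- function() {
  idx   <- sample(1:10, 5)   # e.g. a random train/test split
  noise <- rnorm(3)          # e.g. simulated noise
  list(idx = idx, noise = noise)
}

set.seed(42)
r1 <- run_procedure()

set.seed(42)
r2 <- run_procedure()

identical(r1, r2)   # TRUE: one seed, whole procedure reproduced
```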
Re: [R] What is the intended behavior, when subsetting using brackets [ ], when the subset criterion has NA's?
I get an error with this:

my_subset_criteria <- c( F, F, T, NA, NA)
my_subset_criteria

Tim

-Original Message-
From: R-help On Behalf Of Kelly Thompson
Sent: Wednesday, April 6, 2022 4:13 PM
To: r-help@r-project.org
Subject: [R] What is the intended behavior, when subsetting using brackets [ ], when the subset criterion has NA's?

[External Email]

I noticed that I get different results when subsetting using subset, compared to subsetting using "brackets" when the subset criteria have NA's. Here's an example:

#START OF EXAMPLE
my_data <- 1:5
my_data

my_subset_criteria <- c( F, F, T, NA, NA)
my_subset_criteria

#subsetting using brackets returns the data where my_subset_criteria equals TRUE, and also NA where my_subset_criteria is NA
my_data[my_subset_criteria == T]

#subsetting using subset returns only the data where my_subset_criteria equals TRUE
subset(my_data, my_subset_criteria == T)
#END OF EXAMPLE

This behavior is also mentioned here: https://statisticaloddsandends.wordpress.com/2018/10/07/subsetting-in-the-presence-of-nas/

Q. Is this the intended behavior when subsetting with brackets?

Thank you!
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
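To answer the question directly: yes, this is the documented behavior of `[` (see ?"[" and ?subset): a logical index of NA yields NA, while subset() and which() silently drop NAs. A compact sketch of the three idioms:

```r
my_data <- 1:5
crit <- c(FALSE, FALSE, TRUE, NA, NA)

my_data[crit]          # 3 NA NA : `[` returns NA for each NA in the index
subset(my_data, crit)  # 3       : subset() keeps only TRUE positions
my_data[which(crit)]   # 3       : which() also ignores NA
```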
Re: [R] Error with text analysis data
Is this a different question from the original post? It would be better to keep threads separate. Always pre-process the data. Clean the data of obvious mistakes. This can be simple typographical errors or complicated like an author that wrote too when they intended two or to. In old English texts spelling was not standardized and the same word could have multiple spellings within one book or chapter. Removing punctuation is probably a part of this, though a program like Grammarly would not work very well if it removed punctuation. After that it depends on what you are trying to accomplish. Are you interested in the number of times an author used the word “a” or “the” and is “The” different from “the?” Are you modeling word use frequency or comparing vocabulary between texts. Too many choices. Tim From: Neha gupta Sent: Wednesday, April 13, 2022 2:49 PM To: Bill Dunlap Cc: Ebert,Timothy Aaron ; r-help mailing list Subject: Re: Error with text analysis data [External Email] Someone just told me that you need to pre process the data before model construction. For instance, make the text to lower case, remove punctuation, symbols etc and tokenize the text (give number to each word). Then create word of bags model (not sure about it), and then create a model. Is it true to perform all these steps? Best regards On Wednesday, April 13, 2022, Bill Dunlap mailto:williamwdun...@gmail.com>> wrote: > I would always suggest working until the model works, no errors and no NA > values We agree on that. However, the error gives you no hint about which variables are causing the problem. If it did, then it could only tell about the first variable with the problem. I think you would get to your working model faster if you got NA's for the constant columns and then could drop them all at once (or otherwise deal with them). 
-Bill On Wed, Apr 13, 2022 at 9:40 AM Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: I suspect that it is because you are looking at two types of error, both telling you that the model was not appropriate. In the “error in contrasts” there is nothing to contrast in the model. For a numerical constant the program calculates the standard deviation and ends with a division by zero. Division by zero is undefined, or NA. I would always suggest working until the model works, no errors and no NA values. The reason is that I can get NA in several ways and I need to understand why. If I just ignore the NA in my model I may be assuming the wrong thing. Tim From: Bill Dunlap mailto:williamwdun...@gmail.com>> Sent: Wednesday, April 13, 2022 12:23 PM To: Ebert,Timothy Aaron mailto:teb...@ufl.edu>> Cc: Neha gupta mailto:neha.bologn...@gmail.com>>; r-help mailing list mailto:r-help@r-project.org>> Subject: Re: [R] Error with text analysis data [External Email] Constant columns can be the model when you do some subsetting or are exploring a new dataset. My objection is that constant columns of numbers and logicals are fine but those of characters and factors are not. -Bill On Wed, Apr 13, 2022 at 9:15 AM Ebert,Timothy Aaron mailto:teb...@ufl.edu>> wrote: What is the goal of having a constant in the model? To me that seems pointless. Also there is no variability in sexCode regardless of whether you call it integer or factor. So the model y ~ sexCode is just a strange way to look at the variability in y and it would be better to do something like summarize(y) or mean(y) if that was the goal. 
Tim

-Original Message-
From: R-help On Behalf Of Bill Dunlap
Sent: Wednesday, April 13, 2022 9:56 AM
To: Neha gupta
Cc: r-help mailing list
Subject: Re: [R] Error with text analysis data

[External Email]

This sounds like what I think is a bug in stats::model.matrix.default(): a numeric column with all identical entries is fine but a constant character or factor column is not.

> d <- data.frame(y=1:5, sex=rep("Female",5))
> d$sexFactor <- factor(d$sex, levels=c("Male","Female"))
> d$sexCode <- as.integer(d$sexFactor)
> d
  y    sex sexFactor sexCode
1 1 Female    Female       2
2 2 Female    Female       2
3 3 Female    Female       2
4 4 Female    Female       2
5 5 Female    Female       2
> lm(y~sex, data=d)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> lm(y~sexFactor, data=d)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
> lm(y~sexCode, data=d)

Call:
lm(formula = y ~ sexCode, data = d)

Coefficients:
(Intercept)      sexCode
          3           NA

Calling traceback() after the error would clarify this.

-Bill

On Tue, Apr 12, 2022 at 3:12 PM Neha
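Given Bill's demonstration, a practical workaround for a dataset like this is to drop zero-variance columns before fitting. A minimal sketch (the data and column names are illustrative):

```r
d <- data.frame(y   = 1:5,
                sex = rep("Female", 5),     # constant factor: breaks lm()
                x   = c(2, 4, 3, 5, 1))

# Keep only columns with more than one unique value.
keep <- vapply(d, function(col) length(unique(col)) > 1, logical(1))
d2 <- d[, keep, drop = FALSE]

names(d2)                     # "y" "x"
fit <- lm(y ~ ., data = d2)   # fits without the contrasts error
coef(fit)
```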
Re: [R] Symbol/String comparison in R
https://en.wikipedia.org/wiki/ASCII There is a table towards the end of the document. Some of the other pieces may be of interest and/or relevant. Tim -Original Message- From: R-help On Behalf Of Kristjan Kure Sent: Wednesday, April 13, 2022 10:06 AM To: r-help@r-project.org Subject: [R] Symbol/String comparison in R [External Email] Hi! Sorry, I am a beginner in R. I was not able to find answers to my questions (tried Google, Stack Overflow, etc). Please correct me if anything is wrong here. When comparing symbols/strings in R - raw numeric values are compared symbol by symbol starting from left? If raw numeric values are not used is there an ASCII / Unicode table where symbols have values/ranking/order and R compares those values? *2) Comparing symbols* Letter "a" raw value is 61, letter "b" raw value is 62? Is this correct? # Raw value for "a" = 61 a_raw <- charToRaw("a") a_raw # Raw value for "b" = 62 b_raw <- charToRaw("b") b_raw # equals TRUE "a" < "b" Ok, so 61 is less than 62 so it's TRUE. Is this correct? *3) Comparing strings #1* "1040" <= "12000" raw_1040 <- charToRaw("1040") raw_1040 #31 *30* (comparison happens with the second symbol) 34 30 raw_12000 <- charToRaw("12000") raw_12000 #31 *32* (comparison happens with the second symbol) 30 30 30 The symbol in the second position is 30 and it's less than 32. Equals to true. Is this correct? *4) Comparing strings #2* "1040" <= "1" raw_1040 <- charToRaw("1040") raw_1040 #31 30 *34* (comparison happens with third symbol) 30 raw_1 <- charToRaw("1") raw_1 #31 30 *30* (comparison happens with third symbol) 30 30 The symbol in the third position is 34 is greater than 30. Equals to false. Is this correct? *5) Problem - Why does this equal FALSE?* *"A" < "a"* 41 < 61 # FALSE? # Raw value for "A" = 41 A_raw <- charToRaw("A") A_raw # Raw value for "a" = 61 a_raw <- charToRaw("a") a_raw Why is capitalized "A" not less than lowercase "a"? Based on raw values it should be. What am I missing here? 
Thanks,
Kristjan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
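The short answer to Kristjan's point 5: the comparison operators on character strings use the current locale's collation order, not raw byte values (see ?Comparison), so "A" < "a" need not follow ASCII. A sketch, assuming a "C" locale is available on the system:

```r
# Byte-wise, "A" (0x41) really is below "a" (0x61):
as.integer(charToRaw("A")) < as.integer(charToRaw("a"))   # TRUE

# But string comparison collates by locale. Forcing the C locale
# restores plain byte order:
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
"A" < "a"          # TRUE in the C locale
Sys.setlocale("LC_COLLATE", old)
```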
Re: [R] R Code Execution taking forever
1) Does it run perfectly with num_trials_6 <- 100 ?
2) Rework the code to remove as much as possible from loops. Renaming column names each time through the loop seems pointless. Are the nested for loops converting the dice roll to a person's name necessary within the while loop?
3) Stop all other apps on the computer.
4) Consider rewriting to take advantage of multiple cores in your system with parallel processing (this might or might not help much).
5) Rerun with num_trials_6 set to different values: 10, 100, 1,000, and 10,000. A linear regression of run time on trial size should let you estimate the run time for 1 million.

Tim

-Original Message-
From: R-help On Behalf Of Rui Barradas
Sent: Sunday, April 24, 2022 5:44 AM
To: Paul Bernal ; R
Subject: Re: [R] R Code Execution taking forever

[External Email]

Hello,

I'm having trouble running the code, where does function dice come from? CRAN package dice only has two functions,

getEventProb
getSumProbs

not a function dice. Can you post a link to where the package/function can be found?

Rui Barradas

Às 02:00 de 24/04/2022, Paul Bernal escreveu:
> Dear R friends,
>
> Hope you are doing great. The reason why I am contacting you all, is
> because the code I am sharing with you takes forever. It started
> running at 2:00 AM today, and it's 7:52 PM and is still running (see
> code at the end of this mail).
>
> I am using Rx64 4.1.2, and the code is being executed in RStudio. The
> RStudio version I am currently using is Version 2022.02.0 Build 443
> "Prairie Trillium" Release (9f796939, 2022-02-16) for Windows.
>
> My PC specs:
> Processor: Intel(R) Core(TM) i5-10310U CPU @ 1.70 GHz
> Installed RAM: 16.0 GB (15.6 GB usable)
> System type: 64-bit operating system, x64-based processor
> Local Disc(C:) Free Space: 274 GB
>
> I am wondering if there is/are a set of system variable(s) or
> something I could do to improve the performance of the program.
>
> It is really odd this code has taken this much (and it is still running).
> Any help and/or guidance would be greatly appreciated.
>
> Best regards,
> Paul
>
> library(dice)  # for dice()
>
> # performing 1,000,000 simulations 10 times
> num_trials_6 = 100
> dice_rolls_6 = num_trials_6 * 12
> num_dice_6 = 1
> dice_sides_6 = 6
>
> prob_frame_6 <- data.frame(matrix(ncol = 10, nrow = 1))
>
> k <- 0
> while (k < 10) {
>   dice_simul_6 = data.frame(dice(rolls = dice_rolls_6, ndice = num_dice_6,
>                                  sides = dice_sides_6, plot.it = FALSE))
>
>   # constructing matrix containing results of all dice rolls by month
>   prob_matrix_6 <- data.frame(matrix(dice_simul_6[, 1], ncol = 12, byrow = TRUE))
>
>   # naming each column by its corresponding month name
>   colnames(prob_matrix_6) <- c("Jan","Feb","Mar","Apr","May","Jun",
>                                "Jul","Aug","Sep","Oct","Nov","Dec")
>
>   # assigning each person's name depending on the number shown on the die
>   for (i in 1:nrow(prob_matrix_6)) {
>     for (j in 1:ncol(prob_matrix_6)) {
>       if (prob_matrix_6[i, j] == 1) { prob_matrix_6[i, j] = "Alice" }
>       if (prob_matrix_6[i, j] == 2) { prob_matrix_6[i, j] = "Bob" }
>       if (prob_matrix_6[i, j] == 3) { prob_matrix_6[i, j] = "Charlie" }
>       if (prob_matrix_6[i, j] == 4) { prob_matrix_6[i, j] = "Don" }
>       if (prob_matrix_6[i, j] == 5) { prob_matrix_6[i, j] = "Ellen" }
>       if (prob_matrix_6[i, j] == 6) { prob_matrix_6[i, j] = "Fred" }
>     }
>   }
>
>   # calculating column which will have a 1 if the trial was successful
>   # and a 0 otherwise
>   for (i in 1:nrow(prob_matrix_6)) {
>     if (("Alice" %in% prob_matrix_6[i, ]) & ("Bob" %in% prob_matrix_6[i, ]) &
>         ("Charlie" %in% prob_matrix_6[i, ]) & ("Don" %in% prob_matrix_6[i, ]) &
>         ("Ellen" %in% prob_matrix_6[i, ]) & ("Fred" %in% prob_matrix_6[i, ])) {
>       prob_matrix_6[i, 13] = 1
>     } else {
>       prob_matrix_6[i, 13] = 0
>     }
>   }
>
>   # relabeling column 13 so that its new name is success
>   colnames(prob_matrix_6)[13] <- "success"
>
>   # calculating probability of success
>   p6 = sum(prob_matrix_6$success) / nrow(prob_matrix_6)
>   prob_frame_6 <- cbind(prob_frame_6, p6)
>
>   k = k + 1
> }
>
> prob_frame_6 <- prob_frame_6[11:20]
> colnames(prob_frame_6) <- c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10")
> average_prob_frame_6 <- rowMeans(prob_frame_6)
> trial_100_10_frame <- cbind(prob_frame_6, average_prob_frame_6)
> final_frame_6 <- trial_100_10_frame
> colnames(final_frame_6) <- c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10",
>                              "avg_prob_frame_5")
>
> write.csv(final_frame_6, "OneMillion_Trials_Ten_Times_Results.csv")
> print(final_frame_6)
> print(paste("The average probability of success when doing 1,000,000 trials",
>             "10 times is:", average_prob_frame_6))
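For the archives: the looped simulation quoted above can be vectorized in base R. A minimal sketch, assuming the goal is the probability that all six names appear in 12 rolls; sample() stands in for the dice package, and the trial count is illustrative:

```r
# Estimate the probability that all six names appear in 12 die rolls.
# Uses base R sample() instead of the dice package; num_trials is illustrative.
set.seed(1)
num_trials <- 100000
names6 <- c("Alice", "Bob", "Charlie", "Don", "Ellen", "Fred")

# One row per trial, one column per month; entries are die faces 1..6.
rolls <- matrix(sample(6, num_trials * 12, replace = TRUE),
                ncol = 12, byrow = TRUE)

# A trial succeeds when every face (hence every name) appears at least once.
success <- apply(rolls, 1, function(r) length(unique(r)) == 6)
p_hat <- mean(success)
p_hat
```

No per-cell name substitution is needed: checking that all six faces occur is equivalent to checking that all six names occur, which removes both nested loops.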
Re: [R] Confusing fori or ifelse result in matrix manipulation
A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)
M <- A
for (i in 1:3) {
  if (x[i]) {
    M[, i] <- 0
  }
}
M

The outcome you want is to set all of the middle column's values to zero, so I used x as a logical in an if() test: when it is true, everything in that column is set to zero. Your approach also works, but you must go through each element explicitly:

A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)
M <- matrix(ncol = 3, nrow = 3)
for (j in 1:3) {
  for (i in 1:3) {
    ifelse(x[i] == 1, M[j, i] <- 0, M[j, i] <- A[j, i])
  }
}
M

Tim

-----Original Message-----
From: R-help On Behalf Of Uwe Freier
Sent: Sunday, April 24, 2022 11:06 AM
To: r-help@r-project.org
Subject: [R] Confusing fori or ifelse result in matrix manipulation

[External Email]

Hello,

sorry for the newbie question but I can't find out where I'm wrong.

A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)
M <- matrix(ncol = 3, nrow = 3)
for (i in 1:3) {
  M[, i] <- ifelse(x[i] == 0, A[, i], 0)
}

expected:

> M
     [,1] [,2] [,3]
[1,]    1    0    7
[2,]    2    0    8
[3,]    3    0    9

but the result is:

> M
     [,1] [,2] [,3]
[1,]    1    0    7
[2,]    1    0    7
[3,]    1    0    7

If I do it "manually":

> M[,1] <- A[,1]
> M[,2] <- 0
> M[,3] <- A[,3]

M is as expected. Where is my misconception?

Thanks for any hint and best regards,

Uwe
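For the archives, the root cause: ifelse() is vectorized and returns a result the same shape as its *test* argument, so a length-1 test yields a single value, which R then recycles across the whole column on assignment. A minimal sketch of two fixes:

```r
# ifelse() returns a result as long as its test, so with a length-1 test it
# returns one element of A[,i], which gets recycled across the whole column.
A <- matrix(1:9, ncol = 3)
x <- c(0, 1, 0)

# Fix 1: use plain if/else, which returns the full column unchanged.
M <- matrix(nrow = 3, ncol = 3)
for (i in 1:3) {
  M[, i] <- if (x[i] == 0) A[, i] else 0
}

# Fix 2: no loop at all -- zero out the flagged columns by logical indexing.
M2 <- A
M2[, x == 1] <- 0

identical(M, M2)  # both give the intended result
```

Fix 2 is the idiomatic form: logical column indexing replaces both the loop and the conditional.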
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Convert one rectangle into two rectangles
One thought was that these were farm fields in satellite images; I just had no clue why one would subdivide them. Maybe a training and a testing portion of these polygons? At a few dozen meters the curvature might be negligible on a sufficiently large spheroid.

If the 45/55 split is approximate, then go through the entire polygon area pixel by pixel and use runif() to generate a random value that puts the pixel in A if it is < 0.45 and in B otherwise. This has a better chance of working if the polygon is centered in the image so that curvature weighs equally.

For this post my thought was that it was either homework, because the question was so simple, or that it was not providing enough information to answer the real question. The generalized question is to divide an n-sided polygon, mapped to the surface of a spheroid with semi-axes A, B, C, into two parts of 45% and 55%. If multiple polygons were spread over the image and the image curvature was large relative to the image size, I don't see how you could recover polygons close to the edge of the spheroid unless you knew the curvature at that point beforehand. Possibly, in the most general sense, the problem is unsolvable because angles near the horizon will be nearly straight lines. Another part of asking questions is to provide enough detail that others may arrive at a creative answer.

How about a mechanical solution? Print the polygon onto paper. Cut the polygon out. Cut the polygon in half. Weigh the halves to get the actual split. Scan each piece in. You can print several copies and repeat until you get a 45/55 split as close as your equipment will measure. I have solved the problem as asked, though it did not involve R: I split the polygon. Is the problem solved, or do I need to do something with the pieces?

I had posted a simple R solution, but was then asked if the solution would work for a trapezoid with a bottom of 21 and a top of 18.
My solution was not generalizable, but the question was for a rectangle (four 90-degree angles and parallel opposite sides) with sides of 18 and 200 meters.

You could draw polygons on a balloon using a sharpie marker (or similar), photograph the balloon at different distances, and process the images. At least this way you have a system to test that part of the program. Correct the pixel area for curvature, estimate the area of the polygon, and then figure out what it means to split the polygon. If the non-regular polygon is not evenly divisible 45/55, what happens to the remainder? Is this problem better handled using GIS techniques?

Tim

-----Original Message-----
From: R-help On Behalf Of Avi Gross via R-help
Sent: Wednesday, April 27, 2022 11:27 AM
To: r-help@r-project.org
Subject: Re: [R] Convert one rectangle into two rectangles

[External Email]

Just FYI, Jim, I was sent private mail by Javad that turns the problem around, so not only are there no rectangles, but the problem is not 2-D. He is working with Spatial Polygons representing areas on the surface of a sphere (presumably Earth) and wants to subdivide them. This is a VERY different topic, and there are packages and other references that might apply to his needs. I have no idea why the 60/40 split by area.

Yes, as a very simplified idea, I understand why he proposed dividing a rectangle proportionately, but realistically what he is working with is a bit more like a trapezoid which is also bent into a third dimension.

So those of us wanting to help on the original problem can stop and, speaking for myself, I am going to approach many questions posed here more carefully to see if they are well thought out or are some kind of fishing expedition. And, since it has been made very clear multiple times that the scope of this forum is narrow and not meant to help with HW, and I have no idea how to verify it is not, ...
-----Original Message-----
From: Jim Lemon
To: javad bayat ; r-help mailing list
Sent: Wed, Apr 27, 2022 5:59 am
Subject: Re: [R] Convert one rectangle into two rectangles

Hi javad,

Let's think about it a little. There are only two ways you can divide a rectangle into two other rectangles. The dividing line must be straight and parallel to one side or the other, or you won't get two rectangles. If there are no other constraints, you can take your pick.

A     E         B
 ---------------
 |    |        |
 ---------------
C     F         D

AE = CF = 0.45 * AB

Alternatively the same equation can be used on the shorter sides. If you want to cut it across the length, you can do it with an abacus. While the IrregLong package can produce an abacus plot, I was unable to find an R-based abacus. For that I suggest:

https://toytheater.com/abacus/
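For the simple flat-rectangle version of the question, the arithmetic Jim describes can be sketched in a few lines of R. The corner coordinates and the helper function below are invented for illustration:

```r
# Split an axis-aligned rectangle into two rectangles with a 45/55 area
# ratio by cutting parallel to one pair of sides. Coordinates are made up.
split_rect <- function(xmin, xmax, ymin, ymax, frac = 0.45) {
  xcut <- xmin + frac * (xmax - xmin)   # AE = 0.45 * AB
  list(
    part1 = c(xmin = xmin, xmax = xcut, ymin = ymin, ymax = ymax),
    part2 = c(xmin = xcut, xmax = xmax, ymin = ymin, ymax = ymax)
  )
}

r <- split_rect(0, 200, 0, 18)   # the 200 m x 18 m field from the thread
area <- function(p) (p["xmax"] - p["xmin"]) * (p["ymax"] - p["ymin"])
c(area(r$part1), area(r$part2))  # 45% and 55% of the total area
```

This only covers the planar case; the spheroid-surface generalization discussed above would need a GIS package rather than coordinate arithmetic.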
Re: [R] Is there a canonical way to pronounce CRAN?
It would be nice in some ways if everyone would pronounce the same word in the same way, but then we could not argue over the correct pronunciation of words like tomato or aluminium/aluminum.

I think of CRAN as "Kran". While I had German in high school, I didn't remember the German word for crane, so I did not consciously make any connection. I thought more of words like crunch, crouch, or crayfish to help pronounce CRAN. "Sea-Ran" also makes some sense, but it makes me wonder if the tide is going out or coming back in. Possibly many think of this like C-Span (c-span.org). That would make even more sense if we had C-Ran rather than CRAN. I'll just leave "Sea-Run" alone.

Tim

-----Original Message-----
From: R-help On Behalf Of Kevin Thorpe
Sent: Wednesday, May 4, 2022 10:38 AM
To: Roland Rau
Cc: R Help Mailing List
Subject: Re: [R] Is there a canonical way to pronounce CRAN?

[External Email]

Interesting. I have always pronounced it as See-ran. This probably stems from my exposure to other archives like CPAN (Perl) and CTAN (TeX). Obviously the latter two acronyms are unpronounceable as words, so I generalized the approach to CRAN.

Kevin

> On May 4, 2022, at 7:20 AM, Roland Rau via R-help wrote:
>
> Dear all,
>
> I talked with colleagues this morning and we realized that some people (= me) pronounce CRAN like the German word "Kran" (probably pronounced like "cruhn" in English -- if it was a word).
> My colleague pronounced it as "Sea-Ran" or "Sea-Run". The colleague was a student and has worked at the same institution as an R Core Developer and heard it from him personally.
>
> So now I am puzzled. Have I been wrong about 43% of my life? ;-)
>
> Honestly: Is there a unique way the core developers pronounce CRAN?
>
> Not an urgent question at all, but maybe interesting to many of us.
> Thanks,
> Roland
>
> --
> This mail has been sent through the MPI for Demographic ...{{dropped:2}}

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
Re: [R] R and .asc file extension
A Google search returned a Stack Overflow page that might help:

https://stackoverflow.com/questions/20177581/reading-an-asc-file-into-r

I would also try looking at the file in a plain-text editor such as Notepad. That way you can see exactly what the file contains.

Tim

-----Original Message-----
From: R-help On Behalf Of Thomas Subia via R-help
Sent: Friday, May 20, 2022 9:27 AM
To: r-help@r-project.org
Subject: [R] R and .asc file extension

[External Email]

Colleagues,

I have data which has a .asc file extension. Can R read that file extension?

All the best,

Thomas Subia
Statistician
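To make Tim's advice concrete: ".asc" is only an extension, and many such files are plain delimited text that base R reads directly. A minimal sketch; the file contents and names below are invented for illustration:

```r
# ".asc" is just an extension; many .asc files are plain whitespace- or
# comma-delimited text, which base R can read directly.
# This example writes a tiny illustrative file and reads it back.
tmp <- tempfile(fileext = ".asc")
writeLines(c("x y value",
             "1 2 3.5",
             "4 5 6.7"), tmp)

dat <- read.table(tmp, header = TRUE)
dat

# If the .asc file is instead an ESRI ASCII grid raster, a GIS package is
# the usual route, e.g. terra::rast("file.asc") -- assuming terra is installed.
```

Opening the file in a text editor first, as suggested above, tells you which of these two cases you have.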
Re: [R] Suggestions as to how to proceed would be appreciated...............
Would lm, nls, or nlme work for what you need?

Tim

-----Original Message-----
From: R-help On Behalf Of Bernard Comcast
Sent: Sunday, May 22, 2022 3:01 PM
To: Bert Gunter
Cc: R-help@r-project.org
Subject: Re: [R] Suggestions as to how to proceed would be appreciated...

[External Email]

It's simply a query to know what tools/packages R has for correlating single values with multivalued vectors. If that is outside the scope of the PG then so be it.

Bernard

Sent from my iPhone so please excuse the spelling!

> On May 22, 2022, at 1:52 PM, Bert Gunter wrote:
>
> Please read the posting guide (PG) linked below. Your query sounds more like a project that requires a paid consultant; if so, this is way beyond the scope of this list as described in the PG. So don't be too surprised if you don't get a useful response, which this isn't either, of course.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>> On Sun, May 22, 2022 at 10:40 AM Bernard McGarvey wrote:
>>
>> I work in aspects of cold chain transportation in the pharmaceutical industry. These shippers are used to transport temperature-sensitive products by surrounding the product load box with insulating materials of various sorts. The product temperature has lower and upper allowed limits, so that when the product temperature hits one of these limits the shipper fails, and this failure time is the shipper duration. If the shipper is exposed to very low or very high ambient temperatures during a shipment, then we expect the duration of the shipper to be low.
>>
>> The particular problem I am currently undertaking is to create a fast way to predict the duration of a shipping container when it is exposed to a given ambient temperature.
>> Currently we have the ability to predict such durations using a calibrated 3D model (typically a finite element or finite volume transient representation of the heat transfer equations). These models can predict the temperature of the pharmaceutical product within the shipper over time as it is exposed to an external ambient temperature profile.
>>
>> The problem with the 3D model is that it takes significant CPU time and the software is specialized. What I would like to do is to be able to enter the ambient profile into a spreadsheet and then predict the expected duration of the shipper using a simple calculation that can be implemented in the spreadsheet environment. The idea I had was as follows:
>>
>> 1. Create a selection of ambient temperature profiles covering a wide range of ambient behavior. Ensure the profiles are long enough that the shipper is sure to fail at some time during the ambient profile.
>>
>> 2. Use the 3D model to predict the shipper duration for the selection of ambient temperature profiles in (1). Each ambient profile will have its own duration.
>>
>> 3. Since only the ambient temperatures up to the duration time are relevant, truncate each ambient profile for times greater than the duration.
>>
>> 4. Step (3) means that the ambient temperature profiles will have different lengths, corresponding to the different durations.
>>
>> 5. Use the truncated ambient profiles and their corresponding durations to build some type of empirical model relating the duration to the corresponding ambient profile.
>>
>> Some other notes:
>>
>> a. We know from our understanding of how the shippers are constructed and the laws of heat transfer that some sections of the ambient profile will have more of an impact on determining the duration than other sections.
>>
>> b. Just correlating the duration with the average temperature of the profile can predict the duration for that profile to within 10-15%. We are looking for the ability to get within 2% of the shipper duration predicted by the 3D model.
>>
>> What I am looking for is suggestions as to how to approach step (5) with tools/packages available in R.
>>
>> Thanks in advance
>>
>> Bernard McGarvey, Ph.D.
>> Technical Advisor
>> Parenteral Supply Chain LLC
>> bernard.first.princip...@gmail.com
>> (317) 627-4025
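On the modeling question in step (5), one hedged sketch along the lines of Tim's lm suggestion: reduce each variable-length ambient profile to a fixed set of summary features, then regress duration on those features. All data, feature choices, and names below are invented for illustration only:

```r
# Sketch of step (5): summarize each variable-length ambient profile into
# fixed-length features, then fit an ordinary linear model for duration.
# The simulated "profiles" and durations are purely illustrative.
set.seed(42)

n <- 200
# Each profile is a random-walk temperature trace of varying length.
profiles <- lapply(1:n, function(i) 20 + cumsum(rnorm(sample(50:150, 1))))

# Feature extraction: mean, spread, and early-segment mean of each profile.
features <- t(sapply(profiles, function(p) c(
  mean_temp  = mean(p),
  sd_temp    = sd(p),
  early_mean = mean(head(p, 24))   # e.g. the first 24 hourly readings
)))

# Fake durations with a known relationship, for demonstration only.
duration <- 100 - 2 * features[, "mean_temp"] + rnorm(n)

dat <- data.frame(duration, features)
fit <- lm(duration ~ mean_temp + sd_temp + early_mean, data = dat)
summary(fit)$r.squared
```

With real 3D-model durations, the same skeleton applies; hitting the 2% target would likely mean richer features (weighted segments per note (a)) or a nonlinear fit via nls/nlme, and the fitted coefficients could then be copied into the spreadsheet as a closed-form predictor.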