Re: [R] Help with big data and parallel computing: 500, 000 x 4 linear models

2016-08-08 Thread Aaron Mackey
Don't run 500K separate models. Use the limma package to fit one model that can learn the variance parameters jointly. Run it on your laptop. And don't use %methylation (the Beta value) as your Y variable; use logit(percent), i.e. the M-value. -Aaron On Mon, Aug 8, 2016 at 2:49 PM, Ellis, Alicia M wrote: > I

Re: [R] combine glmnet and coxph (and survfit) with strata()

2013-12-09 Thread Aaron Mackey
I'm also curious how to use glmnet with survfit -- specifically, for use with interval regression (which, under the hood, is implemented using survfit). Can you show how you converted your Surv object formula to a design matrix for use with glmnet? Thanks, -Aaron On Sun, Dec 8, 2013 at 12:45 AM

Re: [R] Renaming variables

2013-09-20 Thread Aaron Mackey
On Fri, Sep 20, 2013 at 10:10 AM, Preetam Pal wrote: > I have 25 variables in the data file (name: score), i.e. X1,X2,.,X25. > > I don't want to use score$X1, score$X2 every time I use these variables. > attach(score); plot(X1, X2) # etc. etc. -Aaron

Re: [R] XYZ data

2013-06-26 Thread Aaron Mackey
for plotting purposes, I typically jitter() the x's and y's to see the otherwise overlapping data points -Aaron On Wed, Jun 26, 2013 at 12:29 PM, Shane Carey wrote: > Nope, neither work. :-( > > > On Wed, Jun 26, 2013 at 5:16 PM, Clint Bowman wrote: > > > John, > > > > That still leaves a strin

Re: [R] Dependency-aware scripting tools for R

2012-04-19 Thread Aaron Mackey
shameless self-plug: we break out of R to do this, and after many painful years developing and maintaining idiosyncratic Makefiles, we are now using Taverna to (visually) glue together UNIX commands (including R scripts) -- the benefit of which (over make and brethren) is that you can actually *se

Re: [R] HWEBayes, swapping the homozygotes genotype frequencies

2011-10-09 Thread Aaron Mackey
Without really knowing this code, I can guess that it may be the "triangular" prior at work. Bayes Factors are notorious for being sensitive to the prior. Presumably, the prior somehow prefers to see the rarer allele as the "BB", and not the "AA" homozygous genotype (this is a common assumption:

Re: [R] Hardy Weinberg

2011-06-22 Thread Aaron Mackey
H-W only gives you the expected frequency of AA, AB, and BB genotypes (i.e. a 1x3 table): minor <- runif(1, 0.05, 0.25); major <- 1-minor; AA <- minor^2; AB <- 2*minor*major; BB <- major^2; df <- cbind(AA, AB, BB) -Aaron On Tue, Jun 21, 2011 at 9:30 PM, Jim Silverton wrote: > Hello all, > I am
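
The one-line snippet above can be laid out as a short script. This is only a sketch: the seed and the name `hw` are assumptions added for illustration, and the genotype labels follow the snippet as written (AA carries the squared minor-allele frequency).

```r
# Expected genotype frequencies under Hardy-Weinberg equilibrium.
set.seed(1)                      # assumption: seed added only for reproducibility

minor <- runif(1, 0.05, 0.25)    # minor allele frequency, drawn as in the snippet
major <- 1 - minor               # major allele frequency

# 1x3 table of expected genotype frequencies (hypothetical name `hw`)
hw <- cbind(AA = minor^2, AB = 2 * minor * major, BB = major^2)
print(hw)

# The three frequencies are the expansion of (minor + major)^2, so they sum to 1.
total <- sum(hw)
```

Because the three terms expand (minor + major)^2, `total` should equal 1 up to floating-point error, which is a quick sanity check on any variation of this code.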

Re: [R] Shrink file size of pdf graphics

2011-05-19 Thread Aaron Mackey
You can try something like this, at the command line: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf evidently, the new compactPDF() function in R 2.13 does something very similar. -Aaron On Thu, May 19, 2011 at

Re: [R] Reporting odds ratios or risk ratios from GLM

2011-03-15 Thread Aaron Mackey
OR <- exp(coef(GLM.2)[-1]); OR.ci <- exp(confint(GLM.2)[-1,]) -Aaron On Tue, Mar 15, 2011 at 1:25 PM, lafadnes wrote: > I am a new R user (am using it through the Rcmdr package) and have > struggled > to find out how to report OR and RR directly when running GLM models (not > only reporting coef
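
A self-contained version of the same idea, as a sketch only. It assumes a logistic regression, uses the `infert` data set that ships with R in place of the poster's data, and uses the hypothetical name `fit` in place of `GLM.2`. The `drop = FALSE` is an addition so that the interval stays a matrix even when the model has a single predictor.

```r
# Odds ratios with confidence intervals from a logistic regression.
# Assumption: `infert` stands in for the poster's data, `fit` for GLM.2.
fit <- glm(case ~ spontaneous + induced, data = infert, family = binomial())

OR    <- exp(coef(fit)[-1])                      # drop the intercept
OR.ci <- exp(confint(fit)[-1, , drop = FALSE])   # intervals on the odds-ratio scale

# One row per predictor: point estimate, lower bound, upper bound
or_table <- cbind(OR = OR, OR.ci)
print(round(or_table, 2))
```

Exponentiating works here because the coefficients of a binomial GLM with the default logit link are log odds ratios; for risk ratios a different link (for example a log link) would be needed, which this sketch does not cover.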

Re: [R] Complex sampling?

2011-03-09 Thread Aaron Mackey
What I think you need is something along the lines of: matrix(c(sample(3:7), sample(3:7), sample(3:7), sample(3:7), ...), nrow=2). Now each column is one of your random pairs. -Aaron On Wed, Mar 9, 2011 at 1:01 PM, Hosack, Michael wrote: > > -Original Message- > > From: r-help-bounces at r-

Re: [R] optimization challenge

2010-01-13 Thread Aaron Mackey
FYI, in bioinformatics, we use dynamic programming algorithms in similar ways to solve similar problems of finding guaranteed-optimal partitions in streams of data (usually DNA or protein sequence, but sometimes numerical data from chip-arrays). These "path optimization" algorithms are often calle

[R] hmm.discnp or other?

2009-08-12 Thread Aaron Mackey
(I think) I'd like to use the hmm.discnp package for a simple discrete, two-state HMM, but my training data is irregularly shaped (i.e. the observation chains are of varying length). Additionally, I do not see how to label the state of the observations given to the hmm() function. Ultimately, I'd

Re: [R] an S idiom for ordering matrix by columns?

2009-02-19 Thread Aaron Mackey
Thanks to all, "do.call(order, as.data.frame(y))" was the idiom I was missing! -Aaron On Thu, Feb 19, 2009 at 11:52 AM, Gustaf Rydevik wrote: > On Thu, Feb 19, 2009 at 5:40 PM, Aaron Mackey wrote: > > There's got to be a better way to use order() on a matrix than this:
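
For readers landing on this thread, a small sketch of the idiom. The matrix `y` below is made-up illustration data, not the poster's matrix.

```r
# Sort the rows of a matrix by column 1, breaking ties with column 2, then 3.
# Assumption: `y` is a toy matrix invented for this example.
y <- matrix(c(2, 1, 2, 1,    # column 1
              1, 2, 2, 1,    # column 2
              3, 3, 1, 2),   # column 3
            ncol = 3)

# as.data.frame() turns the columns into a list, and do.call() passes each
# list element to order() as a separate sort key.
ord <- do.call(order, as.data.frame(y))
y_sorted <- y[ord, ]
print(y_sorted)
```

The point of `do.call()` is that the number of sort keys follows the number of columns, so the same line works for a matrix of any width.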

[R] an S idiom for ordering matrix by columns?

2009-02-19 Thread Aaron Mackey
There's got to be a better way to use order() on a matrix than this: > y 2L-035-3 2L-081-23 2L-143-18 2L-189-1 2R-008-5 2R-068-15 3L-113-4 3L-173-2 3981 1 221 12 2 8571 1 221 22 2 91

Re: [R] database table merging tips with R

2008-09-11 Thread Aaron Mackey
nt fashion. If not, I will see if I > can create/drop the temp table directly from sqlQuery. > -Avram > > > > On Thursday, September 11, 2008, at 12:07PM, "Aaron Mackey" <[EMAIL > PROTECTED]> wrote: >>Sorry, I see now you want to avoid this, but you d

Re: [R] database table merging tips with R

2008-09-11 Thread Aaron Mackey
Sorry, I see now you want to avoid this, but you did ask what was the "best way to efficiently ...", and the temp. table solution certainly matches your description. What's wrong with using a temporary table? -Aaron On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <[EMAIL PROT

Re: [R] database table merging tips with R

2008-09-11 Thread Aaron Mackey
I would load your set of userid's into a temporary table in oracle, then join that table with the rest of your SQL query to get only the matching rows out. -Aaron On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <[EMAIL PROTECTED]> wrote: > > Dear R list, > > What is the best way to efficiently marr

Re: [R] Gumbell distribution - minimum case

2008-09-09 Thread Aaron Mackey
If you mean you want an EVD with a fat left tail (instead of a fat right tail), then can't you just multiply all the values by -1 to "reverse" the distribution? A new location parameter could then shift the distribution wherever you want along the number line ... -Aaron On Mon, Sep 8, 2008 at 5:
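
A rough simulation of the negate-and-shift suggestion, in base R only. The seed, sample size, parameter values, and the helper `skew()` are all assumptions made for illustration.

```r
# Turn Gumbel (maximum-case) draws into minimum-case draws by negating them.
set.seed(42)                     # assumption: seed added only for reproducibility
n    <- 1e5
mu   <- 0                        # hypothetical location of the maximum-case Gumbel
beta <- 1                        # hypothetical scale

# Inverse-CDF sampling from the maximum-case Gumbel (fat right tail)
x_max <- mu - beta * log(-log(runif(n)))

# Multiply by -1 to move the fat tail to the left, then shift to a new location
shift <- 10                      # hypothetical new location
x_min <- -x_max + shift

# Sample skewness (hypothetical helper): positive = right tail, negative = left tail
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
print(c(max_case = skew(x_max), min_case = skew(x_min)))
```

The sign of the skewness flips between the two samples, which is the "reversal" described in the message; the shift only moves the sample along the number line and leaves its shape unchanged.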

[R] hex2RGB back to hex not the same?

2008-08-28 Thread Aaron Mackey
Witness this oddity (to me): > rainbow_hcl(10)[1] [1] "#E18E9E" > d <- attributes(hex2RGB(rainbow_hcl(10)))$coords[1,] > rgb(d[1], d[2], d[3]) [1] "#C54D5F" What happened? FYI, this came up as I'm trying to reuse the RGB values I get from rainbow_hcl in a call to rgb() where I can also set alpha

[R] lmer syntax, matrix of (grouped) covariates?

2008-08-18 Thread Aaron Mackey
I have a fairly large model: > length(Y) [1] 3051 > dim(covariates) [1] 3051 211 All of these 211 covariates need to be nested hierarchically within a grouping "class", of which there are 8. I have an accessory vector, " cov2class" that specifies the mapping between covariates and the 8 classes