Re: [R] Counting occurances of a letter by a factor
I fiddled around and found this solution, which is far from elegant, but it doesn't require you to know the factor levels in advance. t <- with(DF, tapply(as.character(X), Y, table)) lapply(t, function(x) table(strsplit(paste(names(x),collapse=""),split=""))) Darin On Fri, Sep 10, 2010 at 02:40:50PM -0500, Davis, Brian wrote: > I'm trying to find a more elegant way of doing this. What I'm trying to > accomplish is to count the frequency of letters (major / minor alleles) in > a string grouped by the factor levels in another column of my data frame. > > Ex. > > DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", > > "L", NA)) > > colnames(DF)<-c("X", "Y") > > DF > XY > 1 CCL > 2 CCU > 3 L > 4 CGU > 5 GGL > 6 GC > > I have an ugly solution, which works if you know the factor levels of Y in > advance. > > > ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), > > ""))), > + table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'U', 1]), "" > > rownames(ans)<-c("L", "U") > > ans > C G > L 2 2 > U 3 1 > > > I've played with table, xtab, tabulate, aggregate, tapply, etc but haven't > found a combination that gives a more general solution to this problem. > > Any ideas? > > Brian > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] extracting random effects from model formula
I've found using the arm package to be very useful. require(arm) then use ranef(Full_model) and fixef(Full_model) On Wed, Sep 22, 2010 at 05:39:41PM +1000, Lorenzo Cattarino wrote: > Hi again, > > > > Sorry, probably I was not clear enough. > > > > I was wondering if there is a way in R to identify (and extract) only > the random effects, which, because I am using the lmer function, are the > terms with the symbol | on the left side of the grouping variable > ("SITE" in my example). > > > > Thanks > > > > Lorenzo > > From: Lorenzo Cattarino > Sent: Wednesday, 22 September 2010 5:23 PM > To: r-help@r-project.org > Subject: extracting random effects from model formula > > > > Hi R-users > > > > I would like to extract the random effects ("1|SITE", "1+SPECIES|SITE" > and "BA|SITE") from this model formula: > > > > Full_model <- formula (VAR ~ (1|SITE) + (1+SPECIES|SITE) + (BA|SITE) + > HEIGHT + COND + NN_DIST) > > > > I tried: > > > > terms(Full_model) > > labels(terms(Full_model)) > > > > but I could not distinguish between random and fixed effects. > > > > thanks > > > > Lorenzo > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Standard Error for difference in predicted probabilities
I think you can use the bootstrap to obtain the std error. My attempt for your problem and data is below. I would be interested if anyone can point out a problem with this approach. Darin y=rbinom(100,1,.4) x1=rnorm(100, 3, 2) x2=rbinom(100, 1, .7) diff <- vector(mode="numeric", length=200) for (i in 1:200) { idx <- sample(1:length(y), length(y), replace=T) mod=glm(y[idx] ~ x1[idx] + x2[idx], family=binomial) pred=predict(mod, newdata=data.frame(cbind(x1=rep(mean(x1[idx]), 100), x2=x2[idx])), type="response") diff[i]=unique(pred)[1]-unique(pred)[2] } sd(diff) On Fri, Sep 24, 2010 at 09:17:29AM -0400, Andrew Miles wrote: > Is there a way to estimate the standard error for the difference in > predicted probabilities obtained from a logistic regression model? > > For example, this code gives the difference for the predicted > probability of when x2==1 vs. when x2==0, holding x1 constant at its > mean: > > y=rbinom(100,1,.4) > x1=rnorm(100, 3, 2) > x2=rbinom(100, 1, .7) > mod=glm(y ~ x1 + x2, family=binomial) > pred=predict(mod, newdata=data.frame(cbind(x1=rep(mean(x1), 100), > x2)), type="response") > diff=unique(pred)[1]-unique(pred)[2] > diff > > I know that predict() will output SE's for each predicted value, but > how do I generate a SE for the difference in those predicted values? > > Thanks in advance! > > Andrew Miles > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] reorder always returns "ordered"
Or at least is seems that way to me. It's not a big problem, but the behavior doesn't match the documentation. (I think r-help is the place to report this. ) > x <- factor(1:5) > x.ro <- reorder(x, rnorm(5)) > is.ordered(x.ro) # should be FALSE according to ?reorder [1] TRUE > > x.ro <- reorder(x, rnorm(5), ordered=FALSE) > is.ordered(x.ro) # should be FALSE [1] TRUE Here is my session info: > sessionInfo() R version 2.11.1 (2010-05-31) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines datasets utils stats graphics grDevices methods [8] base other attached packages: [1] Design_2.3-0Hmisc_3.8-3 survival_2.35-8 RODBC_1.3-2 [5] MASS_7.3-7 lattice_0.19-11 loaded via a namespace (and not attached): [1] cluster_1.13.1 grid_2.11.1tools_2.11.1 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reorder always returns "ordered"
Yes Hmisc was loaded. Thanks for the insight. I usually pay attention to which objects get "masked" ... but reorder was not in that list. Anyway thanks. Darin On Tue, Oct 05, 2010 at 03:50:35PM -0700, Phil Spector wrote: > Is it possible that the original poster had the Hmisc package loaded? > >> Hmisc::reorder.factor > function (x, v, FUN = mean, ...) ordered(x, levels(x)[order(tapply(v, x, > FUN, ...))]) > > > Without that package, reorder.default gets called: >> getAnywhere('reorder.default') > A single object matching ???reorder.default??? was found > It was found in the following places > registered S3 method for reorder from namespace stats > namespace:stats > with value > > function (x, X, FUN = mean, ..., order = is.ordered(x)) { > scores <- tapply(X, x, FUN, ...) > ans <- (if (order) > ordered > else factor)(x, levels = names(sort(scores))) > attr(ans, "scores") <- scores > ans > } > > > - Phil Spector >Statistical Computing Facility >Department of Statistics >UC Berkeley >spec...@stat.berkeley.edu > > > > On Tue, 5 Oct 2010, Joshua Wiley wrote: > >>> sessionInfo() >> R version 2.11.1 (2010-05-31) >> i486-pc-linux-gnu >> >>> x <- factor(1:5) >>> x.ro <- reorder(x, rnorm(5)) >>> is.ordered(x.ro) >> [1] FALSE >>> x.ro <- reorder(x, rnorm(5), ordered=FALSE) >>> is.ordered(x.ro) >> [1] FALSE >> >> >> On Tue, Oct 5, 2010 at 3:14 PM, Darin A. England wrote: >>> >>> Or at least is seems that way to me. It's not a big problem, but the >>> behavior doesn't match the documentation. (I think r-help is the >>> place to report this. ) >>> >>> > x <- factor(1:5) >>> > x.ro <- reorder(x, rnorm(5)) >>> > is.ordered(x.ro) ??# should be FALSE according to ?reorder >>> [1] TRUE >>> > >>> > x.ro <- reorder(x, rnorm(5), ordered=FALSE) >>> > is.ordered(x.ro) ??# should be FALSE >>> [1] TRUE >>> >>> >>> Here is my session info: >>> > sessionInfo() >>> R version 2.11.1 (2010-05-31) >>> x86_64-unknown-linux-gnu >>> >>> locale: >>> ??[1] LC_CTYPE=en_US.UTF-8 ?? ?? ?? LC_NUMERIC=C >>> ??[3] LC_TIME=en_US.UTF-8 ?? ?? ?? ??LC_COLLATE=en_US.UTF-8 >>> ??[5] LC_MONETARY=C ?? ?? ?? ?? ?? ?? ??LC_MESSAGES=en_US.UTF-8 >>> ??[7] LC_PAPER=en_US.UTF-8 ?? ?? ?? LC_NAME=C >>> ??[9] LC_ADDRESS=C ?? ?? ?? ?? ?? ?? ?? LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] splines ?? datasets ??utils ?? ?? stats ?? ?? graphics ??grDevices >>> methods >>> [8] base >>> >>> other attached packages: >>> [1] Design_2.3-0 ?? ??Hmisc_3.8-3 ?? ?? survival_2.35-8 RODBC_1.3-2 >>> [5] MASS_7.3-7 ?? ?? ??lattice_0.19-11 >>> >>> loaded via a namespace (and not attached): >>> [1] cluster_1.13.1 grid_2.11.1 ?? ??tools_2.11.1 >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sample in R
No. > ?sample to see what sample() does. On Tue, Oct 19, 2010 at 02:59:05AM -0700, emj83 wrote: > > Hi, > > Please can someone tell me if using sample() in R is actually a quick way of > doing the Inverse Transform Sampling Method? > > Many thanks Emma > -- > View this message in context: > http://r.789695.n4.nabble.com/Sample-in-R-tp3001818p3001818.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] varclus in Hmisc vs SAS PROC VARCLUS
My impression is that varclus from the Hmisc library is very convenient to use when you want to get an idea of the correlation among predictor variables in a regression setting, but if you want to perform cluster analysis in general, you may be better off using a different function in R, such as hclust or kmeans. Darin On Sun, Nov 07, 2010 at 11:00:00AM -0500, Lars Bishop wrote: > Hi, > > > > I'll apreciate your guidance on how can I re-create the output from SAS PROC > VARCLUS in R. I've found the varclus function in Hmisc. However, is it > possible to use that function to compute for each variable the 1-R**2 ratio > (this is the ratio of 1 minus the R-squared with Own Cluster to one minus > the R-squared in the Next Closest cluster)? > > > > Thanks in advance for any help, > > > > Lars. > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random slopes in lmer
coef(mod1)$bird will give you a matrix with two columns. The first column is the intercept for each bird and the second column is the slope for each bird. ranef(mod1) will also give you a matrix of two columns. These represent the random effects. That is, how much the intercept (or slope) is shifted from overall mean. HTH, Darin On Thu, Aug 26, 2010 at 01:15:10PM +0100, Samantha Patrick wrote: > Hi > > I want to extract the random slopes from a lmer (I am doing a random > regression), but are the answers obtained from ranef or coef? > > My model is: mod1<-lmer(B~ A +(A|bird), family=quasibinomial) > > And I want to obtain a slope for each individual bird but am not sure which > output I need and can't find the answer anywhere. > > Thanks > > Sam > > > Dr Samantha Patrick > EU INTERREG Post Doc > Davy 618 > Marine Biology & Ecology Research Centre > University of Plymouth > Plymouth > PL4 8AA > > T: 01752 586165 > M: 07740472719 > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random slopes in lmer
I'm sure this has appeared before on this list, but the biggest help to me has been: "Gelman and Hill", Data Analysis and Regression Using Multilevel/Hierarchical Models, which contains clear explanations and the R code to go along. Also check out http://lme4.r-forge.r-project.org/book/ Looking at Sam's model specification, C is a fixed effect. On Thu, Aug 26, 2010 at 09:15:14AM -0700, Bert Gunter wrote: > ??? > You were provided exactly what you requested. I think you either need > to read up on what random effects models mean or more clearly > communicate what YOU mean. (It's unclear to me, anyway). > > -- > Bert Gunter > Genentech Nonclinical Statistics > > On Thu, Aug 26, 2010 at 8:57 AM, Samantha Patrick > wrote: > > Hi > > Thanks for your help - however if I have a model of: > > > > mod1<-lmer(B~ A+C+(A|bird), family=quasibinomial) > > > > coef then gives me an individual slope for factors A and C. ?However the > > random effect is only nested within factor - so I am only trying to allow > > the slope to vary in relation to effect A > > > > Many Thanks > > > > Sam > > > > Dr Samantha Patrick > > EU INTERREG Post Doc > > Davy 618 > > Marine Biology & Ecology Research Centre > > University of Plymouth > > Plymouth > > PL4 8AA > > > > T: 01752 586165 > > M: 07740472719 > > > > > > -Original Message- > > From: Darin A. England [mailto:engl...@cs.umn.edu] > > Sent: 26 August 2010 16:12 > > To: Samantha Patrick > > Cc: r-help@R-project.org > > Subject: Re: [R] Random slopes in lmer > > > > coef(mod1)$bird will give you a matrix with two columns. The first > > column is the intercept for each bird and the second column is the > > slope for each bird. > > > > ranef(mod1) will also give you a matrix of two columns. These > > represent the random effects. That is, how much the intercept (or > > slope) is shifted from overall mean. > > > > HTH, > > Darin > > > > On Thu, Aug 26, 2010 at 01:15:10PM +0100, Samantha Patrick wrote: > >> Hi > >> > >> I want to extract the random slopes from a lmer (I am doing a random > >> regression), but are the answers obtained from ranef or coef? > >> > >> My model is: mod1<-lmer(B~ A +(A|bird), family=quasibinomial) > >> > >> And I want to obtain a slope for each individual bird but am not sure > >> which output I need and can't find the answer anywhere. > >> > >> Thanks > >> > >> Sam > >> > >> > >> Dr Samantha Patrick > >> EU INTERREG Post Doc > >> Davy 618 > >> Marine Biology & Ecology Research Centre > >> University of Plymouth > >> Plymouth > >> PL4 8AA > >> > >> T: 01752 586165 > >> M: 07740472719 > >> > >> > >> ? ? ? [[alternative HTML version deleted]] > >> > >> __ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in the NY Times
On Wed, Jan 07, 2009 at 08:00:28AM -0600, Frank E Harrell Jr wrote: > This is great to see. It's interesting that SAS Institute feels that > non-peer-reviewed software with hidden implementations of analytic > methods that cannot be reproduced by others should be trusted when > building aircraft engines. > > Frank Unfortunately, that type of FUD issued by the SAS marketing person still works. I see it at my employer (a large healthcare company.) It's a battle to change a culture, but ironically the recession helps. People are now taking notice of the obscene licensing fees for SAS. Darin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with clustering
Have you tried using the cosine of the angle between two observations as the similarity measure? If you want to account for magnitudes, there is something called the jaccard coefficient (if I remember correctly) that can be used. Darin On Mon, Jan 26, 2009 at 10:41:40AM +0100, mau...@alice.it wrote: > I am going to try out a tentative clustering of some feature vectors. > The range of values spanned by the three items making up the features vector > is quite different: > > Item-1 goes roughly from 70 to 525 (integer numbers only) > Item-2 is in-between 0 and 1 (all real numbers between 0 and 1) > Item-3 goes from 1 to 10 (integer numbers only) > > In order to spread out Item-2 even further I might try to replace Item-2 with > Log10(Item-2). > > My concern is that, regardless the distance measure used, the item whose > order of magnitude is the highest may carry the highest weight in the process > of calculating the similarity matrix therefore fading out the influence of > the items with smaller variation in the resulting clusters. > Should I normalize all feature vector elements to 1 in advance of generating > the similarity matrix ? > > Thank you so much. > Maura > > > > > > > > tutti i telefonini TIM! > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to install R-2.8.1 on AIX
Search the R mailing list archive for other reports of success on AIX. I compiled a 64-bit version of R-2.8.0 successfully with the following options: export OBJECT_MODE=64 export CC="xlc_r -q64" export CFLAGS="-O -qstrict" export CXX="xlC_r -q64" export CXXFLAGS="-O -qstrict" export F77="xlf_r -q64" export AR="ar -X64" export CPPFLAGS="-I/usr/lpp/X11/include/X11" export LDFLAGS="-L/usr/lib -L/usr/X11R6/lib" export MAKE=gmake and have been using it for some time. Unfortunately, I still do not have it working with libiconv. Also, on AIX the following environment variable expands the memory limit for a process to 2GB export LDR_CNTRL="MAXDATA=0x8000" which is important for running R. Check the IBM documentation. I think you can try MAXDATA=0xC000 to expand to 3GB. Darin On Thu, Mar 12, 2009 at 07:09:49PM +0800, wrote: > Hi: >I can't install the R-2.8.1 on the machine IBM AIX according the > instruction configuratioin(OBJECT_MODE=64). I can successfully > ./configure ***, and get a Makefile. But during compiling the source code, > there is some problem I can't fix it. Please tell me some tips about "how to > install the R on my AIX. thankx very much. > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.