A possible solution might be to use the survey package. You could specify
that the data are clustered using the svydesign function, then specify the
replicate weights using the as.svrepdesign function. It would then be
possible to use the withReplicates function to bootstrap the clusters.
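A minimal sketch of that survey-package approach, with made-up clustered data (the names mydata, psu, wt, and x are assumptions, not from the original question):

```r
library(survey)

# made-up clustered data: 10 clusters ('psu') of 20 observations each
set.seed(1)
mydata <- data.frame(psu = rep(1:10, each = 20),
                     wt  = runif(200, 1, 3),
                     x   = rnorm(200))

# declare the clustering and the sampling weights
des <- svydesign(id = ~psu, weights = ~wt, data = mydata)

# convert to a bootstrap replicate-weights design
bdes <- as.svrepdesign(des, type = "bootstrap", replicates = 100)

# bootstrap a statistic over the clusters, e.g. a weighted mean
res <- withReplicates(bdes, function(w, data) weighted.mean(data$x, w))
res
```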
Honestly, I wasn't sure what you wanted to do with 'group'. Here it is with
the 'group' variable deleted
library(data.table)
dt=data.table(x[,-1])
dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby='myweek']
--
View this message in context:
http://r.789695.n4.nabble.com/weighted
If there are many variables, I'd then suggest the data.table package:
library(data.table)
dt=data.table(x)
dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby=c('group',
'myweek')]
The '.SD' is an abbreviation for all the variables in the data table
(excluding the grouping variables).
The plyr package is very helpful for this:
library(plyr)
ddply(x ,.(myweek), summarize, m1=weighted.mean(var1,myweight),
m2=weighted.mean(var2,myweight))
--
I've had a little experience using the Amelia package. Are you sure that
your nominal variables - race, south, etc. - are in your ad04 data frame?
David Freedman
--
2.13.0 looks fine with VIPRE
david freedman
atlanta
--
you might try merge_recurse(list(DF1,DF2,DF3,DF4,DF17)) in the reshape
package
--
You might use the plyr package to get group-wise weighted means
library(plyr)
ddply(mydata,~group,summarise, b=mean(weights),
c=weighted.mean(response,weights))
hth
david freedman
--
I've had good success with the read.xport function in the SASxport (not
foreign) package. But couldn't you just use something like
mydata<-read.xport('http/demo_c.xpt')
The transport file looks like it's one of the NHANES demographic files.
--
Thanks for the information.
There was a discussion of different results obtained with the formula and
data.frame methods for a paired t-test -- there are many threads, but one is
at
http://r.789695.n4.nabble.com/Paired-t-tests-td2325956.html#a2326291
david freedman
--
The formula form seems to delete the entire row (missing and non-missing values) if
there are any NAs. Wouldn't one expect that the 2 forms (data frame vs
formula) of aggregate would give the same result?
thanks very much
david freedman, atlanta
--
The data frame form gives an NA if there are missing values in
the group for the variable with missing values. But the formula form
deletes the entire row (missing and non-missing values) if any of the values
are missing. Is this what was intended, or the best option?
thanks, david freedman
--
I'm not sure that this is the problem, but are you certain that the variable
'con' is in audit? Your check outside the function just tells you
that X and SEX are in audit.
hth, david freedman (good to see another CDCer using R)
--
how about:
d=data.frame(ht=rnorm(20,60,20),sex=sample(0:1,20,rep=T)); d
with(d,by(ht,sex,mean))
with(d,by(ht,sex,function(x)mean(x>=60)))
hth, david freedman
--
The bootcov function for models fit using lrm in the rms package might also
be an option
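A hedged sketch of that bootcov route, on fabricated clustered data (the cluster variable, B, and the simulated model are assumptions):

```r
library(rms)

# fake clustered binary data: 30 clusters of 10 observations
set.seed(2)
d <- data.frame(clus = rep(1:30, each = 10),
                x    = rnorm(300))
d$y <- rbinom(300, 1, plogis(0.5 * d$x))

# x=TRUE, y=TRUE are required so bootcov can refit on resamples
f <- lrm(y ~ x, data = d, x = TRUE, y = TRUE)

# cluster bootstrap: resamples whole clusters B times
g <- bootcov(f, cluster = d$clus, B = 100)
sqrt(diag(vcov(g)))  # cluster-bootstrap standard errors
```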
hth
--
I'm not sure what to do with model.matrix, but you might look at some of the
code for ols in the rms package. The rms package (from frank harrell), has
several options for keeping the NAs in the output from regression models so
that residuals are correctly aligned.
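Base R offers similar alignment via na.exclude; a small sketch with made-up data:

```r
# y has 2 missing values; na.exclude pads residuals/fitted with NA
d <- data.frame(x = 1:10,
                y = c(2, 4, NA, 8, 10, 12, NA, 16, 18, 20))
f <- lm(y ~ x, data = d, na.action = na.exclude)

length(resid(f))      # 10 - same length as the data frame
sum(is.na(resid(f)))  # 2 - NAs sit where the missing rows were
```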
hth
david freedman
--
thanks very much for the help - all of these suggestions were much better
than what I was doing
--
Thanks very much for any help
david freedman
ll=list(structure(c(9.7, 17.6, 20.8, 24.1, 33.8, 14.5, 25.7, 29.8,
33.6, 44.8, 21.8, 32.6, 37.5, 40.9, 53.3, 16.7, 26.1, 29.5, 32.7,
42.6, 26.2, 34.3, 37, 39.8, 47.1, 31.9, 40.3, 43.3, 46.2, 54.1
), .Dim = 5:6), structure(c(9.4, 17.7, 20.7, 24.1, 33.7, 14.5,
for a picture of the bagplot, try going to
http://www.statmethods.net/graphs/boxplot.html
--
sorry - I use many abbreviations and I try to remove them before I post
questions/answers - 'set' is my abb. for subset
david
On 3/28/2010 8:27 PM, Jeff Brown [via R] wrote:
> What is the function "set()"? Is that a typo? When I type ?set I get
> nothing, and when I try to evaluate that code
how about:
d1=data.frame(pat=c(rep('a',3),'b','c',rep('d',2),rep('e',2),'f'),var=c(1,2,3,1,2,2,3,2,4,4))
ds=set(d1,var %in% c(2,3))
with(ds,tapply(var,pat,FUN=length))
hth,
David Freedman, CDC, Atlanta
--
Hi - do you want
sum(datjan$V4)
OR
sum(unique(datjan$V4)) ?
there's also a cumsum function that might be useful
hth,
David Freedman, Atlanta
Schmidt Martin wrote:
>
> Hello
>
> I'm working with R since a few month and have still many trivial
> question
Did you leave off the tilde in your orderBy example?
hth
David Freedman, CDC Atlanta
--
try
as.numeric(read_data$DEC)
this should turn it into a numeric variable that you can work with
hth
David Freedman
CDC, Atlanta
Guy Green wrote:
>
> Hi Peter & others,
>
> Thanks (Peter) - that gets me really close to what I was hoping for.
>
> The one problem I
there's a recode function in the Hmisc package, but it's difficult (at least
for me) to find documentation for it
library(Hmisc)
week <- c('SAT', 'SUN', 'MON', 'FRI');
recode(week,c('SAT', 'SUN', 'MON', 'FRI'),1:4)
HTH
--
You might want to look at the plot.Predict function in the rms package - it
allows you to plot the logits or probablities vs the predictor variable at
specified levels of other covariates (if any) in the model. There are many
examples in http://cran.r-project.org/web/packages/rms/rms.pdf
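A sketch of the Predict/plot idiom (fabricated data; rms needs a datadist so it knows the covariate ranges):

```r
library(rms)

set.seed(3)
d <- data.frame(age = rnorm(200, 50, 10),
                sex = factor(sample(c('female', 'male'), 200, replace = TRUE)))
d$y <- rbinom(200, 1, plogis(-2 + 0.04 * d$age))

dd <- datadist(d); options(datadist = 'dd')
f <- lrm(y ~ age + sex, data = d)

# predicted logits vs age, one curve per level of sex
plot(Predict(f, age, sex))
```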
David
for (i in
1:3){print(do.call(rbind,by(d[,i],d[,i+3],function(x)c(min(x),max(x)))))}
Is there a way to replace the ugly for loop in the last line with some type
of apply function that would know that my continuous and indicator variable
are 3 variables apart in the dataframe?
Thanks very much
David
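One possibility, sketched with fabricated data and assuming columns 1-3 are the continuous variables and columns 4-6 the matching indicators:

```r
set.seed(4)
d <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20),
                ga = sample(0:1, 20, replace = TRUE),
                gb = sample(0:1, 20, replace = TRUE),
                gc = sample(0:1, 20, replace = TRUE))

# lapply over the column offsets instead of an explicit for loop;
# each element is a small matrix of (min, max) per indicator level
res <- lapply(1:3, function(i)
  do.call(rbind, by(d[, i], d[, i + 3], range)))
res
```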
also,
library(plyr)
ddply(d,~grp,function(df) weighted.mean(df$x,df$w))
--
Subject: Re: Quartiles and Inter-Quartile Range
>
> It looks like you think that type=2 are the 'true' quantiles, but the
> default method in R is type=7
>
> You might want to look at ?stats::quantile
>
> hth
> david freedman
>
>
'How does one calculate the median of 4 values?'
david freedman
Girish A.R. [via R] wrote:
> Interestingly, Hmisc::describe() and summary() seem to be using one
> Type, and stats::fivenum() seems to be using another Type.
>
> > fivenum(cbiomass)
> [1] 910.0 1039.0 1088.5 1
It looks like you think that type=2 are the 'true' quantiles, but the
default method in R is type=7
You might want to look at ?stats::quantile
hth
david freedman
--
You'll probably want to look at the 'by' function
d=data.frame(sex=rep(1:2,50),x=rnorm(100))
d$y=d$x+rnorm(100)
head(d)
cor(d)
by(d[,-1],d['sex'],function(df)cor(df))
You might also want to look at the doBy package
--
You'll probably want to take a look at the CRAN Task View,
http://cran.r-project.org/web/views/ClinicalTrials.html
david freedman
Paul Miller wrote:
>
> Hello Everyone,
>
> I'm a SAS user who has recently become interested in sequential clinical
> trials designs. I
you might look at partial.r in the psych package
dadrivr wrote:
>
> I'm trying to write code to calculate partial correlations (along with
> p-values). I'm new to R, and I don't know how to do this. I have
> searched and come across different functions, but I haven't been able to
> get any of
I *think* this is from 'StatsRUs' - how about
as.data.frame(lapply(df,function(x)rep(x,n)))
hth, david freedman
pengyu.ut wrote:
>
> I want to duplicate each line in 'df' 3 times. But I'm confused why
> 'z' is a 6 by 4 matrix. Could someb
In addition to using the survey package (and the svyby function), I've found
that many of the 'weighted' functions, such as wtd.mean, work well with the
plyr package. For example,
wtdmean=function(df)wtd.mean(df$obese,df$sampwt);
ddply(mydata, ~cut2(age,c(2,6,12,16)),'wtdmean')
Some variation of the following might be what you want:
df=data.frame(sex=sample(1:2,100,replace=T),snp.1=rnorm(100),snp.15=runif(100))
df$snp.1[df$snp.1>1.0]<-NA; #put some missing values into the data
x=grep('^snp',names(df)); x #which columns that begin with 'snp'
apply(df[,x],2,summary)
you should save your 3 variables into a new *dataframe*
d<-mydata[,c("iq","education","achievement")]
and then the command would be
by(d,d$sex,function(df) cor.test(df$educ,df$achiev))
but you could also just use
by(mydata,mydata$sex,function(df) cor.test(df$educ,df$achiev))
A better subject for your question might have been helpful. There are many
options for hist and truehist in the MASS package, but this might help:
x=rnorm(100, mean=5, sd=3000)
hist(x, prob=T)
x2=density(x)
lines(x2$x,x2$y)
KABELI MEFANE wrote:
>
> Dear All
>
> I hope you can help me wi
library(Hmisc,T) before loading Design?
>
> Carlos
>
> ------
> From: "David Freedman" <3.14da...@gmail.com>
> Sent: Saturday, September 12, 2009 8:26 AM
> To:
> Subject: Re: [R] could not find function &q
I *think* it's a problem with the package, rather than R 2.9.2, and I hope
the problem will soon be fixed. I was able to use predict.Design with 2.9.2
until I updated the Design package a few days ago.
david freedman
zhu yao wrote:
>
> I uses the Design library.
>
> take this exampl
After you fix your data frame, and if you don't mind using 2 packages, you might
try something like:
library(plyr) #for 'by' processing
library(Hmisc) # for its wtd.mean function
d=data.frame(x=c(15,12,3,10,10),g=c(1,1,2,2,3),w=c(2,1,5,2,5)) ; d
ddply(d,~g,function(df) wtd.mean(df$x,df$w))
milton ruser wrot
istic (Stuart,
1955, Agresti, 2002, page 422, also known as Stuart-Maxwell test) is
computed."
hth, david freedman
Tal Galili wrote:
>
> Hello all,
>
> I wish to perform a mcnemar test for a 3 by 3 matrix.
> By running the standard R command I am getting a result but I am not
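Base R's mcnemar.test already handles the k x k case; a sketch on a made-up 3 x 3 table of paired ratings:

```r
# made-up 3 x 3 table of paired ratings (rows: rater 1, cols: rater 2)
tab <- matrix(c(20,  5,  3,
                 4, 30,  6,
                 2,  7, 25), nrow = 3, byrow = TRUE)

# for k > 2 this is the Stuart-Maxwell generalization, df = k(k-1)/2 = 3
mt <- mcnemar.test(tab)
mt
```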
If you apply the function to a simple dataframe or show your code, you might
be able to get more accurate help. I've used the irr package in the past
and haven't noticed any problems (maybe I overlooked them?)
david freedman
mehdi ebrahim wrote:
>
> Hi All,
>
> I am
You might also want to look at the doBy package - one function is summaryBy:
summaryBy(var1 + var2 ~ patient_type, data=d, FUN=summary)
david freedman
Hayes, Rachel M wrote:
>
> Hi All,
>
>
>
> I'm trying to automate a data summary using summary or describe from
Would the scale function work for this? Something like
new=scale(df, center=T)
HTH,
david freedman
cir p wrote:
>
>
> Dear group:
> sorry for my beginners question, but I'm rather new to R and was searching
> high and low without success:
>
> I have a data frame
Frank, would you feel comfortable giving us the reference to the NEJM article
with the 'missing vs <' error ? I'm sure that things like this happen
fairly often, and I'd like to use this example in teaching
thanks, david freedman
Frank E Harrell Jr wrote:
000 different
children, these functions take some time on my PCs - is there a faster way
to do this in R? My code on a small dataset follows.
Thanks very much, David Freedman
d<-data.frame(id=c(rep(1,3),rep(2,2),3),age=c(5,10,15,4,7,12),ldlc=c(132,120,125,105,142,160))
d$high.ldlc<-ifelse
You might want to try using a non-parametric test, such as wilcox.test.
How about some modification of the following:
d=data.frame(grp=rep(1:2,e=5),replicate(10,rnorm(100))); head(d)
lapply(d[,-1],function(.column)wilcox.test(.column~grp,data=d))
David Freedman
stephen sefick wrote:
>
&
I'm not quite sure what you want to do, but this might help:
d=data.frame(replicate(40, rnorm(20)))
d$sample=rep(c('a','b','c','d'),each=5)
library(doBy)
summaryBy(.~sample, data=d)
David Freedman
Amit Patel-7 wrote:
>
>
> Hi
> I am try
sorry about the mistake - the -data$Type doesn't work: the '-' sign isn't
valid for factors. I *thought* I had checked this before submitting a
response !
HufferD wrote:
>
> On Thursday, May 07, 2009 7:45 PM, David Freedman wrote:
>
> > ...how about:
>
how about:
d=data[order(data$ID,-data$Type),]
d[!duplicated(d$ID),]
Max Webber wrote:
>
> Given a dataframe like
>
> > data
> ID Type N
> 1 45900A 1
> 2 45900B 2
> 3 45900C 3
> 4 45900D 4
> 5 45900E 5
> 6 45900F 6
> 7 45900I 7
> 8
Didn't a 2008 paper by Austin in J Clin Epidemiol show that bootstrapping was
just as bad as backward stepwise regression for finding the true predictors?
http://xrl.in/26em
Dimitris Rizopoulos-4 wrote:
>
> Greg Snow wrote:
>> There is not a meaningful alternative way since the way you propos
you might also look at matplot
d=data.frame(x=1:10,y1=sort(rnorm(10)),y2=sort(rnorm(10)))
matplot(d[,1],d[,2:3],type='l',lty=1:2)
David Freedman
Steve Murray-3 wrote:
>
>
> Dear R Users,
>
> I have a data frame of the nature:
>
>> head(aggregate_1986)
s with the by function, but there
must be an easier way.
Here's my code - id is id number, age is the age of the person, and seq is
the sequence variable that I've created. Thanks very much for the help.
david freedman, atlanta
ds=data.frame(list(id = c(1L, 1L, 1L, 1L, 8L, 8L, 16L, 16L
1/se,
subset=year>=1988, da=d);
points(d$year,predict(m,data.frame(year=d$year)),type='l',lwd=2,col='red')
thanks very much
David Freedman
--
The gplots package might help. Try:
library(gplots)
data(state)
plotmeans(state.area ~ state.region)
hth,
David Freedman
johnhj wrote:
>
> Hiii,
>
> I have some problems to plot "standard deviation and variance" from a
> texfile.
>
> Ich have the following c
The predictors and outcomes in lm can be matrices, so you could use something
like the following:
x.mat=cbind(x1=rnorm(20),x2=rnorm(20))
y.mat=cbind(y1=rnorm(20),y2=rnorm(20))
lm(y.mat~x.mat)
David Freedman
ivowel wrote:
>
> dear r-experts: there is probably a very easy way to do it,
the linear-by-linear association test for ordered data.
David Freedman
Michael Friendly wrote:
>
> In SAS, for a two-way (or 3-way, stratified) table, the CMH option in
> SAS PROC FREQ gives
> 3 tests that take ordinality of the factors into account, for both
> variables, just the
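In R the coin package offers this directly; a hedged sketch on a fabricated ordered 3 x 3 table:

```r
library(coin)

# fabricated ordered 3 x 3 contingency table
tab <- as.table(matrix(c(10, 5, 2,
                          6, 8, 4,
                          3, 7, 9), nrow = 3, byrow = TRUE))

# linear-by-linear association test, treating both margins as ordered
lt <- lbl_test(tab)
lt
</imports>
```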
Would you be interested in the cross-validation that's on pp 3 and 4 of the
ROCR package PDF?
plot(perf,lwd=3,avg="vertical",spread.estimate="boxplot",add=TRUE)
there are various options for the 'spread.estimate'
David Freedman
marc bernard-2 wrote:
>
You could change the second 'plot' to 'points'
David Freedman
David Kaplan-2 wrote:
>
> Greetings all,
>
> I have two logistic plots coming from two calls to plogis. The code is
>
> .x <- seq(-7.6, 7.6, length=100)
> plot(.x, plogis(.x, l
You might want to look at the doBy package
For (1), you could use
summaryBy(value~Light+Feed,data=myD, FUN=mean)
and for (2), the transformBy function would be helpful
David Freedman
Patrick Hausmann wrote:
>
> Dear list,
>
> I have two things I am struggling...
>
> # Fi
Do you really want to use '==' ? How about '%in%', as in
findings1<-subset(findings,SUBJECTSID %in% SUBJECTS1$SUBJECTSID,
select=c(SUBJECTSID,ORGNUMRES))
David Freedman
dvkirankumar wrote:
>
> Hi all,
> I got one problem with"subset()"fun
or how about
lm(myData$response~as.matrix(myData[2:4]))
hth, david
Juliet Hannah wrote:
>
> Hi All,
>
> I had posted a question on a similar topic, but I think it was not
> focused. I am posting a modification that I think better accomplishes
> this.
> I hope this is ok, and I apologize if it
Hi - wouldn't it be possible to bootstrap the difference between the fit of
the 2 models? For example, if one had a *linear* regression problem, the
following script could be used (although I'm sure that it could be
improved):
library(MASS); library(boot)
#create intercorrelated data
Sigma <- ma
(2,1,100))
library(doBy)
summaryBy(wt+ht~sex,da=d,FUN=c(mean,sd))
David Freedman
SNN wrote:
>
>
> Hi,
>
> This is just for print out so it looks nice. what I have is a loop that
> loops over my variables and calculates the mean and the sd for these
> variables. Then I need t
ero.
>
> Thanks in advance,
> --
> ozan bakis
You might take a look at the transformBy function in the doBy package
For example,
new.df=transformBy(~group,data=my.df, new=y/max(y))
David Freedman
baptiste auguie-2 wrote:
>
> Dear list,
>
> I have a data.frame with x, y values and a 3-level factor "group",
>
not sure which library 'pcor' is in, but why don't you just use the ranks of
the variables and then perform the correlation on the ranks:
x<-sample(1:10,10,rep=T)
y<-x+ sample(1:10,10,rep=T)
cor(x,y)
cor(rank(x),rank(y))
HTH
david freedman
Jürg Brendan Logue wrote:
&g
try
newdata=do.call(rbind,l)
David Freedman, Atlanta
Naira wrote:
>
> Dear all,
>
> I would like to know whether it is possible to unlist elements and keep
> the original format of the data.
> To make it more clear, let me give an exemple:
> I have a list l of data
You might want to take a look at
http://had.co.nz/ggplot2/
There are many examples of how to use this package. And there's a book
coming out soon.
hope this helps
david freedman
Edna Bell wrote:
>
> Hi yet again!
>
> Thank you for being patient with me.
>
>
rs)')
lines(age,y.low,col='grey')
lines(age,y.high,col='grey')
Is it possible to fill the area between the 2 lines filled with, for
example, 'grey30' ?
thanks very much in advance,
David Freedman
-
David Freedman
Atlanta
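Yes - polygon can fill the band between two curves; a sketch with made-up age, y.low, and y.high:

```r
# made-up confidence band around a trend in age
age    <- 1:20
y.low  <- age - 2
y.high <- age + 2

plot(age, y.high, type = 'n', ylim = range(y.low, y.high),
     xlab = 'age (yrs)', ylab = 'y')
# polygon traces along one curve and back along the other
polygon(c(age, rev(age)), c(y.low, rev(y.high)),
        col = 'grey30', border = NA)
lines(age, y.low,  col = 'grey')
lines(age, y.high, col = 'grey')
```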
--