If I have data:
dat<-data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20),d=rnorm(20),site=rep(letters[5:8],each=5))
And want to plot like this:
ctr<-1
for(i in c('a','b','c','d')){
png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,
width=11,units='in',pointsize=9,res=300)
print(ggplot(dat[,names(dat) %in% c('site',i)],
aes(x=factor(site),y=dat[,i]))+geom_boxplot()+
opts(title=paste('plot number',ctr,sep=' ')))
dev.off()
ctr<-ctr+1
}
Is there a way to do the same naming using plyr (or data.table or foreach
which I am not familiar with at all!)?
m.dat<-melt(dat,id.vars='site')
ddply(m.dat,.(variable),function(df)
print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot()+ ..?)
And better yet, is there a way to do it using .parallel=T?
Faceting is not really an option (unless I can facet onto multiple pages of
a pdf or something) because these need to go into reports as individually
labelled and titled plots.
As a bit of a corollary, is it really worth the headache to resolve this if
I am only using melt/plyr to split on the four variables (a to d)? With a
larger set of data (1e6 rows), the melt/plyr version takes a significant
amount of time but .parallel=T drops the time significantly. Is the right
answer a foreach loop and can I do that with the increasing counter? (I
haven't gotten beyond Hadley's .parallel feature in my parallel R
dealings.)
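To make the counter question concrete, here is a minimal sketch of index-based naming, where the plot number comes from the position of the variable rather than from a running counter. The names vars, out_dir and plot_one are made up for illustration, tempdir() stands in for /tmp, and ggtitle() is used in place of opts(title=), which newer ggplot2 releases no longer provide.

```r
library(ggplot2)

# Small stand-in for the data above.
dat <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20),
                  site = rep(letters[5:8], each = 5))

vars <- c('a', 'b', 'c', 'd')   # hypothetical name: the columns to plot
out_dir <- tempdir()            # assumption: any writable directory will do

# Write one plot file for the k-th variable. The index k replaces the running
# counter, so no iteration depends on state left behind by another.
plot_one <- function(k) {
  v <- vars[k]
  df <- data.frame(site = factor(dat$site), value = dat[[v]])
  f <- file.path(out_dir, paste('plot_number_', k, '.png', sep = ''))
  p <- ggplot(df, aes(x = site, y = value)) + geom_boxplot() +
    ggtitle(paste('plot number', k))
  png(file = f, height = 8.5, width = 11, units = 'in', pointsize = 9, res = 300)
  print(p)
  dev.off()
  f  # return the file name that was written
}

files <- vapply(seq_along(vars), plot_one, character(1))
```

Because each call only needs its own index, the same function could presumably be handed to a parallel map over seq_along(vars) (for example parallel::mclapply or a foreach loop), though I have not timed that.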
> dat<-data.frame(a=rnorm(1e6),b=rnorm(1e6),c=rnorm(1e6),d=rnorm(1e6),site=rep(letters[5:8],each=2.5e5))
> ctr<-1
> system.time(for(i in c('a','b','c','d')){
+ png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,
+ width=11,units='in',pointsize=9,res=300)
+ print(ggplot(dat[,names(dat) %in% c('site',i)],
+ aes(x=factor(site),y=dat[,i]))+geom_boxplot()+
+ opts(title=paste('plot number',ctr,sep=' ')))
+ dev.off()
+ ctr<-ctr+1
+ })
   user  system elapsed
 54.630   0.120  54.843
> system.time(
+ ddply(melt(dat,id.vars='site'),.(variable),function(df) {
+ png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
+ print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
+ dev.off()
+ },.parallel=F)
+ )
   user  system elapsed
  58.40    0.13   58.63
> system.time(
+ ddply(melt(dat,id.vars='site'),.(variable),function(df) {
+ png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
+ print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
+ dev.off()
+ },.parallel=T)
+ )
   user  system elapsed
  70.33    3.46   27.61
How might I speed this up and include the sequential plot names?
Thanks a bunch!
Justin