Just trying to understand how geom_abline works with facets in ggplot.

 

By way of example, I have a dataset of student test scores. These are in a 
data.table, dt, with 4 columns:

 

student: unique student ID

cohort: grouping factor for students (A, B, ..., H)

subject: subject of the test (English, Math, Science)

score: the test score for that student in that subject 

 

The goal is to compare cohorts.

 

## Code to generate dt

library(data.table)

## cohorts: list of cohorts with number of students in each

cohorts <- 
data.table(name=toupper(letters[1:8]),size=as.numeric(c(8,25,16,30,10,27,13,32)))

## base: assign students to cohorts

base    <- 
data.table(student=c(1:sum(cohorts$size)),cohort=rep(cohorts$name,cohorts$size))

## scores for each subject

english <- data.table(base,subject="English", score=rnorm(nrow(base), mean=45, 
sd=50))

math    <- data.table(base,subject="Math",    score=rnorm(nrow(base), mean=55, 
sd=25))

science <- data.table(base,subject="Science", score=rnorm(nrow(base), mean=70, 
sd=25))

## combine

dt      <- rbind(english,math,science)

## clip scores to the closed range [0,100]: negatives become 0, values above 100 become 100

dt$score<- (dt$score>=0) * dt$score

dt$score<- (dt$score<=100)*dt$score + (dt$score>100)*100
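
As an aside, the arithmetic clipping above should be equivalent to a clamp with pmax/pmin. A minimal sketch on a made-up vector (x and its values are hypothetical, not taken from dt):

```r
# Hypothetical scores, some of them outside [0, 100]
x <- c(-12.5, 0, 37.2, 100, 140.8)

# Arithmetic clipping, same two steps as applied to dt$score
a <- (x >= 0) * x                      # negatives become 0
a <- (a <= 100) * a + (a > 100) * 100  # values above 100 become 100

# Clamp with base R's vectorised pmax/pmin
b <- pmin(pmax(x, 0), 100)

# Both approaches should agree element by element
stopifnot(isTRUE(all.equal(a, b)))
```

If dt is a data.table, the same clamp could presumably be written in place as dt[, score := pmin(pmax(score, 0), 100)].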

 

The following displays the mean score by cohort with 95% confidence limits, 
faceted by subject, and includes a (blue, dashed) reference line (using
geom_abline).

 

library(ggplot2)

library(Hmisc)

ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)

ggp <- ggp + stat_summary(fun.data="mean_cl_normal")

ggp <- ggp + 
geom_abline(aes(slope=0,intercept=mean(score)),color="blue",linetype="dashed")

ggp <- ggp + facet_grid(subject~.)

ggp

 

The problem is that the reference line (from geom_abline) is the same in all 
facets (= the grand average score across all students and
all subjects). So stat_summary seems to respect the grouping implied by 
facet_grid (i.e., by subject), but geom_abline does not. *Why*?
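
To make the observation concrete, here is a minimal base-R sketch that compares the grand mean with the per-subject means. The data frame d is a simplified stand-in for dt (no students or cohorts), and the seed is hypothetical, chosen only so the numbers are repeatable:

```r
set.seed(1)  # hypothetical seed, not part of the original code

# Simplified stand-in for dt: 161 scores per subject (the sum of the cohort sizes)
n <- 161
d <- data.frame(
  subject = rep(c("English", "Math", "Science"), each = n),
  score   = c(rnorm(n, mean = 45, sd = 50),
              rnorm(n, mean = 55, sd = 25),
              rnorm(n, mean = 70, sd = 25))
)
d$score <- pmin(pmax(d$score, 0), 100)  # clip to [0, 100]

# One number for the whole table: the value the dashed line appears to sit at
grand <- mean(d$score)

# One number per subject: where I expected the line in each facet
by_subject <- tapply(d$score, d$subject, mean)

print(grand)
print(by_subject)
```

With equal group sizes the grand mean is just the average of the three subject means, which would match a single line drawn at the same height in every facet.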

 

NB: I realize this problem can be solved by creating a table of group means and 
using that as the data source in geom_abline
(below), but *why is this necessary*?

 

means <- dt[,list(mean.score=mean(score)),by="subject"]

ggp <- ggplot(dt,aes(x=cohort, y=score)) + ylim(0,100)

ggp <- ggp + stat_summary(fun.data="mean_cl_normal")

ggp <- ggp + geom_abline(data=means, 
aes(slope=0,intercept=mean.score),color="blue",linetype="dashed")

ggp <- ggp + facet_grid(subject~.)

ggp
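
For what it's worth, since the slope is always 0, the same workaround can presumably also be written with geom_hline. A minimal sketch, assuming ggplot2 is installed; d and m are made-up stand-ins for dt and means, and the table of means is built with base R's aggregate rather than data.table:

```r
# Made-up stand-in for dt: two cohorts, three subjects
d <- data.frame(
  cohort  = rep(c("A", "B"), times = 6),
  subject = rep(c("English", "Math", "Science"), each = 4),
  score   = c(40, 50, 45, 55, 50, 60, 55, 65, 65, 75, 70, 80)
)

# One row per subject, mirroring the means table above
m <- aggregate(score ~ subject, data = d, FUN = mean)
names(m)[2] <- "mean.score"

# Guarded so the data preparation still runs where ggplot2 is missing
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = cohort, y = score)) +
    geom_point() +
    # m carries the faceting variable (subject), so each facet gets its own line
    geom_hline(data = m, aes(yintercept = mean.score),
               color = "blue", linetype = "dashed") +
    facet_grid(subject ~ .)
  # print(p) draws the plot
}
```

This still relies on a separate table of means that contains the subject column, so it does not answer the *why*.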

 



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.