Dear all,

I am trying to make a series of waffle-plot-like figures to visualize the
ratios of amino acid residues at each position in my data. Each of the 37
positions has one to four different amino acid residues, so the data consist
of the positions, the residues found at each one, and the ratio of each
residue. The ratios at a position add up to 100, or close to 100 (see the
note below)*. I am hoping to make a *square* waffle-plot-like figure for each
position: a 10 X 10 grid in which each residue has its own color and the
number of cells of that color equals the residue's ratio. Then I could line
up all the plots in one row, from position 1 to position 37.
*: If the ratios at a position sum to less than 100, the remainder belongs to
an unknown residue that I did not include in the table.
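
The 10 X 10 grid described above boils down to turning each position's
ratios into exactly 100 cell labels. A minimal sketch, assuming the ratios
are whole percentages; `cells_for_position()` is a hypothetical helper (not
from any package), and the missing remainder is labelled "unknown":

```r
# Hypothetical helper: expand one position's residues/ratios into exactly
# n_cells labels, padding with "unknown" when the ratios sum to < n_cells.
cells_for_position <- function(residue, ratio, n_cells = 100) {
  labels <- rep(as.character(residue), times = ratio)
  if (length(labels) > n_cells) {
    stop("ratios sum to more than ", n_cells)
  }
  c(labels, rep("unknown", n_cells - length(labels)))
}

# Position 36 in the data: D = 3, H = 95, K = 1 (sum 99)
cells <- cells_for_position(c("D", "H", "K"), c(3, 95, 1))
table(cells)  # 3 D, 95 H, 1 K and 1 unknown cell
```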

I am attaching the dput() output for my data here, assigned to df (the name
the script below uses):
df <- structure(list(position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L,
8L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L,
26L, 27L, 28L, 29L, 29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L,
36L, 36L, 36L, 37L, 37L), residue = structure(c(9L, 4L, 18L,
7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L, 9L, 12L, 1L, 4L, 4L, 13L, 5L,
14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L, 7L, 5L, 5L, 7L, 17L, 13L,
15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L, 17L, 1L, 1L, 17L, 1L,
12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L), .Label = c("A", "C", "D", "E",
"G", "H", "I", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V",
"Y"), class = "factor"), ratio = c(99L, 100L, 100L, 1L, 99L,
100L, 100L, 1L, 98L, 100L, 10L, 87L, 3L, 79L, 9L, 12L, 84L, 99L,
1L, 83L, 13L, 100L, 100L, 100L, 100L, 99L, 100L, 100L, 100L,
98L, 2L, 100L, 100L, 100L, 2L, 98L, 100L, 100L, 1L, 99L, 100L,
100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L, 95L, 1L, 1L, 98L)),
.Names = c("position", "residue", "ratio"), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "10", "11", "12", "13", "14", "15",
"17", "18", "19", "20", "23", "25", "27", "28", "29", "30", "31",
"32", "33", "34", "36", "37", "38", "39", "40", "42", "43", "44",
"45", "46", "47", "48", "50", "51", "52", "53", "54", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65"))

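To check the note about sums before plotting, per-position totals can be
computed with base R's tapply(). This sketch uses only a hand-copied excerpt
of the data above (positions 7, 9 and 36) in a hypothetical data frame
`excerpt`; on the full data the same calls would be run on `df`.

```r
# Hand-copied excerpt of the dput() data (positions 7, 9 and 36)
excerpt <- data.frame(
  position = c(7L, 7L, 9L, 9L, 9L, 36L, 36L, 36L),
  residue  = c("D", "E", "I", "L", "P", "D", "H", "K"),
  ratio    = c(1L, 98L, 10L, 87L, 3L, 3L, 95L, 1L),
  stringsAsFactors = FALSE
)

# Sum of ratios and number of residues at each position
totals <- tapply(excerpt$ratio, excerpt$position, sum)
n_residues <- tapply(excerpt$ratio, excerpt$position, length)
print(totals)      # 99 (position 7), 100 (position 9), 99 (position 36)
print(n_residues)  # 2, 3 and 3 residues

# A position summing to more than 100 would not fit in a 10 x 10 grid
stopifnot(all(totals <= 100))
```
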
Inspired by a StackExchange post, I am using this script to make the plots:
library(ggplot2)

# four colours, assigned in order to whichever residues occur at a position
col4 <- c('#E66101', '#FDB863', '#B2ABD2', '#5E3C99')
dflist <- list()

for (i in 1:37) {
  # number of different residues at position i
  residue_num <- length(which(df$position == i))
  # residue and ratio columns for position i
  dflist[[i]] <- df[df$position == i, 2:3]
  # grid: residue_num rows, ceiling(total ratio / residue_num) columns
  waffle <- expand.grid(y = 1:residue_num,
                        x = seq_len(ceiling(sum(dflist[[i]]$ratio) / residue_num)))
  # one label per percentage point, padded with NA to fill the grid
  residuevec <- rep(dflist[[i]]$residue, dflist[[i]]$ratio)
  waffle$residue <- c(as.vector(residuevec),
                      rep(NA, nrow(waffle) - length(residuevec)))
  png(paste('plot', i, '.png', sep = ''))
  print(ggplot(waffle, aes(x = x, y = y, fill = residue)) +
          geom_tile(color = "white") +
          scale_fill_manual("residue", values = col4) +
          coord_equal() +
          theme(panel.grid.minor = element_blank(),
                panel.grid.major = element_blank(),
                axis.ticks = element_blank(),
                axis.text.x = element_blank(),
                axis.text.y = element_blank(),
                axis.title.x = element_blank(),
                axis.title.y = element_blank()))
  dev.off()
}

With my script, I can make a waffle plot, but not a *square* 10 X 10 waffle
plot. Also, the size of the grid cells differs between positions with
different numbers of residues. I suspect that I am not using coord_equal()
correctly.
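
The grid shape comes straight from the expand.grid() call in the loop: it
always has residue_num rows and ceiling(total / residue_num) columns, so it is
never 10 X 10 for these data. coord_equal() most likely does its job (each
tile is square), but a 1 x 100 or 3 x 34 strip of square tiles is then scaled
to fit the PNG device, which would explain why the tiles differ in size
between positions. The sketch below isolates that logic in a hypothetical
function `grid_shape()`, assuming ratios summing to 100:

```r
# Same grid-shape logic as the loop, for a position whose ratios sum to total
grid_shape <- function(residue_num, total = 100) {
  waffle <- expand.grid(y = 1:residue_num,
                        x = seq_len(ceiling(total / residue_num)))
  c(rows = max(waffle$y), cols = max(waffle$x))
}

shapes <- sapply(1:3, grid_shape)  # 1, 2 and 3 residues at a position
print(shapes)  # rows: 1, 2, 3; cols: 100, 50, 34
```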

So I wonder how I can make plots like the ones described above, in ggplot2
or with some other package. Also, is there a way to assign a fixed color to
each residue, say, purple for alanine, blue for glycine, etc., and
incorporate that information in the for loop?
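
One possible approach, sketched under some assumptions: always build a fixed
10 x 10 grid with expand.grid(x = 1:10, y = 1:10), fill it with one label per
percentage point (padding with "unknown"), and pass a *named* colour vector to
scale_fill_manual() so that each residue is matched to the same colour in every
plot. facet_wrap(nrow = 1) then lines the positions up in one row. The data
frame `dat` is a hypothetical excerpt of the data above (replace it with
`df`), `make_waffle_cells()` is a hypothetical helper, and the palette (purple
for A, blue for G, hcl.colors() for the rest, grey for "unknown") is only an
example; hcl.colors() needs R 3.6.0 or later.

```r
library(ggplot2)

# Hypothetical excerpt of the data (positions 7, 9 and 36); use `df` instead.
dat <- data.frame(
  position = c(7L, 7L, 9L, 9L, 9L, 36L, 36L, 36L),
  residue  = c("D", "E", "I", "L", "P", "D", "H", "K"),
  ratio    = c(1L, 98L, 10L, 87L, 3L, 3L, 95L, 1L),
  stringsAsFactors = FALSE
)

# One fixed colour per residue (the 18 levels in the dput), plus "unknown".
# Example palette only: A purple, G blue, the rest from hcl.colors().
residue_levels <- c("A", "C", "D", "E", "G", "H", "I", "K", "L",
                    "M", "N", "P", "Q", "R", "S", "T", "V", "Y")
residue_cols <- setNames(hcl.colors(length(residue_levels), "Dynamic"),
                         residue_levels)
residue_cols[c("A", "G")] <- c("purple", "blue")
residue_cols <- c(residue_cols, unknown = "grey85")

# Build the n * n cells of one position's square grid.
make_waffle_cells <- function(d, n = 10) {
  labels <- rep(as.character(d$residue), times = d$ratio)
  stopifnot(length(labels) <= n * n)  # ratios must not exceed 100
  labels <- c(labels, rep("unknown", n * n - length(labels)))
  # expand.grid() varies x fastest, so cells fill the bottom row first
  cells <- expand.grid(x = seq_len(n), y = seq_len(n))
  cells$residue <- labels
  cells$position <- d$position[1]
  cells
}

# split() orders the positions numerically; stack all grids into one frame
waffle_all <- do.call(rbind, lapply(split(dat, dat$position), make_waffle_cells))

p <- ggplot(waffle_all, aes(x = x, y = y, fill = residue)) +
  geom_tile(colour = "white") +
  # named values: each residue is matched to its colour by name
  scale_fill_manual("residue", values = residue_cols) +
  coord_equal() +
  facet_wrap(~ position, nrow = 1) +
  theme_void()

print(p)
```

If you prefer one PNG per position, as in your loop, the same `residue_cols`
vector can be passed to scale_fill_manual() inside the loop (for example on
`waffle_all[waffle_all$position == i, ]`) and each plot saved with ggsave();
because the colours are matched by name, a residue keeps its colour across
files.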

Many thanks for any suggestion you may give me!

Zhao


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
