Dear all,

I am trying to make a series of waffle-plot-like figures to visualize the ratios of amino acid residues at each position in my data. At each of the 37 positions there may be one to four different residues, so the data consist of the position, the residues found there, and the ratio of each residue. The ratios at a position add up to 100 or slightly less; when the sum is below 100, the remainder is an unknown residue that I did not include in the table.

I am hoping to make a *square* waffle-plot-like figure for each position: a 10 x 10 grid in which each cell is colored by residue, so that the number of cells of a given color equals the ratio of that residue. I could then line up all the plots in one row, from position 1 to position 37.
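To make the target layout concrete, here is a minimal sketch of how the 100 cells for one position could be allocated, using position 10 from my table (A = 79 and E = 9, so 12 cells are left for the unknown residue). The helper name make_waffle_cells and the "unknown" label are placeholders of my own, not from any package.

```r
# Hypothetical helper: build the cells of a side x side waffle for one position.
# `residues` and `ratios` are parallel vectors; ratios are whole percentages.
make_waffle_cells <- function(residues, ratios, side = 10L) {
  n_cells <- side * side
  stopifnot(length(residues) == length(ratios), sum(ratios) <= n_cells)
  # One label per cell; anything not accounted for is labelled "unknown".
  labels <- c(rep(as.character(residues), times = ratios),
              rep("unknown", n_cells - sum(ratios)))
  # expand.grid varies its first argument fastest, so cells fill column by column.
  cells <- expand.grid(y = seq_len(side), x = seq_len(side))
  cells$residue <- labels
  cells
}

# Position 10 from the table: A = 79, E = 9.
cells <- make_waffle_cells(c("A", "E"), c(79L, 9L))
table(cells$residue)
```

Since every position would get exactly the same 10 x 10 set of cells, the grid size would no longer depend on how many residues a position has.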
I am attaching the dput output for my data here (assigned to df, which the script below uses):

df <- structure(list(
  position = c(1L, 2L, 3L, 4L, 4L, 5L, 6L, 7L, 7L, 8L, 9L, 9L, 9L,
    10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L, 15L, 16L, 17L, 18L,
    19L, 20L, 21L, 22L, 22L, 23L, 24L, 25L, 26L, 26L, 27L, 28L, 29L,
    29L, 30L, 31L, 32L, 33L, 34L, 34L, 35L, 35L, 36L, 36L, 36L, 37L,
    37L),
  residue = structure(c(9L, 4L, 18L, 7L, 9L, 7L, 12L, 3L, 4L, 1L, 7L,
    9L, 12L, 1L, 4L, 4L, 13L, 5L, 14L, 2L, 18L, 3L, 16L, 9L, 17L, 15L,
    7L, 5L, 5L, 7L, 17L, 13L, 15L, 11L, 6L, 13L, 16L, 14L, 10L, 13L,
    17L, 1L, 1L, 17L, 1L, 12L, 1L, 5L, 3L, 6L, 8L, 7L, 9L),
    .Label = c("A", "C", "D", "E", "G", "H", "I", "K", "L", "M", "N",
      "P", "Q", "R", "S", "T", "V", "Y"), class = "factor"),
  ratio = c(99L, 100L, 100L, 1L, 99L, 100L, 100L, 1L, 98L, 100L, 10L,
    87L, 3L, 79L, 9L, 12L, 84L, 99L, 1L, 83L, 13L, 100L, 100L, 100L,
    100L, 99L, 100L, 100L, 100L, 98L, 2L, 100L, 100L, 100L, 2L, 98L,
    100L, 100L, 1L, 99L, 100L, 100L, 98L, 100L, 95L, 5L, 98L, 2L, 3L,
    95L, 1L, 1L, 98L)),
  .Names = c("position", "residue", "ratio"), class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "10", "11", "12", "13",
    "14", "15", "17", "18", "19", "20", "23", "25", "27", "28", "29",
    "30", "31", "32", "33", "34", "36", "37", "38", "39", "40", "42",
    "43", "44", "45", "46", "47", "48", "50", "51", "52", "53", "54",
    "56", "57", "58", "59", "60", "61", "62", "63", "64", "65"))

Inspired by a Stack Exchange post, I am using this script to make the plots:

library(ggplot2)

col4 = c('#E66101', '#FDB863', '#B2ABD2', '#5E3C99')
dflist = list()

for (i in 1:37) {
  # number of distinct residues at position i
  residue_num = length(which(df$position == i))
  # residue and ratio columns for position i
  dflist[[i]] = df[df$position == i, 2:3]
  # grid with residue_num rows and enough columns to hold sum(ratio) cells
  waffle = expand.grid(y = 1:residue_num,
                       x = seq_len(ceiling(sum(dflist[[i]]$ratio) / residue_num)))
  # one entry per cell, padded with NA to the size of the grid
  residuevec = rep(dflist[[i]]$residue, dflist[[i]]$ratio)
  waffle$residue = c(as.vector(residuevec),
                     rep(NA, nrow(waffle) - length(residuevec)))
  png(paste('plot', i, '.png', sep = ''))
  print(ggplot(waffle, aes(x = x, y = y, fill = residue)) +
          geom_tile(color = "white") +
          scale_fill_manual("residue", values = col4) +
          coord_equal() +
          theme(panel.grid.minor = element_blank(),
                panel.grid.major = element_blank()) +
          theme(axis.ticks = element_blank()) +
          theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
          theme(axis.title.x = element_blank(), axis.title.y = element_blank()))
  dev.off()
}

With this script I can make a waffle plot, but not a *square* 10 x 10 one. Also, the grid size differs between positions with different numbers of residues. I suspect that I am not using coord_equal() correctly. How can I make the plots described above, in ggplot2 or with some other package?

Also, is there a way to assign a fixed color to each residue (say, purple for alanine, blue for glycine, etc.) and use that mapping inside the for loop?

Many thanks for any suggestions!

Zhao
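On the color question, one direction that might work is passing a named vector to scale_fill_manual(), since ggplot2 matches named values to the levels of the fill variable. A minimal sketch follows. The name residue_cols, the specific colors, and the "unknown" label are arbitrary choices for illustration, and a toy 10 x 10 data frame (79 A, 9 E, 12 unknown) stands in for one position.

```r
library(ggplot2)

# Assumed palette: names are residue codes, values are arbitrary colors.
residue_cols <- c(A = "purple", G = "blue", E = "#FDB863", unknown = "grey85")

# Toy cells for one position, filled column by column on a fixed 10 x 10 grid.
cells <- expand.grid(y = 1:10, x = 1:10)
cells$residue <- rep(c("A", "E", "unknown"), times = c(79, 9, 12))

p <- ggplot(cells, aes(x = x, y = y, fill = residue)) +
  geom_tile(color = "white") +
  # Named values are matched by residue name, so a residue keeps the same
  # color no matter which other residues occur at the position.
  scale_fill_manual("residue", values = residue_cols) +
  coord_equal() +
  theme_void()
```

Because the grid is always 10 x 10, coord_equal() should then give a square panel for every position; the plot object p could be written out with png()/print()/dev.off() as in the loop above.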