Thanks - great, should have thought of option b)
-----Original Message----- From: Martin Morgan [mailto:[EMAIL PROTECTED] Sent: Friday, December 07, 2007 12:52 PM To: alison waller Cc: [EMAIL PROTECTED] Subject: Re: [R] finding most highly transcribed genes - ranking, sorting and subsets? Hi Alison -- It's a funny twist of terminology, isn't it? high rank (we're #1!) corresponds to low value. Maybe a wimpy stats joke? Anyway, (a) if m is assigned rownames (e.g., from the appropriate column of the 'genes' data frame in the limma object, rownames(m) <- maList$genes$GeneName) they'll be caried through the analysis and (b) if you've extracted m from a limma MAList, then subsetting the MAList with hrow (maList[hrow,]) will give you a new MAList with all the info carrying through. This would be the better way to go. Martin "alison waller" <[EMAIL PROTECTED]> writes: > Thanks so much Martin, > > This method is definitely more straightforward. And you are right I don't > think I was doing anything wrong before. However, I thought that rank, would > rank the highest 1st, however after looking at the results using your > methods, I realized it ranks the lowest number 1. So I modified it for > rank>18500. And now I'm getting 300 rows for which the intensity is > consistenly high. > > However, I am still laking some information. For the results I can get a > matrix of 300 rows and the corresponding intensities (from m) or rank (from > h), but what I really want is the name of the original row, which > corresponds to a specific spot on the array). > > I did msubset<-m[hrows,] and as mentioned I just get the rows numbered > 1-300, while I want to essentially pickout the 300 rows from the original > 19,000 rows maintaing the original row designation as it corresponds to a > specific gene. > > Thanks again for any suggestions, > > Alison > > -----Original Message----- > From: Martin Morgan [mailto:[EMAIL PROTECTED] > Sent: Thursday, December 06, 2007 4:06 PM > To: alison waller > Subject: Re: [R] finding most highly transcribed genes - ranking, sorting > and subsets? > > Hi Alison -- > > I'm not sure where your problem is coming from, but R can help you to > more efficiently do your task. Skipping the bioc terminology and data > structures, you have a matrix > >> m <- matrix(runif(100000), ncol=10) > > you'd like to determine the rank of values in each column > >> r <- apply(m, 2, rank) > > identfiy those with high rank > >> h <- r < 500 > > and find the rows for which the rank is always high > >> hrows <- apply(h, 1, all) > > you can then use hrows to subset your original matrix (m[hrows,]) or > otherwise, e.g., how many rows with high rank > >> sum(hrows) > [1] 0 > > or perhaps the distribution of the number of columns in which high > ranking genes occur. > >> table(apply(h, 1, sum)) > > 0 1 2 3 4 > 5996 3132 765 100 7 > > Martin > > "alison waller" <[EMAIL PROTECTED]> writes: > >> Hello, >> >> >> >> I am not only interested in finding out which genes are the most highly > up- >> or down-regulated (which I have done using the linear models and Bayesian >> statistics in Limma), but I also want to know which genes are consistently >> highly transcribed (ie. they have a high intensity in the channel of >> interest eg. Cy5 or Cy3 across the set of experiments). I might have > missed >> a straight forward way to do this, or a valuable function, but I've been >> using my own methods and going around in circles. >> >> >> >> So far I've normalized within and between arrays, then returned the RG >> values using RG<-RG.MA, then I ranked each R and G values for each array > as >> below. >> >> rankRG<-RG >> >> rankRG$R[,1]<-rank(rankRG$R[,1]) >> >> rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns(ie. arrays, as >> well as the G's) >> >> >> >> then I thought I could pull out a subset of rankRG using something like; >> >> topRG<-rankRG >> >> topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500) >> >> >> >> However, this just returned me a matrix with one row of $R (the ranks were >> <500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6). However, > I >> can't believe that there is only one gene that is in the top 500 for R >> intensitiy among those three arrays. >> >> >> >> Am I doing something wrong? Can someone think of a better way of doing >> this? >> >> >> >> Thanks >> >> >> >> Alison >> >> >> >> >> >> ****************************************** >> Alison S. Waller M.A.Sc. >> Doctoral Candidate >> [EMAIL PROTECTED] >> 416-978-4222 (lab) >> Department of Chemical Engineering >> Wallberg Building >> 200 College st. >> Toronto, ON >> M5S 3E5 >> >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Dr. Martin Morgan, PhD > Computational Biology Shared Resource Director > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M2 B169 > Phone: (206) 667-2793 > -- Dr. Martin Morgan, PhD Computational Biology Shared Resource Director Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.