On Tue, Jan 06, 2009 at 07:21:48AM -0800, Sake wrote:
> I'm having difficulties with a dataset containing gene names and positions
> of those genes.
> Not such a big problem, but each gene has multiple exons so it's hard to say
> where the gene starts and where it ends. I want the starting and ending
> position of each gene in my dataset.
> Attached is the dataset:
> http://www.nabble.com/file/p21312449/genlistchrompos.csv
> Column 'B' is the gene name, 'G' is the starting position and 'H' is the
> stop position.

I don't really see how 'if' and 'for loops' are involved in the question.
You may want to give us a little more detail on what exactly you need and
what you tried unsuccessfully. (By the way -- there are no columns labeled
'B', 'G' or 'H' in the file).

Anyway - I believe this is what you are after:

    # get minimum start position by gene
    aggregate(dat[, c('Exon_Start.Chr.')], by=list(dat$Gene), min)

    # get maximum stop position by gene
    aggregate(dat[, c('Exon_Stop.Chr.')], by=list(dat$Gene), max)

Of course, these will only reflect the real start and stop coordinates of
the gene if ALL exons are given in the file.

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
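
If one table with both coordinates per gene is wanted, the two aggregate() results can be merged. The following is only a sketch under assumptions: the toy data frame stands in for the CSV file (its gene names and coordinates are invented for illustration), and the output column names Start and Stop are my own choice. Only the column names Gene, Exon_Start.Chr. and Exon_Stop.Chr. come from the calls above.

```r
# Toy data standing in for genlistchrompos.csv.
# Gene names and coordinates are made up for illustration.
dat <- data.frame(
  Gene            = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  Exon_Start.Chr. = c(100, 250, 400, 1000, 1300),
  Exon_Stop.Chr.  = c(180, 320, 520, 1100, 1450)
)

# Smallest exon start and largest exon stop per gene.
starts <- aggregate(dat["Exon_Start.Chr."], by = list(Gene = dat$Gene), FUN = min)
stops  <- aggregate(dat["Exon_Stop.Chr."],  by = list(Gene = dat$Gene), FUN = max)

# Combine into one row per gene; Start and Stop are hypothetical names.
gene_range <- merge(starts, stops, by = "Gene")
names(gene_range) <- c("Gene", "Start", "Stop")
print(gene_range)
```

With the real file, dat would instead come from something like read.csv() on the downloaded CSV. The same caveat applies: the result is only the true gene extent if every exon is present in the data.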