> On Apr 5, 2016, at 10:27 AM, 何尧 <he...@pku.edu.cn> wrote:
>
> I have a set of genes (~50000) from the whole genome, read in as
> genomic ranges.
>
> A range (gene) can be seen as an observation with three columns: chromosome,
> start and end, like this:
>
>       seqnames start end width strand
> gene1     chr1     1   5     5      +
> gene2     chr1    10  15     6      +
> gene3     chr1    12  17     6      +
> gene4     chr1    20  25     6      +
> gene5     chr1    30  40    11      +
>
> I am just wondering whether there is an efficient way to find the overlapped,
> upstream and downstream genes for each gene in the GRanges.
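As a rough sketch of a vectorized approach (not tested on your real data): GenomicRanges can answer all three questions for every gene at once instead of looping over genes one at a time. The object name gr and the toy ranges below are assumptions copied from the five-gene example above. findOverlaps(), follow() and precede() are existing GenomicRanges functions; follow() and precede() are strand-aware and skip overlapping ranges.

```r
library(GenomicRanges)

# Toy data copied from the quoted example; `gr` is a hypothetical name.
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(1, 10, 12, 20, 30),
                     end   = c(5, 15, 17, 25, 40)),
  strand   = "+"
)
names(gr) <- paste0("gene", 1:5)

# One call finds every overlapping pair; drop the self-hits afterwards.
hits <- findOverlaps(gr, gr)
hits <- hits[queryHits(hits) != subjectHits(hits)]

# Collapse the overlapping partners of each gene into one string.
overlapped <- rep(NA_character_, length(gr))
partners <- tapply(names(gr)[subjectHits(hits)], queryHits(hits),
                   paste, collapse = ",")
overlapped[as.integer(names(partners))] <- partners

# follow()/precede() return the index of the nearest non-overlapping
# neighbour in the subject, or NA when there is none.
result <- data.frame(
  gene_name       = names(gr),
  upstream_gene   = names(gr)[follow(gr, gr)],
  downstream_gene = names(gr)[precede(gr, gr)],
  overlapped_gene = overlapped,
  stringsAsFactors = FALSE
)
print(result)
```

Because each of the three calls runs once over the whole object, the cost no longer grows with a per-gene R loop. A gene with several overlapping partners gets them comma-separated in overlapped_gene.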
The data.table package (on CRAN) and the IRanges package (on Bioconductor) have formalized efficient approaches to those problems.

> For example, assuming all_genes_gr is a genomic range of ~50000 genes, the
> result I want looks like this:
>
> gene_name  upstream_gene  downstream_gene  overlapped_gene
> gene1      NA             gene2            NA
> gene2      gene1          gene4            gene3
> gene3      gene1          gene4            gene2
> gene4      gene3          gene5            NA
>
> Currently, the strategy I use is this:
>
> library(GenomicRanges)
> find_overlapped_gene <- function(idx, all_genes_gr) {
>   #cat(idx, "\n")
>   curr_gene <- all_genes_gr[idx]
>   other_genes <- all_genes_gr[-idx]
>   n <- countOverlaps(curr_gene, other_genes)
>   gene <- subsetByOverlaps(curr_gene, other_genes)
>   return(list(n, gene))
> }
>
> system.time(lapply(1:100, function(idx) find_overlapped_gene(idx,
> all_genes_gr)))
>
> However, for 100 genes it takes nearly 8 s according to system.time(). That
> means that with 50000 genes it would take nearly one hour just to find the
> overlapped genes.
>
> I am just wondering whether there is any algorithm or strategy to do that
> efficiently, perhaps 50000 genes in ~10 min or even less.

I suspect this would run much faster than that for such a small dataset.

-- 
David.

> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA