> On Apr 5, 2016, at 10:27 AM, 何尧 <he...@pku.edu.cn> wrote:
> 
> I have a set of genes (nearly 50,000) from the whole genome, which I have
> read in as genomic ranges.
> 
> A range (gene) can be seen as an observation with three columns: chromosome,
> start, and end, like this:
> 
>       seqnames start end width strand
> gene1     chr1     1   5     5      +
> gene2     chr1    10  15     6      +
> gene3     chr1    12  17     6      +
> gene4     chr1    20  25     6      +
> gene5     chr1    30  40    11      +
> 
> I am wondering whether there is an efficient way to find the overlapping,
> upstream, and downstream genes for each gene in the GRanges object.

The data.table package (on CRAN) and the IRanges package (on Bioconductor)
provide efficient, well-established approaches to those problems.
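
As a rough sketch of the Bioconductor route (not tested on your data): the GenomicRanges package, which builds on IRanges, offers findOverlaps(), follow() and precede(), and each handles all genes in one vectorized call. The toy object below is rebuilt from the five genes in your example, and the names gr, overlapped and result are only illustrative. I am assuming "upstream" means the nearest non-overlapping gene that a gene follows, and "downstream" the nearest one it precedes; both are strand-aware.

```r
library(GenomicRanges)

# Toy data rebuilt from the five example genes in the question
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(1, 10, 12, 20, 30),
                     end   = c(5, 15, 17, 25, 40)),
  strand   = "+"
)
names(gr) <- paste0("gene", 1:5)

# All pairwise overlaps in a single call, then drop each gene's hit on itself
ov <- findOverlaps(gr, gr)
ov <- ov[queryHits(ov) != subjectHits(ov)]

# Collapse the overlapping gene names per query gene (NA when there are none)
overlapped <- rep(NA_character_, length(gr))
by_query <- split(names(gr)[subjectHits(ov)], queryHits(ov))
overlapped[as.integer(names(by_query))] <-
  vapply(by_query, paste, character(1), collapse = ",")

# follow(): nearest gene that each gene comes after (assumed "upstream")
# precede(): nearest gene that each gene comes before (assumed "downstream")
result <- data.frame(
  gene_name       = names(gr),
  upstream_gene   = names(gr)[follow(gr, gr)],
  downstream_gene = names(gr)[precede(gr, gr)],
  overlapped_gene = overlapped,
  stringsAsFactors = FALSE
)
print(result)
```

The same calls should work unchanged on all_genes_gr, provided its ranges carry names.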


> 
> For example, assuming all_genes_gr is a genomic range of ~50,000 genes, the
> result I want looks like this:
> 
> gene_name  upstream_gene  downstream_gene  overlapped_gene
> gene1      NA             gene2            NA
> gene2      gene1          gene4            gene3
> gene3      gene1          gene4            gene2
> gene4      gene3          gene5            NA
> 
> Currently, the strategy I use is this:
> library(GenomicRanges)
> find_overlapped_gene <- function(idx, all_genes_gr) {
>  #cat(idx, "\n")
>  curr_gene <- all_genes_gr[idx]
>  other_genes <- all_genes_gr[-idx]
>  n <- countOverlaps(curr_gene, other_genes)
>  gene <- subsetByOverlaps(curr_gene, other_genes)
>  return(list(n, gene))
> }
> 
> system.time(lapply(1:100, function(idx)  find_overlapped_gene(idx, 
> all_genes_gr)))
> However, for 100 genes this takes nearly 8 s according to system.time(). That
> means that with 50,000 genes it would take about an hour just to find the
> overlapping genes.
> 
> I am wondering whether there is an algorithm or strategy to do this
> efficiently, perhaps 50,000 genes in ~10 min or even less.
> 
I suspect that with either of those packages this would run much faster; 50,000
ranges is a small dataset.
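
For the data.table route, here is a minimal sketch of an overlap join with foverlaps(), again on toy data rebuilt from your example. The table name genes and its column names are my own assumptions, not anything taken from your object. It covers only the overlap part of the question.

```r
library(data.table)

# Toy data rebuilt from the five example genes in the question
genes <- data.table(
  gene  = paste0("gene", 1:5),
  chr   = "chr1",
  start = c(1L, 10L, 12L, 20L, 30L),
  end   = c(5L, 15L, 17L, 25L, 40L)
)

# foverlaps() needs the second table keyed, with the interval columns last
setkey(genes, chr, start, end)

# Self-join on overlapping intervals within each chromosome;
# columns coming from the first table get an "i." prefix where names clash
ov <- foverlaps(genes, genes, type = "any", nomatch = 0L)

# Drop each gene's match to itself and keep one row per overlapping pair
pairs <- ov[gene != i.gene, .(gene = i.gene, overlapped_gene = gene)]
print(pairs)
```

The whole table is processed in one join rather than one call per gene, which is where the speedup over the lapply() loop should come from.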

-- 
David.



> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
