Dear Alessia,

> I am very new to R and wanted to know if there is a package that, given very
> long nucleotide sequences, searches for and identifies short (7-10 nt) motifs.
> I would like to look for enrichment of certain motifs in genomic sequences.
>
> I tried using MEME (not an R package, I know), but the online version only
> allows sequences of up to 60000 nucleotides, and that's too short for my
> needs.

You may try this:

#
# Load the seqinr package:
#
  library(seqinr)
#
# A FASTA file example - that ships with seqinr - which contains
# the complete genome sequence of Chlamydia trachomatis :
#
  fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
#
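# Import the sequence as a string of characters. read.fasta() returns
# a list with one element per sequence, so keep the first (and only) one:
#
  myseq <- read.fasta(fastafile, as.string = TRUE)[[1]]
  nchar(myseq) # 1042519, i.e. a sequence of about 1 Mb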
#
# Look for motif "atatatat", with possible overlap:
#
  words.pos("atatatat", myseq)
#
# This returns the positions where the motif is found, that
# is: 236501 236503 283987 687083 792792 792794
#
  substr(myseq, 236501, 236501 + 7) # the 8 nt starting at the first hit
#
# Should be
# [1] "atatatat"
#
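
Since the question was about enrichment rather than location alone, here is a rough sketch in base R only (no seqinr needed). It counts overlapping hits with a lookahead regular expression and compares the count with what base composition alone would predict. The toy sequence and the names toyseq, pos, expected and ratio are made up for illustration, and the independent-bases null model is a simplifying assumption, not a proper enrichment test.

```r
# Toy sequence standing in for a genome (hypothetical data).
toyseq <- "ggatatatatatccatatatatgg"
motif  <- "atatatat"

# A zero-width lookahead lets successive matches overlap (needs perl = TRUE).
hits <- gregexpr(paste0("(?=", motif, ")"), toyseq, perl = TRUE)[[1]]
pos  <- as.integer(hits)
pos  <- pos[pos > 0]          # gregexpr() returns -1 when nothing matches

# Crude null model (an assumption): bases are independent, so the chance
# of the motif at any start position is the product of its base frequencies.
bases    <- strsplit(toyseq, "")[[1]]
freq     <- table(bases) / length(bases)
k        <- nchar(motif)
expected <- (length(bases) - k + 1) * prod(freq[strsplit(motif, "")[[1]]])

# Observed / expected: values well above 1 suggest over-representation.
ratio <- length(pos) / expected
```

On a real genome you would replace toyseq with the imported sequence; a null model that accounts for dinucleotide composition would usually be more realistic than this one.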

HTH,

Jean
--
Jean R. Lobry            ([EMAIL PROTECTED])
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo  : +33 472 43 27 56     fax    : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
