Chris Lasher wrote:

> I have a rather large (100+ MB) FASTA file from which I need to
> access records in a random order.

I just came across this thread today, and I don't understand why you are trying to reinvent the wheel instead of using Biopython, which already has a solution to this problem, among others.

But actually I usually use formatdb, which comes with NCBI-BLAST, to
create blastdb files that can also be used for BLAST. Individual records
can then be pulled out with fastacmd, which the blastdb module below wraps:

[EMAIL PROTECTED] /data/blastdb/Users/mh5
$ python
Python 2.3.3 (#1, Jan 20 2004, 17:39:36) [C] on osf1V5
Type "help", "copyright", "credits" or "license" for more information.
>>> import blastdb
>>> from tools2 import LightIterator
>>> temp_file = blastdb.Database("mammals.peptides.faa").fetch_to_tempfile("004/04/m00404.peptide.faa")
>>> LightIterator(temp_file).next()
('lcl|004/04/m00404.peptide.faa ENSMUSG00000022297 peptide', 
'MERSPFLLACILLPLVRGHSLFTCEPITVPRCMKMTYNMTFFPNLMGHYDQGIAAVEMGHFLHLANLECSPNIEMFLCQAFIPTCTEQIHVVLPCRKLCEKIVSDCKKLMDTFGIRWPEELECNRLPHCDDTVPVTSHPHTELSGPQKKSDQVPRDIGFWCPKHLRTSGDQGYRFLGIEQCAPPCPNMYFKSDELDFAKSFIGIVSIFCLCATLFTFLTFLIDVRRFRYPERPIIYYSVCYSIVSLMYFVGFLLGNSTACNKADEKLELGDTVVLGSKNKACSVVFMFLYFFTMAGTVWWVILTITWFLAAGRKWSCEAIEQKAVWFHAVAWGAPGFLTVMLLAMNKVEGDNISGVCFVGLYDLDASRYFVLLPLCLCVFVGLSLLLAGIISLNHVRQVIQHDGRNQEKLKKFMIRIGVFSGLYLVPLVTLLGCYVYELVNRITWEMTWFSDHCHQYRIPCPYQANPKARPELALFMIKYLMTLIVGISAVFWVGSKKTCTEWAGFFKRNRKRDPISESRRVLQESCEFFLKHNSKVKHKKKHGAPGPHRLKVISKSMGTSTGATTNHGTSAMAIADHDYLGQETSTEVHTSPEASVKEGRADRANTPSAKDRDCGESAGPSSKLSGNRNGRESRAGGLKERSNGSEGAPSEGRVSPKSSVPETGLIDCSTSQAASSPEPTSLKGSTSLPVHSASRARKEQGAGSHSDA')

tools2 has this in it:

class LightIterator(object):
    def __init__(self, handle):
        self._handle = handle
        self._defline = None

    def __iter__(self):
        return self

    def next(self):
        lines = []
        defline_old = self._defline

        while 1:
            line = self._handle.readline()
            if not line:
                if not defline_old and not lines:
                    raise StopIteration
                # End of file: emit the final record, with or without a defline.
                self._defline = None
                break
            elif line[0] == '>':
                self._defline = line[1:].rstrip()
                if defline_old or lines:
                    break
                else:
                    defline_old = self._defline
            else:
                lines.append(line.rstrip())

        return defline_old, ''.join(lines)
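
For comparison, roughly the same parsing can be written as a generator. This is only a sketch and not part of tools2: the name iter_fasta and the sample records are made up for illustration, and it reads from an in-memory handle so it runs without any files.

```python
from io import StringIO


def iter_fasta(handle):
    """Yield (defline, sequence) tuples from an open FASTA handle."""
    defline = None
    lines = []
    for line in handle:
        if line.startswith(">"):
            # A new defline closes the previous record, if there was one.
            if defline is not None or lines:
                yield defline, "".join(lines)
            defline = line[1:].rstrip()
            lines = []
        else:
            lines.append(line.rstrip())
    # Emit the final record once the handle is exhausted.
    if defline is not None or lines:
        yield defline, "".join(lines)


# Hypothetical two-record input, just to show the shape of the output.
sample = StringIO(">seq1 first\nMERS\nPFLL\n>seq2\nACIL\n")
records = list(iter_fasta(sample))
print(records)
```

Like LightIterator, it joins wrapped sequence lines into one string and strips the leading '>' from the defline.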

blastdb.py:

#!/usr/bin/env python
from __future__ import division

__version__ = "$Revision: 1.3 $"

"""
blastdb.py

access blastdb files
Copyright 2005 Michael Hoffman
License: GPL
"""

import os
import sys

try:
    from poly import NamedTemporaryFile # http://www.ebi.ac.uk/~hoffman/software/poly/
except ImportError:
    from tempfile import NamedTemporaryFile

FASTACMD_CMDLINE = "fastacmd -d %s -s %s -o %s"

class Database(object):
    def __init__(self, filename):
        self.filename = filename

    def fetch_to_file(self, query, filename):
        status = os.system(FASTACMD_CMDLINE % (self.filename, query, filename))
        if status:
            raise RuntimeError("fastacmd returned %d" % os.WEXITSTATUS(status))

    def fetch_to_tempfile(self, query):
        temp_file = NamedTemporaryFile()
        self.fetch_to_file(query, temp_file.name)
        return temp_file
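
One possible variation, which is not in blastdb.py itself: fetch_to_file builds a shell command string, so a query or filename containing spaces or shell metacharacters could be misread by the shell. A minimal sketch that passes an argument list to subprocess.call instead is shown here. The helper name build_fastacmd_args is hypothetical; the -d, -s and -o options are the ones already used in FASTACMD_CMDLINE.

```python
import subprocess


def build_fastacmd_args(database, query, out_filename):
    """Return the fastacmd argument list for one lookup."""
    # Same options as FASTACMD_CMDLINE: -d database, -s query, -o output file.
    return ["fastacmd", "-d", database, "-s", query, "-o", out_filename]


def fetch_to_file(database, query, out_filename):
    """Run fastacmd without a shell and raise if it exits non-zero."""
    status = subprocess.call(build_fastacmd_args(database, query, out_filename))
    if status:
        raise RuntimeError("fastacmd returned %d" % status)


# Only the argument list is built here; nothing is executed.
args = build_fastacmd_args(
    "mammals.peptides.faa", "004/04/m00404.peptide.faa", "out.faa"
)
print(args)
```

Calling fetch_to_file still needs fastacmd on the PATH; if it is missing, subprocess.call raises OSError rather than returning a status.
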
--
Michael Hoffman
--
http://mail.python.org/mailman/listinfo/python-list
