Re: remove last 76 letters from string

MRAB Wed, 05 Aug 2009 17:33:44 -0700

PeroMHC wrote:

Hi All, So here is the problem... I have a FASTA file (used for DNA
analyses) that looks like this:

...

>gnl|SRA|SRR019045.10.1 SL-XAY_956090708:2:1:0:1028.1 length=152

NCTTTTTTTATTTTTTGTATAAATGAAGTTTCACTATATCGGACGAGCGGTTCAGCAGTCATTCCGAGAC
CGATATAGTGAAACTTCATTTCTACAAAAANTACCAAACGTCGCTCGGCAGAGCGTCGTGTTGGGCAAGA
GAGTAGCACTCG

>gnl|SRA|SRR019045.11.1 SL-XAY_956090708:2:1:0:1151.1 length=152

NGGTNTGGNNNNCNCCNTNCTNCNNCNTCANCCTCCNGTCNCANNCCNCNTNNNNNCNNNNNCNNTNCTT
CTNCNNTCTCCATTCCTTCTTNATAGCCTGCTCCANCGCACGTTGAACCTTCTGCACCACGAACGCACTC
ACACCACTCATC

>gnl|SRA|SRR019045.12.1 SL-XAY_956090708:2:1:0:1197.1 length=152

NGTCGGGTCTTCGCTATCACTGGACTGCTCCCATCAGCTATAGGTCCTCCCCGCCACACCCCATGCCCAC
CGCCTATCCACGTCTGTCACAACCTCATACATCAGACAGTCACACTTACCAACATATCCAAGCACCTCAA
GCAACACATCAT
...

This snippet represents 3 individual DNA sequences. Each sequence is
identified by the line starting with >
The complete file has about 10 million individual sequences.

A simple enough problem: I want to read in this data, cut out the
last 76 letters (nucleotides) from each individual sequence, and send
them to a new txt file with a similar format.

Any help on how to do this would be appreciated.
Thanks!


If the input file is large, you can reduce the amount of memory needed
by reading it one line at a time, which is what iterating over the file
object does:

    # 'with' closes the file even if an error occurs inside the loop
    with open(input_path) as input_file:
        for line in input_file:
            ...

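Each line read this way ends with '\n', so use the 'rstrip' method to
remove it. If a sequence is wrapped over several lines, as in the sample
above, join the lines of a record first. Then take the last 76
characters with a slice: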

    last_part = line.rstrip()[-76:]
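
Putting the two steps together, here is a minimal sketch of one way the
whole job could look. It rests on some assumptions: header lines start
with '>', a sequence may be wrapped over several lines, and blank lines
can be ignored. The names last_letters and trim_file are made up for
illustration, and the header is copied unchanged, so a "length=152" field
would no longer match the trimmed sequence.

```python
def last_letters(lines, n=76):
    """Yield (header, tail) for each FASTA record in an iterable of lines.

    Assumption: a record is a line starting with '>' followed by one or
    more sequence lines. Only one record is held in memory at a time.
    """
    header = None
    parts = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header ends the previous record.
                yield header, "".join(parts)[-n:]
            header = line
            parts = []
        elif header is not None:
            parts.append(line)
    if header is not None:
        # Emit the final record, which has no header after it.
        yield header, "".join(parts)[-n:]


def trim_file(input_path, output_path, n=76):
    """Write each header and the last n letters of its sequence."""
    with open(input_path) as src, open(output_path, "w") as dst:
        for header, tail in last_letters(src, n):
            dst.write(header + "\n" + tail + "\n")
```

Calling trim_file(input_path, output_path) would then stream through the
input once. A sequence shorter than 76 letters is written out whole,
because a slice never raises an error for being too long.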
