PeroMHC wrote:
Hi All, So here is the problem... I have a FASTA file (used for DNA
analyses) that looks like this:

...
gnl|SRA|SRR019045.10.1 SL-XAY_956090708:2:1:0:1028.1 length=152
NCTTTTTTTATTTTTTGTATAAATGAAGTTTCACTATATCGGACGAGCGGTTCAGCAGTCATTCCGAGAC
CGATATAGTGAAACTTCATTTCTACAAAAANTACCAAACGTCGCTCGGCAGAGCGTCGTGTTGGGCAAGA
GAGTAGCACTCG
gnl|SRA|SRR019045.11.1 SL-XAY_956090708:2:1:0:1151.1 length=152
NGGTNTGGNNNNCNCCNTNCTNCNNCNTCANCCTCCNGTCNCANNCCNCNTNNNNNCNNNNNCNNTNCTT
CTNCNNTCTCCATTCCTTCTTNATAGCCTGCTCCANCGCACGTTGAACCTTCTGCACCACGAACGCACTC
ACACCACTCATC
gnl|SRA|SRR019045.12.1 SL-XAY_956090708:2:1:0:1197.1 length=152
NGTCGGGTCTTCGCTATCACTGGACTGCTCCCATCAGCTATAGGTCCTCCCCGCCACACCCCATGCCCAC
CGCCTATCCACGTCTGTCACAACCTCATACATCAGACAGTCACACTTACCAACATATCCAAGCACCTCAA
GCAACACATCAT
...

This snippet represents 3 individual DNA sequences. Each sequences is
identified by the line starting with >
The complete file has about 10 million individual sequences.

A simple enough problem, I want to read in this data, and cut out the
last 76 letters (nucleotides) from each individual sequence and send
them to a new txt file with a similar format.

Any help on how to do this would be appreciated.
Thanks!

If the input file is large then you can reduce the amount of memory
needed by reading the input file a line at a time by iterating over the
file object:

    input_file = open(input_path)
    for line in input_file:
        ...
    input_file.close()

Each line will end with '\n', so use the 'rstrip' method to remove it,
and then slice the last 76 characters:

    last_part = line.rstrip()[-76 : ]
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to