Hi all,

I have found the solution to my "HELP FORMATING A FILE" question. I was already very close to it. In case anyone is interested, here is the script.

Regards,

Pedro

#!/usr/sbin/perl -w
#use strict;

if (!@ARGV) {
    print "usage: $0 blast_output \n";
    exit 0;
}

while (<>) {
    # Print the identifier of every ">" record.
    if (/(>\S+)\s*/) {
        print "$1\n";
    }
    next if (/Length/);
    next if (/^\s*$/);
    # Print the sequence field of every "Sbjct" line with the gap characters removed.
    if (/Sbjct/) {
        chomp;
        my ($query, $number1, $sequence, $number2) = split;
        $sequence =~ tr/-//d;
        print "$sequence\n";
    }
}


Hi all,

I have a file from a BLAST report output which looks like the following:

gi|12383919|gb|BF981107.1|BF981107 602310351F1 NIH_MGC_88 H...   271  4e-72
gi|12168431|gb|BF825777.1|BF825777 MR2-HN0035-171100-001-a0...   242  3e-63

Alignments

>gi|12383919|gb|BF981107.1|BF981107 602310351F1 NIH_MGC_88 Homo sapiens cDNA clone IMAGE:4401421 5'.
          Length = 967

 Score = 271 bits (694), Expect = 4e-72
 Identities = 135/141 (95%), Positives = 138/141 (97%)
 Frame = +3

Query: 17  QAGPWRVSAPPSGPPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDTLLHLFAARGL 76
           +AGPWRVSAPPSGPPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDTLLHLFAARGL
Sbjct: 15  EAGPWRVSAPPSGPPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDTLLHLFAARGL 194

Query: 77  RWAAYAAAEVLQVYRRLDIREHKGKTPLLVAAAANQPLIVEDLLNLGAEPNAADHQGRSV 136
           RWAAYAAAEVLQVYRRLDIREHKGKTPLLVAAAANQPLIVEDLLNLGAEPNAADHQGRSV
Sbjct: 195 RWAAYAAAEVLQVYRRLDIREHKGKTPLLVAAAANQPLIVEDLLNLGAEPNAADHQGRSV 374

Query: 137 LHVAATYGLPGVLAVFKSGIQ 157
           LHVAATYGLPGVL V+ +G Q
Sbjct: 375 LHVAATYGLPGVLLVWPAGRQ 437

 Score = 32.7 bits (73), Expect = 4.4
 Identities = 21/46 (45%), Positives = 25/46 (53%), Gaps = 11/46 (23%)
 Frame = +2

Query: 133 GRSVLHVAAT------YGLPGVLAVFK-----SGIQVDLEARDFEG 167
           GR V  + A+      Y  P V  +F      SG+QVDLEARDFEG
Sbjct: 452 GRLVAQILASRPGGQGYPYPAVCLLFLPGCAYSGVQVDLEARDFEG 589

>gi|12168431|gb|BF825777.1|BF825777 MR2-HN0035-171100-001-a09 HN0035 Homo sapiens cDNA.
          Length = 598

 Score = 242 bits (618), Expect = 3e-63
 Identities = 136/184 (73%), Positives = 139/184 (74%), Gaps = 33/184 (17%)
 Frame = +1

Query: 16  PQAGPWRVSA-----PPSGPPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDT---- 66
           PQA  WR+       P   PPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDT
Sbjct: 31  PQA--WRLDPGEFLHPLQ*PPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDT*V*G 204

Query: 67  -----------------------LLHLFAARGLRWAAYAAAEVLQVYRRLDIREHKGKTP 103
                                  LLHLFAARGLRWAAYAAAEVLQVYRRLDIREHKGKTP
Sbjct: 205 IGLSADSWLGGGCSHGCPPPVLRLLHLFAARGLRWAAYAAAEVLQVYRRLDIREHKGKTP 384

Query: 104 LLVAAAANQPLIVEDLLNLGAEPNAADHQGRSVLHVAATYGLPGV-LAVFKSGIQVDLEA 162
           LLV AAANQPLIVEDLLNLGAEPNAADHQGRSVLHV ATYGLPGV LAV  SG+ V+LEA
Sbjct: 385 LLVVAAANQPLIVEDLLNLGAEPNAADHQGRSVLHVGATYGLPGVLLAVLNSGVHVELEA 564

Query: 163 RDFE 166
           RDFE
Sbjct: 565 RDFE 576

Basically, I want to extract the "Sbjct" lines under every record that starts with ">" and end up with a file that, for the case above, looks as follows:

>gi|12383919|gb|BF981107.1|BF981107
EAGPWRVSAPPSGPPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDTLLHLFAARGL
RWAAYAAAEVLQVYRRLDIREHKGKTPLLVAAAANQPLIVEDLLNLGAEPNAADHQGRSV
LHVAATYGLPGVLLVWPAGRQ
>gi|12168431|gb|BF825777.1|BF825777
PQAWRLDPGEFLHPLQPPQFPAVVPGPSLEVARAHMLALGPQQLLAQDEEGDTVG
IGLSADSWLGGGCSHGCPPPVLRLLHLFAARGLRWAAYAAAEVLQVYRRLDIREHKGKTP
LLVVAAANQPLIVEDLLNLGAEPNAADHQGRSVLHVGATYGLPGVLLAVLNSGVHVELEA
RDFE

The sequence under each line starting with ">" could also be on a single line. The code below does something to one of the ">" records, but it is still not right. Moreover, I do not know how to make the program jump from one ">" record to the next. Please help.

#!/usr/sbin/perl -w
use strict;

if (!@ARGV) {
    print "usage: $0 blast_output \n";
    exit 0;
}

while (<>) {
    if (/(>\S+)\s*/) {
        print "$1\n";
    }
    next if (/Length/);
    next if (/^\s*$/);
    if (/Query/) {
        chomp;
        my ($query, $number1, $sequence, $number2) = split;
        $sequence =~ tr/-//d;
        $sequence.= $sequence;
    }
}
print "$sequence\n";

-- 
***************************************************************************
PEDRO a. RECHE gallardo, pHD            TL: 617 632 3824
Scientist, Mol.Immnunol.Foundation,     FX: 617 632 3351
Dana-Farber Cancer Institute,           EM: [EMAIL PROTECTED]
Harvard Medical School,                 URL: http://www.reche.org
44 Binney Street, D610C,
Boston, MA 02115
***************************************************************************
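
A note for anyone comparing the two scripts. The first attempt fails for three reasons: it matches "Query" instead of "Sbjct"; $sequence is declared with "my" inside the if block, so the final print refers to a variable that is not in scope there (which "use strict" rejects); and "$sequence .= $sequence" appends each fragment to itself rather than to a running total. The posted solution avoids all of this by printing every Sbjct fragment as soon as it is read. It does, however, also print the Sbjct lines of the second, weaker alignment of a record and leaves "*" characters in place, neither of which appears in the desired output shown above.

What follows is only a sketch of one way to collect the fragments per ">" record and emit each sequence on a single line. The helper name extract_sbjct is made up for illustration, and keeping only the first alignment of each record and stripping "*" are assumptions read off the desired output, not something stated in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the posted script): takes the lines of a
# BLAST report and returns, for every ">" record, its identifier line
# followed by one line holding the joined Sbjct sequence.
sub extract_sbjct {
    my @lines = @_;
    my @out;            # finished output lines
    my $sequence;       # Sbjct residues collected for the current record
    my $scores = 0;     # "Score =" lines seen so far in the current record

    for my $line (@lines) {
        if ($line =~ /^>(\S+)/) {
            # A new record starts: flush the previous one, then reset.
            push @out, $sequence if defined $sequence;
            push @out, ">$1";
            $sequence = '';
            $scores   = 0;
        }
        elsif ($line =~ /^\s*Score\s*=/) {
            # Each alignment within a record begins with a "Score =" line.
            $scores++;
        }
        elsif (defined $sequence && $scores == 1
               && $line =~ /^Sbjct:\s*\d+\s+(\S+)/) {
            # Assumption: keep only the first alignment of each record and
            # drop both "-" and "*", as in the desired output shown above.
            (my $fragment = $1) =~ tr/*-//d;
            $sequence .= $fragment;
        }
    }
    # Flush the last record.
    push @out, $sequence if defined $sequence;
    return @out;
}

# Run as a script only when a file name is given on the command line.
if (@ARGV) {
    print "$_\n" for extract_sbjct(<>);
}
```

Declaring $sequence outside the loop is what lets it accumulate across lines, and resetting it whenever a new ">" line is seen is what makes the program move from one record to the next. To keep every alignment of a record instead of only the first, the "$scores == 1" test can be removed.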