chen li wrote:

You are 50% right. This method is not correct for the
first record (which actually contains ">" only) but it
is correct for the last record (and others in between).

I want to edit the file first and try to delete the
first ">" in this big file. I browsed Programming Perl
and the Perl Cookbook, and there is no such example: just
delete the first character in a file. But they have
examples to delete the last line from a file. It seems
odd to me.

First, I'd recommend against changing a large file. Unless your program is the only user of the file, you would have to change every other program that reads it. And in this case, you would be unable to distinguish one record from another.

There are three ways to distinguish records in a file: by record separators, by beginning-of-record tokens, and by end-of-record tokens. Your file may use one, two or all three methods. When writing code, your preference should be (in order) record separator, end-of-record token, and finally beginning-of-record token.

In Perl, the variable $/ holds the end-of-record token, even though it is called the INPUT_RECORD_SEPARATOR. The name is misleading: if it were a true record separator, your code would never have to process it; it would be discarded at a lower level.
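
A small sketch of that behaviour, assuming made-up data read through an in-memory filehandle (neither is from your file): whatever $/ is set to comes back attached to each record, and it is up to your code to chomp it off.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle.
my $data = "alpha|beta|gamma";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @raw;        # records exactly as returned by the read
my @chomped;    # the same records after chomp
{
    local $/ = '|';    # treat '|' as the end-of-record token
    while ( my $record = <$fh> ) {
        push @raw, $record;    # still ends in '|' unless it is the last record
        chomp $record;         # chomp removes a trailing copy of $/
        push @chomped, $record;
    }
}
close $fh;

print "raw:     @raw\n";
print "chomped: @chomped\n";
```

The last record has no '|' after it, so chomp leaves it unchanged.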

The records in your file are distinguished only by a beginning-of-record token, specifically a greater-than sign at the beginning of a record. You can process the file in two ways: treat the beginning-of-record token as an end-of-record token, or read ahead in the file and process each record only after reading the beginning of the next one. Both have advantages and disadvantages.

If you want to treat the beginning-of-record token as an end-of-record one, your records are going to have some anomalies. The first record is going to have a beginning-of-record token attached to it. Your last record is not going to have an end-of-record token. For your case, it would look something like this:

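my $beginning_token = '>';
my $end_token = "\n$beginning_token";
$/ = $end_token;   # each read now stops after a newline followed by '>'
my $first = 1;
while( <FH> ){
  if( $first ){
    # only the first record still carries its own beginning token
    s/^\Q$beginning_token\E//;
    $first = 0;
  }
  # strip the end token; the last record has none, so nothing is removed there
  s/\Q$end_token\E$//;
  process_record( $_ );
}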
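
To try this approach without touching the real file, here is a self-contained sketch. The sample data, the in-memory filehandle and the @records array (standing in for process_record()) are all assumptions for illustration, and chomp is swapped in for the explicit substitution of the end token.

```perl
use strict;
use warnings;

# Hypothetical sample data: three records, each introduced by '>'.
my $data = ">one\nAAA\n>two\nBBB\n>three\nCCC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $beginning_token = '>';
my $end_token       = "\n$beginning_token";

my @records;    # stand-in for process_record()
{
    local $/ = $end_token;    # restored automatically at the end of the block
    my $first = 1;
    while ( my $record = <$fh> ) {
        if ( $first ) {
            # only the first record still carries its own beginning token
            $record =~ s/^\Q$beginning_token\E//;
            $first = 0;
        }
        chomp $record;    # removes the trailing "\n>" where there is one
        push @records, $record;
    }
}
close $fh;

print "[$_]\n" for @records;
```

Using local keeps the change to $/ from leaking into the rest of the program. Note the anomaly described above: the last record keeps its final newline because no end token follows it.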

If you want to use only the beginning-of-record token, you will have to do at least a partial read-ahead. This means you have to store what you have read so far, and the last record will be processed outside the read loop. For your case:

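my $beginning_token = '>';
my $record = '';   # lines read so far for the current record
while( <FH> ){
  if( /^\Q$beginning_token\E/ ){
    # a new record starts here, so the stored one is complete
    if( $record =~ /^\Q$beginning_token\E/ ){
      process_record( $record );
    }
    $record = '';
  }
  $record .= $_;
}
# the last record has no following token to trigger it
if( $record =~ /^\Q$beginning_token\E/ ){
  process_record( $record );
}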
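
The same read-ahead idea as a self-contained sketch you can run as is. The sample data (including a leading line that belongs to no record), the in-memory filehandle and the @records array in place of process_record() are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample data; the first line is not part of any record.
my $data = "# preamble\n>one\nAAA\n>two\nBBB\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $beginning_token = '>';
my @records;        # stand-in for process_record()
my $record = '';    # lines read so far for the current record

while ( my $line = <$fh> ) {
    if ( $line =~ /^\Q$beginning_token\E/ ) {
        # A new record starts here, so the stored one is complete.
        # The check skips anything that came before the first token.
        push @records, $record if $record =~ /^\Q$beginning_token\E/;
        $record = '';
    }
    $record .= $line;
}
# The last record has no following token to trigger it.
push @records, $record if $record =~ /^\Q$beginning_token\E/;
close $fh;

print "[$_]" for @records;
```

Here each record keeps its beginning token and its newlines, and the preamble line is dropped because it never starts with the token.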



--

Just my 0.00000002 million dollars worth,
   --- Shawn

"Probability is now one. Any problems that are left are your own."
   SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_

* Perl tutorials at http://perlmonks.org/?node=Tutorials
* A searchable perldoc is available at http://perldoc.perl.org/
