chen li wrote:
> You are 50% right. This method is not correct for the
> first record (which actually contains ">" only) but it
> is correct for the last record (and others in between).
> I want to edit the file first and try to delete the
> first ">" in this big file. I browsed Programming Perl
> and the Perl Cookbook, but there is no such example: just
> delete the first character in a file. But they have
> examples to delete the last line from a file. It seems
> odd to me.
First, I'd recommend against changing a large file. Unless your program
is the only user of the file, you would have to change all the other
programs. And in this case, you would be unable to distinguish one
record from another.
There are three ways to distinguish records in a file: by record
separators, by beginning-of-record tokens, and by end-of-record tokens.
Your file may use one, two or all three methods. When writing code, your
preference should be (in order) record separator, end-of-record token,
and finally beginning-of-record token.
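To make the three layouts concrete, here is a minimal sketch that uses split on three made-up one-line data sets. The sample strings and variable names are assumptions for illustration only; they are not taken from your file.

```perl
use strict;
use warnings;

# Hypothetical data: the same three records in each layout.
my @by_separator   = split /,/, 'a,b,c';    # separator between records
my @by_end_token   = split /;/, 'a;b;c;';   # token after every record
my @by_begin_token = split />/, '>a>b>c';   # token before every record

print scalar(@by_separator),   "\n";   # 3
print scalar(@by_end_token),   "\n";   # 3, split drops trailing empty fields
print scalar(@by_begin_token), "\n";   # 4, the first field is an empty string
```

The extra empty field in the last case is the same kind of anomaly you saw in your first record: a tool that cuts on a token cannot know that nothing comes before the first one.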
In Perl, the variable $/ actually defines an end-of-record token, even
though its long name is $INPUT_RECORD_SEPARATOR. The name is misleading.
If it were a true record separator, your code would never have to process
it; it would be discarded at a lower level. Instead, it stays attached to
the end of each record until you remove it yourself.
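A small sketch of that behaviour, assuming the default $/ (a newline) and an in-memory filehandle standing in for a real file. The sample data and the array names are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical two-line "file" held in a string.
my $data = "one\ntwo\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my ( @raw, @chomped );
while ( my $line = <$fh> ) {
    push @raw, $line;        # the value of $/ is still attached here
    chomp $line;             # chomp removes one trailing copy of $/
    push @chomped, $line;
}
close $fh;

print "raw ends with newline\n" if $raw[0] =~ /\n\z/;
print "chomped: $chomped[0]\n";
```

Every record arrives with the token on the end, which is why nearly every read loop starts with chomp.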
The records in your file are distinguished only by a beginning-of-record
token, specifically a greater-than sign at the beginning of a record.
You can process the file in two ways: treat the beginning-of-record
token as an end-of-record token, or read ahead in the file and process
the record only after reading the beginning of the next record. Both
have advantages and disadvantages.
If you want to treat the beginning-of-record token as an end-of-record
one, your records are going to have some anomalies. The first record is
going to have a beginning-of-record token attached to it. Your last
record is not going to have an end-of-record token. For your case, it
would look something like this:
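my $beginning_token = '>';
my $end_token = "\n$beginning_token";

# Read up to and including each newline that is followed by '>'.
# FH is assumed to be already open on your file.
$/ = $end_token;

my $first = 1;
while( <FH> ){
  # Only the first record still has its beginning token attached.
  if( $first ){
    s/^\Q$beginning_token//;
    $first = 0;
  }
  # Every record except the last ends with the end token.
  s/\Q$end_token\E$//;
  process_record( $_ );
}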
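Here is a self-contained variant you can run to see those anomalies. It is only a sketch under a few assumptions: the two-record sample data is invented, process_record() is a stand-in that just collects what it is given, and I have swapped in local and chomp in place of the global assignment and the second substitution.

```perl
use strict;
use warnings;

# Hypothetical sample data with two records.
my $data = ">seq1\nACGT\n>seq2\nTTGA\n";

# Stand-in for your real routine: it only collects the records.
my @records;
sub process_record { push @records, $_[0] }

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
{
    local $/ = "\n>";    # restored automatically when the block ends
    my $first = 1;
    while ( my $record = <$fh> ) {
        if ($first) {
            $record =~ s/^>//;    # only the first record has this
            $first = 0;
        }
        chomp $record;            # strips a trailing "\n>" if present
        process_record($record);
    }
}
close $fh;

print scalar(@records), " records\n";    # 2 records
```

With this data the first record comes out as "seq1\nACGT", while the last one keeps its final newline ("seq2\nTTGA\n") because no end token follows it. Your process_record() has to tolerate that difference.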
If you want to use only the beginning-of-record token, you will have to
do at least a partial read-ahead. This means you have to store the lines
read so far, and the last record will be processed outside the read loop.
For your case:
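my $beginning_token = '>';
my $record = '';

# FH is assumed to be already open on your file; $/ is left at its default.
while( <FH> ){
  # A line starting with the token begins a new record,
  # so the record collected so far is complete.
  if( /^\Q$beginning_token/ ){
    if( $record =~ /^\Q$beginning_token/ ){
      process_record( $record );
    }
    $record = '';
  }
  $record .= $_;
}

# The last record has no following token, so process it here.
if( $record =~ /^\Q$beginning_token/ ){
  process_record( $record );
}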
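A runnable sketch of the same read-ahead idea, again with invented sample data and an in-memory filehandle. Here the records are simply pushed onto an array (@records is a hypothetical name) so you can inspect what a real process_record() would receive.

```perl
use strict;
use warnings;

# Hypothetical sample data: the first record spans two sequence lines.
my $data = ">seq1\nACGT\nGGCC\n>seq2\nTTGA\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
my $record = '';
while ( my $line = <$fh> ) {
    if ( $line =~ /^>/ ) {
        # A new record starts, so the previous one is complete.
        push @records, $record if $record =~ /^>/;
        $record = '';
    }
    $record .= $line;
}
# The last record is only complete at end of file.
push @records, $record if $record =~ /^>/;
close $fh;

print scalar(@records), " records\n";    # 2 records
```

Unlike the first approach, every record here looks the same: each one keeps its leading ">" and all of its newlines, so there are no special cases for the first or last record. The price is the extra bookkeeping and the duplicated call after the loop.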
--
Just my 0.00000002 million dollars worth,
--- Shawn
"Probability is now one. Any problems that are left are your own."
SS Heart of Gold, _The Hitchhiker's Guide to the Galaxy_
* Perl tutorials at http://perlmonks.org/?node=Tutorials
* A searchable perldoc is available at http://perldoc.perl.org/