Andrew Westcott wrote:
>
> I'm new to perl but need to write a script that takes a file and formats
> lines.
>
> The file has 2 fields that are tab separated and each field is made up of
> items separated by some type of linefeed character. The end of the second
> field is identified by another type of linefeed character.
>
> When I view the file in VIM the second linefeed shows as a ^M so there must
> be a way of identifying these separately.
>
> I have tried searching for \r \n  %CR %LF $VT $FF but nothing seems to give
> the required effect.
>
> I need to run the script on a PC.
>
> Please can you offer some advice or possible places to look.

Hi Andy.

Where does this data come from? If it arrives as stream input from
a file then Perl usually does a good job of translating the line
terminators to a simple "\n" character before passing the data to
the program. It may be that you need to read it differently.
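
As a rough sketch of what "reading it differently" could look like: I am
assuming here (I can't tell from your description) that each record ends
in CR LF and that the items inside a field are separated by a bare LF.
The sample data and the names $data and @records are made up for
illustration. On a PC you would also want binmode on the real file handle
so that Perl doesn't translate CR LF to "\n" before you see it.

```perl
use strict;
use warnings;

# Hypothetical data: records end in CR LF, items inside a field are
# separated by a bare LF. Both are assumptions about the file layout.
my $data = "a1\na2\tb1\nb2\r\nc1\td1\r\n";

# Read from an in-memory handle so the sketch is self-contained; with a
# real file you would open it by name and still call binmode.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my @records;
{
    local $/ = "\r\n";            # a record ends at CR LF, not at a bare LF
    while (my $record = <$fh>) {
        chomp $record;            # removes the current $/, i.e. the CR LF
        # Split into tab-separated fields, then each field into its items.
        my @fields = map { [ split /\n/ ] } split /\t/, $record, -1;
        push @records, \@fields;
    }
}

printf "%d records\n", scalar @records;
```

Each element of @records is then a reference to a list of fields, and
each field is a reference to a list of its items.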

However, ^M looks like control-M to me, which is a carriage-return.
This is one of the 'whitespace' characters which can be removed
by simply stripping trailing whitespace from the record:

  $record =~ s/\s+$//;
  my @fields = split /\t/, $record;

but if trailing whitespace is significant in your field values then
you can't do this, especially if the final field could be all spaces,
in which case you would lose the whole of that field along with the
last tab separator.
In this case you should just get rid of CR characters with

  $record =~ tr/\r//d;

and all should be well.
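
Put together, that approach might look something like the sketch below.
The sub name parse_record and the sample record are only for
illustration. The -1 limit on split is my addition so that empty trailing
fields are kept rather than silently dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the line ending and any CR characters,
# then split the record into its tab-separated fields.
sub parse_record {
    my ($record) = @_;
    chomp $record;                     # remove the trailing "\n", if any
    $record =~ tr/\r//d;               # delete every CR, keep other whitespace
    return split /\t/, $record, -1;    # -1 keeps empty trailing fields
}

# Sample record whose second field ends in a significant space.
my @fields = parse_record("apple\tbanana \r\n");
print "[$_]\n" for @fields;
```

The trailing space in the second field survives, which it would not with
the s/\s+$// version.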

HTH,

Rob




