On Fri, Feb 22, 2013 at 3:08 PM, Tiago Hori <tiago.h...@gmail.com> wrote:
> Hi All,
>
> One problem that I often encounter is with line endings. I have to parse
> several kinds of files routinely, and often these are generated in Excel, so
> it is hard to anticipate which line ending I actually have. It
> seems to me that during while loops, if the line ending is not a LF, the
> lines are not read properly. What I often do is open the file in gedit and
> save it with Unix line endings.
>
> What I was wondering is: is there any way to force perl to use other line
> ending characters, like MacOS CR?
>
> Thanks!
>
> T.


Answering the OP directly because everyone is giving ten-year-old answers :(

Perl 5.10 and newer have the \R construct in regexes, which matches a
'generic newline'; it is equivalent to /(?>\r\n|\v)/, so it covers LF,
CR, and CRLF alike. Unfortunately, you can't set $/ to a regex yet, so
if you have a file with differing newlines, the only correct way
forward is slurping the entire file, then splitting on /\R/.
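
For a small file, that could look roughly like the sketch below.
read_lines is a made-up helper name, and the :encoding(UTF-8) layer
assumes the file is UTF-8; swap in whatever your files actually use.

```perl
use strict;
use warnings;
use 5.010;    # for \R and say
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a file and split it on any kind of newline.
# Assumes the file is UTF-8; :raw keeps Perl from translating CRLF itself.
sub read_lines {
    my ($path) = @_;
    open( my $fh, '<:raw:encoding(UTF-8)', $path )
        or die "Couldn't open $path: $!";
    my $content = do { local $/; <$fh> };    # undef $/ reads it all at once
    close $fh or die "Couldn't close $path: $!";
    return split /\R/, $content // '';
}

# Demo: a throwaway file that mixes LF, CRLF, and CR line endings.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} "unix\nwindows\r\nold mac\rlast";
close $tmp or die "Couldn't close $path: $!";

my @lines = read_lines($path);
say join '|', @lines;    # unix|windows|old mac|last
```

Note that split drops the line endings themselves, and it also drops
trailing empty fields, so a final newline doesn't give you an extra
empty line at the end.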

However, slurping isn't really feasible for larger files, where
you'll probably need to read in chunks, with something like:

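use strict;
use warnings;
use 5.010; # For \R
use Fcntl;
use Encode;

my $file = shift;

sysopen( my $fh, $file, O_RDONLY )
    or die "Couldn't open $file: $!";

my $over = '';
my $eof  = 0;
until ( $eof ) {
    my $read = sysread( $fh, $over, 8192, length($over) );
    defined $read or die "sysread error on $file: $!";
    $eof = !$read;    # sysread returns 0 at end of file
    # Before EOF, leave a line ending at the very end of the buffer alone:
    # a trailing \r may be the first half of a \r\n pair.
    # Caveat: on raw bytes \R also matches 0x85, which can occur inside a
    # multi-byte UTF-8 character.
    while ( $eof ? $over =~ /\R/ : $over =~ /\R(?!\z)/ ) {
        # sysread returns bytes; decode them (assuming the file is UTF-8).
        my $line = decode('UTF-8', substr($over, 0, $+[0], ''));
        do_something_with_line($line);    # placeholder for your own sub
    }
}
# A final line without a line ending is still sitting in the buffer.
do_something_with_line( decode('UTF-8', $over) ) if length $over;

close $fh or die "Couldn't close $file: $!";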
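
One subtlety with reading in chunks: a chunk boundary can fall between
the \r and the \n of a Windows line ending, and \R would then see a
lone \r followed later by a lone \n. Below is a minimal sketch of one
way to guard against that, using an in-memory buffer instead of a real
file. take_lines is a made-up helper name, not anything from a module.

```perl
use strict;
use warnings;
use 5.010;    # for \R and say

# Hypothetical helper: remove and return the complete lines in $$buf_ref,
# line endings included. Unless $eof is true, a line ending at the very
# end of the buffer is left in place, because a trailing \r may be the
# first half of a \r\n pair. \R never backtracks into a \r\n pair, so
# /\R(?!\z)/ cannot match just the \r of a \r\n that ends the buffer.
sub take_lines {
    my ( $buf_ref, $eof ) = @_;
    my $re = $eof ? qr/\R/ : qr/\R(?!\z)/;
    my @lines;
    while ( $$buf_ref =~ $re ) {
        push @lines, substr( $$buf_ref, 0, $+[0], '' );
    }
    # At end of input, whatever remains is a final unterminated line.
    push @lines, substr( $$buf_ref, 0, length $$buf_ref, '' )
        if $eof && length $$buf_ref;
    return @lines;
}

# Feed "a\r\nb\r\n" in two chunks that split the first \r\n pair.
my $buf = '';
my @got;
for my $chunk ( "a\r", "\nb\r\n" ) {
    $buf .= $chunk;
    push @got, take_lines( \$buf, 0 );
}
push @got, take_lines( \$buf, 1 );    # end of input: flush the rest
say scalar @got;    # 2
```

The cost is that a line ending exactly at a chunk boundary is handed
over one read later than it could be, which seems a fair trade for
never seeing a phantom empty line.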

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

