Re: Regular Expression change

Rob Dixon Wed, 20 Jul 2011 09:26:14 -0700

On 15/07/2011 16:42, David Wagner wrote:


        I have the following map:
        
                map{[$_,(/^\d/ ? 1 : 0) . /^([^;]+)/, /[^;]+;[^;]*;[^;]+;[^;]+;([^;]+);/]}

        I had a failure during the night because some data field(s) had
a semi-colon in the data. So what I have is a pre-defined data separator
that would not normally appear in data. What I have selected and have
been using is ;';  . I was going to switch to it, until I got down to this
map and I am unsure how to change ([^;]+) or [^;]+ to use ;'; as the
separator of my fields. What I am doing is scraping data from reports,
collecting it and then reformatting it to send out as emails.

        Any thoughts on what could be done??

       Thanks for any insights you might have on this...

Wags ;)


Hello David.

First of all, setting aside your embedded field separators, may I make
some comments on your code?

- I find it a little impenetrable, and think you could make it more
readable by adding some whitespace.

- The second element of your anonymous array seems a little strange, but
it looks like you want the first field in the data, preceded by '1' or
'0' according to whether it starts with a digit. But the concatenation
operator puts your regex in scalar context so, instead of extracting the
first field, you will get '1' or '' according to the success of the
match. To extract the value of the field itself you must apply list
context - something like

  (/^\d/ ? 1 : 0) . (/^([^;]+)/)[0]

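- Assuming the second field is never empty (your original allows that
with [^;]*, while this version does not), the regex generating the third
field can be written more readably as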

  / (?: [^;]+ ;){4} ([^;]+); /x
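A quick way to see the difference is to run both forms against a sample
record. This is only a sketch: the record below is made up, and the
loop over ($line) simply aliases $_ the way map does.

```perl
use strict;
use warnings;

# Hypothetical sample record: first field starts with a digit.
my $line = '42abc;x;y;z;wanted;rest';

my ($scalar_version, $list_version, $third);
for ($line) {    # alias $_ to $line, as inside map
    # Concatenation imposes scalar context, so the match returns
    # 1 (true) rather than the captured text: '1' . '1'
    $scalar_version = (/^\d/ ? 1 : 0) . /^([^;]+)/;

    # A list slice forces list context and picks out the capture.
    $list_version = (/^\d/ ? 1 : 0) . (/^([^;]+)/)[0];

    # The {4} form skips four non-empty fields and captures the fifth.
    ($third) = / (?: [^;]+ ;){4} ([^;]+); /x;
}

print "$scalar_version\n";   # 11
print "$list_version\n";     # 142abc
print "$third\n";            # wanted
```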

So as a first improvement I suggest

  map { [
    $_,
    (/^\d/ ? 1 : 0) . (/^([^;]+)/)[0],
    / (?: [^;]+ ;){4} ([^;]+); /x
  ] }
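One thing to watch with the regex form is that a failed match in list
context returns an empty list rather than undef, so a line with fewer
than six fields produces a shorter array. A small sketch with two
made-up lines (the @lines array is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical input: the second line has only two fields.
my @lines = ('42abc;x;y;z;wanted;rest', 'short;line');

my @records = map { [
    $_,
    (/^\d/ ? 1 : 0) . (/^([^;]+)/)[0],
    / (?: [^;]+ ;){4} ([^;]+); /x
] } @lines;

# Prints 3 for the first record and 2 for the second, because the
# failed match contributed nothing to the second array.
print scalar(@$_), "\n" for @records;
```

If you want every record to have the same shape, writing the third
element as (/ (?: [^;]+ ;){4} ([^;]+); /x ? $1 : undef) keeps the slot,
at the cost of an undef to check for later.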

But I think it would be best to use split rather than regexes to first
separate the data into fields and then manipulate them individually.

  map {
    my @fields = split /;/;
    [
      $_,
      ($fields[0] =~ /^\d/ ? 1 : 0) . $fields[0],
      $fields[4]
    ]
  }
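As for your own ;'; separator: split takes a regex, so it can split on a
multi-character string directly, and the regex counterpart of ([^;]+) is
a run of characters none of which starts the separator. A sketch,
assuming ;'; never appears inside the data itself (the sample line and
the $sep variable are made up):

```perl
use strict;
use warnings;

# Assumption: fields are separated by the literal string ;'; and
# that string never occurs inside a field. Sample data is made up.
my $sep  = q{;';};
my $line = q{123abc;';a;b;';c;';d;';wanted;';tail};

# \Q...\E escapes the separator so it is matched literally;
# a limit of -1 keeps any trailing empty fields.
my @fields = split /\Q$sep\E/, $line, -1;

# Regex counterpart of ([^;]+): one or more characters, each of
# which is not the start of the separator.
my ($first) = $line =~ /^((?:(?!\Q$sep\E).)+)/;

print scalar(@fields), " fields\n";    # 6 fields
print "first = $first\n";              # first = 123abc
print "fifth = $fields[4]\n";          # fifth = wanted
```

Note that the second field, a;b, keeps its embedded semicolon.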

Finally, to handle the embedded semicolons properly, simply replace the
split with a call to Text::CSV as Ruud recommends. Without knowing how
your data distinguishes between separators and data I cannot be sure how
this should be coded, but by default the module assumes double-quotes
around fields that must not be split.

  use Text::CSV;

  my $csv = Text::CSV->new({sep_char => ';'});

  map {
    $csv->parse($_) or die "Cannot parse line: " . $csv->error_diag;
    my @fields = $csv->fields;
    [
      $_,
      ($fields[0] =~ /^\d/ ? 1 : 0) . $fields[0],
      $fields[4]
    ]
  }

One last thought - I think map is probably a poor choice in this case,
but I cannot tell from only a fragment of your code. I would prefer to
see a 'foreach' or a 'while' loop iterating over the source data, with
the corresponding translation pushed onto a target array.
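For what it's worth, here is a minimal sketch of the 'while' version,
reusing the split approach. The in-memory filehandle and sample data are
hypothetical stand-ins for your report input; with a real file you would
open it by name instead.

```perl
use strict;
use warnings;

# Hypothetical report data; in practice this would be a real file.
my $data = "42abc;x;y;z;wanted;rest\nabc;x;y;z;other;rest\n";
open my $fh, '<', \$data or die "Cannot open input: $!";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;    # skip blank lines

    my @fields = split /;/, $line;
    push @records, [
        $line,
        ($fields[0] =~ /^\d/ ? 1 : 0) . $fields[0],
        $fields[4],
    ];
}
close $fh;

print join('|', @$_), "\n" for @records;
```

Unlike map, the loop body can easily skip lines, warn about malformed
records, or carry state from one line to the next.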

I hope this helps,

Rob
