On 15/07/2011 16:42, David Wagner wrote:
I have the following map:

    map{[$_,(/^\d/ ? 1 : 0) . /^([^;]+)/, /[^;]+;[^;]*;[^;]+;[^;]+;([^;]+);/]}

I had a failure during the night because some data field(s) had a semicolon in the data. So what I have is a predefined data separator that would not normally appear in data. What I have selected and have been using is ;'; . I was going to do this until I got down to this map, and I am unsure how to change ([^;]+) or [^;]+ to have ;'; as the separator of my fields. What I am doing is reports: scraping the data, collecting it, and then reformatting it to send out as emails. Any thoughts on what could be done? Thanks for any insights you might have on this.

Wags ;)
Hello David.

First of all, setting aside your embedded field separators, may I make some comments on your code?

- I find it a little impenetrable, and think you could make it more readable by adding some whitespace.

- The second element of your anonymous array seems a little strange, but it looks like you want the first field in the data, preceded by '1' or '0' according to whether it starts with a digit. But your regex is in scalar context (the concatenation operator imposes it) so, instead of extracting the first field, you will get '1' or '' according to the success of the match. To extract the value of the field itself you must apply list context, something like

      (/^\d/ ? 1 : 0) . (/^([^;]+)/)[0]

- The regex generating the third field can be written more readably as

      / (?: [^;]+ ;){4} ([^;]+); /x

  Note that this version requires the second field to be non-empty, whereas your original allowed it to be empty with [^;]*.

So as a first improvement I suggest

    map {
      [
        $_,
        (/^\d/ ? 1 : 0) . (/^([^;]+)/)[0],
        / (?: [^;]+ ;){4} ([^;]+); /x
      ]
    }

But I think it would be best to use split rather than regexes, first separating the data into fields and then manipulating them individually.

    map {
      my @fields = split /;/;
      [
        $_,
        ($fields[0] =~ /^\d/ ? 1 : 0) . $fields[0],
        $fields[4]
      ]
    }

Finally, to handle the embedded semicolons properly, simply replace the split with a call to Text::CSV as Ruud recommends. Without knowing how your data distinguishes between separators and data I cannot be sure how this should be coded, but by default the module assumes double quotes around fields that must not be split.

    use Text::CSV;

    my $csv = Text::CSV->new({sep_char => ';'});

    map {
      $csv->parse($_) or die $csv->error_diag;
      my @fields = $csv->fields;
      [
        $_,
        ($fields[0] =~ /^\d/ ? 1 : 0) . $fields[0],
        $fields[4]
      ]
    }

One last thought: I think map is probably a poor choice in this case, but I cannot tell from only a fragment of your code. I would prefer to see a 'foreach' or a 'while' loop iterating over the source data, with the corresponding translation pushed onto a target array.
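To illustrate that last suggestion together with your ;'; separator, here is a minimal sketch rather than a definitive implementation. It assumes that ;'; never occurs inside a field, and the sample records and the names $sep, @lines and @records are made up for the example, since I have not seen your real data.

```perl
use strict;
use warnings;

# The multi-character separator from the question.
my $sep = q{;';};

# Hypothetical sample records (assumption: the real input comes from the reports).
my @lines = (
    join($sep, '42abc', 'x', 'y', 'z', 'fifth; with semicolon', 'tail'),
    join($sep, 'name',  '',  'y', 'z', 'e5',                    'tail'),
);

my @records;
foreach my $line (@lines) {
    # \Q...\E makes the separator match literally;
    # the limit of -1 keeps trailing empty fields.
    my @fields = split /\Q$sep\E/, $line, -1;

    # Guard against an empty line, where split returns no fields.
    my $first = defined $fields[0] ? $fields[0] : '';
    my $flag  = $first =~ /^\d/ ? 1 : 0;

    # Same three elements as the map: whole line, flagged first field, fifth field.
    push @records, [ $line, $flag . $first, $fields[4] ];
}

# Show the flagged first field and the fifth field of each record.
for my $rec (@records) {
    printf "%s | %s\n", $rec->[1], defined $rec->[2] ? $rec->[2] : '';
}
```

Because split takes the whole ;'; sequence as the delimiter, a lone semicolon inside a field no longer breaks the record, and the explicit loop leaves room for error handling that would be awkward inside a map block.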
I hope this helps,

Rob