> Got a combination that sort of works. It returns all the required
> fields but truncates any line where $usd_unit or $units_usd has more
> than 1 digit before the decimal point. There can be as many as (8)
> digits before and (10) digits after the decimal point in both cases.
>
> Here's the regex I'm using:
>
> ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
> /^([A-Z]{3})+\s+([A-Za-z\s]{28})+\s+(\d+\.\d+)+\s+(\d+\.\d+)/;
/^
matches start of line. Ok.
[A-Z]{3}
matches 3 uppercase letters. Ok.
([A-Z]{3})+
matches 3, 6, 9, ... uppercase letters and puts the
last set of 3 into $cur_sym. Probably not what you
meant. You should stick to what we had before:
([A-Z]{3})
with a space following as the next matching character
of the pattern.
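A quick way to see the difference is to run both forms against a made-up line. The sample string and the variable names $repeated and $single below are just for illustration, not from your script:

```perl
use strict;
use warnings;

# Hypothetical input: six capital letters in a row, then a space.
my $line = 'USDEUR 1.0';

# Quantifier outside the group: the group matches twice (USD, then EUR),
# but $1 only keeps the text from the last repetition.
my ($repeated) = $line =~ /^([A-Z]{3})+\s/;

# No quantifier: exactly three letters must be followed by whitespace,
# so this line is rejected instead of being half-captured.
my ($single) = $line =~ /^([A-Z]{3})\s/;

print 'repeated: ', ($repeated // 'no match'), "\n";   # repeated: EUR
print 'single:   ', ($single   // 'no match'), "\n";   # single:   no match
```

The same thing happens with the + after the 28-character description group: only the last chunk survives in the capture variable.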
\s+
matches one or more whitespace characters. Ok.
([A-Za-z\s]{28})
matches the next 28 alpha or whitespace characters.
(Whitespace includes spaces, tabs and newlines.) Ok.
+
matches the previous 28-character atom 1 or more
times, and returns the last 28-character match as the
second variable ($cur_desc). Not what you want.
Remove this extraneous +.
\s+
matches one or more whitespace characters. ok.
(\d+\.\d+)
matches one or more digits, followed by a literal
decimal point, followed by one or more digits. Ok.
+
matches the previous atom 1 or more times. Again,
not what you want. Remove the extraneous +.
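To check the number sub-pattern against the widths you mentioned (up to 8 digits before the point and 10 after), here's a small test with hypothetical sample values. The bounded form \d{1,8}\.\d{1,10} is my suggestion, not part of your regex; use it only if you want the match itself to reject over-long values:

```perl
use strict;
use warnings;

# Hypothetical samples: the widest case from the question (8 digits
# before the point, 10 after), a short value, a value with a space
# where the point should be, and one with 9 digits before the point.
my @samples = ('12345678.0123456789', '1.5', '12 5', '123456789.5');

my (@open, @bounded);
for my $value (@samples) {
    # \. is a literal decimal point; \d+ accepts any number of digits.
    push @open, $value if $value =~ /^\d+\.\d+$/;

    # Optional stricter form: 1-8 digits, a point, then 1-10 digits.
    push @bounded, $value if $value =~ /^\d{1,8}\.\d{1,10}$/;
}

print "open:    @open\n";      # open:    12345678.0123456789 1.5 123456789.5
print "bounded: @bounded\n";   # bounded: 12345678.0123456789 1.5
```

Either way, \d+\.\d+ on its own has no trouble with multi-digit values before the point.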
\s+
matches one or more whitespace characters. ok.
(\d+\.\d+)
matches one or more digits, followed by a literal
decimal point, followed by one or more digits. Ok.
/;
ends the pattern with no end-of-line anchor, so anything
at all can follow the part that matched.
I'd recommend tightening the pattern up by making the
end be:
\s*$/;
which matches any amount of whitespace (including none)
and then the end of the line.
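Putting those fixes together, a sketch of the tidied regex run against an invented record might look like this. I've assumed the description column is exactly 28 characters wide and padded with spaces, so $cur_desc comes back with trailing spaces that are trimmed afterwards. The anchor is written as \s*$, which also accepts a line with no trailing whitespace. The names $record_re, $unanchored and $junk are just for illustration:

```perl
use strict;
use warnings;

# Hypothetical record: 3-letter code, a description padded to exactly
# 28 characters, then two decimal numbers with trailing spaces.
my $line = 'EUR ' . sprintf('%-28s', 'Euro')
         . ' 12345678.0123456789   0.0000000810  ';

# Tidied pattern: no stray + quantifiers, anchored at both ends.
my $record_re =
    qr/^([A-Z]{3})\s+([A-Za-z\s]{28})\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;

my ($cur_sym, $cur_desc, $usd_unit, $units_usd) = $line =~ $record_re
    or die "line did not match\n";

# The description keeps its padding; strip trailing whitespace.
$cur_desc =~ s/\s+$//;

print "[$cur_sym] [$cur_desc] [$usd_unit] [$units_usd]\n";
# [EUR] [Euro] [12345678.0123456789] [0.0000000810]

# Same pattern without the trailing \s*$ anchor, for comparison.
my $unanchored =
    qr/^([A-Z]{3})\s+([A-Za-z\s]{28})\s+(\d+\.\d+)\s+(\d+\.\d+)/;

my $junk = $line . 'xyz';
print 'anchored accepts junk:   ', ($junk =~ $record_re  ? 'yes' : 'no'), "\n";  # no
print 'unanchored accepts junk: ', ($junk =~ $unanchored ? 'yes' : 'no'), "\n";  # yes
```

The last two lines show why the anchor is worth having: without it, garbage at the end of a line is silently ignored.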
Did you spot your mistake? I didn't, but I'll let you tidy
up your regex first and see if you don't spot your problem.