> Got a combination that sort of works.  It returns all the required
> fields but truncates any line where $usd_unit or $units_usd has more
> than 1 digit before the decimal point.  There can be as many as (8)
> digits before and (10) digits after the decimal point in both cases.
> 
> Here's the regex I'm using:
> 
> ($cur_sym, $cur_desc, $usd_unit, $units_usd) =
> /^([A-Z]{3})+\s+([A-Za-z\s]{28})+\s+(\d+\.\d+)+\s+(\d+\.\d+)/;

    /^

matches start of line. Ok.

    [A-Z]{3}

matches 3 uppercase letters. Ok. 

    ([A-Z]{3})+

matches 3, 6, 9, ... uppercase letters and puts the
last set of 3 into $cur_sym. Probably not what you
meant. You should stick to what we had before:

    ([A-Z]{3})

with a space following as the next matching character
of the pattern.
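
If you want to see the difference for yourself, here is a small
sketch. The input string is made up for illustration and is not
taken from your data.

```perl
use strict;
use warnings;

# Hypothetical input: six uppercase letters in a row.
my $text = "USDEUR rest";

# Quantified group: matches "USD", then "EUR"; the capture keeps
# only the last repetition.
my ($last) = $text =~ /^([A-Z]{3})+/;

# Unquantified group: matches exactly the first three letters.
my ($first) = $text =~ /^([A-Z]{3})/;

print "with +:    $last\n";
print "without +: $first\n";
```

With the + the capture ends up holding "EUR"; without it you get
"USD", which is the behaviour you are after.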

    \s+

matches one or more whitespace characters. Ok.

    ([A-Za-z\s]{28})

matches the next 28 alpha or whitespace characters.
(whitespace means spaces or tabs or newlines.) Ok.

    +

matches the previous 28 character atom 1 or more
times, and returns the last 28 character match as the
second variable ($cur_desc). Not what you want.
Remove this extraneous +.

    \s+

matches one or more whitespace characters. ok.

    (\d+\.\d+)

matches one or more digits, followed by a literal period
(the decimal point), followed by one or more digits. ok.

    +

matches the previous atom 1 or more times. Again,
not what you want. Remove the extraneous +.
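
A quick sketch of why that + is worth removing. Both input strings
are invented for the demonstration; the second one is deliberately
malformed.

```perl
use strict;
use warnings;

# Hypothetical inputs: one well-formed number field, one malformed.
my $good = "12.50 0.08";
my $bad  = "1.23.4 5.6";

# On a well-formed number the trailing + changes nothing.
my ($g_plus)  = $good =~ /^(\d+\.\d+)+\s/;
my ($g_plain) = $good =~ /^(\d+\.\d+)\s/;

# On the malformed one, + lets the group repeat ("1.2" then "3.4"),
# and only the last repetition is captured.
my ($b_plus)  = $bad =~ /^(\d+\.\d+)+\s/;
my ($b_plain) = $bad =~ /^(\d+\.\d+)\s/;   # no match, so undef

print "good, with +:    $g_plus\n";
print "good, without +: $g_plain\n";
print "bad,  with +:    $b_plus\n";
print "bad,  without +: ", (defined $b_plain ? $b_plain : "(no match)"), "\n";
```

So the + does no harm on clean data, but it quietly accepts junk
and hands you only a fragment of it.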

    \s+

matches one or more whitespace characters. ok.

    (\d+\.\d+)

matches one or more digits, followed by a literal period
(the decimal point), followed by one or more digits. ok.

    /;

ends the pattern with no anchor, which means that anything
can follow the part that matched.

I'd recommend tightening the pattern up by making the
end be:

    \s*$/;

which matches any amount of whitespace (including none) and then the
end of the line.
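
Putting the pieces together, a tidied version might look like the
sketch below. The sample line is hypothetical: I'm assuming a
3-letter symbol, a description padded out to 28 characters, and
then the two rates. I've used \s* before the $ so that it also
works if the line has already been chomped. Adjust to suit your
real data.

```perl
use strict;
use warnings;

# Hypothetical sample line: 3-letter symbol, description padded to
# 28 characters, then the two rates (8 digits before the point and
# 10 after in the last one).
my $line = sprintf "%-3s %-28s %s %s\n",
    "GBP", "British Pound", "1.2345678901", "12345678.1234567890";

# Same pattern as the original, minus the three extraneous +
# quantifiers, and anchored at the end of the line.
my ($cur_sym, $cur_desc, $usd_unit, $units_usd) = $line =~
    /^([A-Z]{3})\s+([A-Za-z\s]{28})\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$/;

$cur_desc =~ s/\s+$//;   # trim the padding from the fixed-width field

print "$cur_sym|$cur_desc|$usd_unit|$units_usd\n";
```

On that sample line both numbers come through with all their
digits intact.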

Did you spot your mistake? I didn't, but I'll let you tidy
up your regex first and see if you don't spot your problem.
