Use LWP to fetch web data rather than lynx and the like, unless you can't
help it. I prefer Web::Scraper for parsing HTML, but either way it's probably
best not to parse HTML with a regex (see Stack Overflow and similar for
discussions of why).
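
As a rough sketch of that advice (not a drop-in replacement for the script
quoted below), the page could be fetched with LWP::UserAgent and its links
pulled out with HTML::LinkExtor, used here as a stand-in for Web::Scraper
because it ships with HTML::Parser, which LWP already depends on. The helper
names extract_links and fetch_page, the 30-second timeout, and the
rpm/tar.gz filter are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Return the href of every <a> tag in $html, made absolute against $base.
# (Hypothetical helper, not part of the original script.)
sub extract_links {
    my ($html, $base) = @_;
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;
    my @hrefs;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;
        # Stringify, since the parser returns URI objects when given a base.
        push @hrefs, "$attr{href}" if $tag eq 'a' && defined $attr{href};
    }
    return @hrefs;
}

# Fetch a page with LWP, dying with the HTTP status line on failure.
# (Hypothetical helper; the timeout value is an assumption.)
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my @files = grep { /\.(?:rpm|tar\.gz)$/ }
                extract_links(fetch_page($url), $url);
    print "$_\n" for @files;
}
```

Run it with the download URL as its only argument. Keeping the parsing in
its own sub means it can be tried on a saved copy of the page without
hitting the site.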

On Feb 23, 2014 8:13 AM, "Wernher Eksteen" <crypt...@gmail.com> wrote:
>
> Hi,
>
> Thanks, but how do I assign the value found by the regex to a variable so
> that the "1.2.4" from the 6 file names in the array @fileList is printed
> only once, and if other versions are found, say 1.2.5 and 1.2.6, the unique
> values from all of them are printed?
>
> This is my script thus far. The aim of this script is to connect to the
> site, remove all HTML tags and obtain only the file names I need.
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> # initiating package names to be used later
> my @getList;
> my @fileList;
>
> # get files using lynx and parse through it
> my $url = "http://mathias-kettner.com/download";
> open my $in, "lynx -dump $url |" or die $!;
>
> # get the bits we need and push them to an array to further filter what we need
> while(<$in>){
>  chomp;
>   if( /\[(\d+)\](.+)/ ){
>    next if $1 == 1;
>     push @getList, "$2\n";
>  }
> }
>
> # filter only the files we need into final array
> foreach my $i (@getList) {
>   my @list = split /\s+/, $i;
>   push @fileList, "$list[0]\n", if $i =~ /rpm|tar/ && $i !~ /[0-9][a-z]/;
> }
>
> # print the list
> print "\nList of files to be retrieved from $url:\n\n @fileList\n";
>
> The output is then:
>
> List of files to be retrieved from http://mathias-kettner.com/download:
>
>
>  check_mk-1.2.4.tar.gz
>  check_mk-agent-1.2.4-1.noarch.rpm
>  check_mk-agent-logwatch-1.2.4-1.noarch.rpm
>  check_mk-agent-oracle-1.2.4-1.noarch.rpm
>  mk-livestatus-1.2.4.tar.gz
>  mkeventd-1.2.4.tar.gz
>
> From that I want to get the value 1.2.4 and assign it to a variable. If
> there is more than one value, such as 1.2.5 and 1.2.6 as well, it should
> print them too, but only the unique values.
>
> My attempt shown below to print only the value 1.2.4 is as follows, but it
> prints out "1.2.41.2.41.2.41.2.41.2.41.2.4" next to each other. If I pass a
> newline to $i such as "$i\n" it then prints "111111"?
>
> foreach my $i (@fileList) {
>         print $i =~  /\b(\d+\.\d+\.\d+)\b/;
> }
>

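The 1s are the return values of the match itself: in scalar context a
successful match returns true (1), not the captured text. To print the
captured version on its own line, you want something like
print "$1\n" if (the match succeeds).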
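
To get each version only once, one common approach is to capture it with the
same regex and remember what has already been seen in a hash. The sketch
below assumes the file names are already in @fileList (hard-coded here from
the output quoted above); the helper name unique_versions is made up for
illustration. In list context, such as print's argument list, a match with
captures returns the captured strings, which is why the first attempt
printed the versions run together with no newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the output quoted above.
my @fileList = map { "$_\n" } qw(
    check_mk-1.2.4.tar.gz
    check_mk-agent-1.2.4-1.noarch.rpm
    check_mk-agent-logwatch-1.2.4-1.noarch.rpm
    check_mk-agent-oracle-1.2.4-1.noarch.rpm
    mk-livestatus-1.2.4.tar.gz
    mkeventd-1.2.4.tar.gz
);

# Return each distinct x.y.z version string, in order of first appearance.
# (Hypothetical helper, not part of the original script.)
sub unique_versions {
    my @names = @_;
    my (%seen, @versions);
    for my $name (@names) {
        # Skip names with no version; $1 is only valid after a successful match.
        next unless $name =~ /\b(\d+\.\d+\.\d+)\b/;
        push @versions, $1 unless $seen{$1}++;
    }
    return @versions;
}

my @versions = unique_versions(@fileList);
print "$_\n" for @versions;    # prints 1.2.4 once for the sample data
```

If the page later lists 1.2.5 and 1.2.6 files as well, @versions would hold
all three, each once.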
