Re: Want to write a script to note specific IP addresses.

John W. Krahn Thu, 22 Oct 2009 10:17:40 -0700

hongyi.z...@gmail.com wrote:

On Thursday, October 22, 2009 at 15:08, mrdanwal...@gmail.com wrote:

2009/10/19 Hongyi Zhao <hongyi.z...@gmail.com>:

I want to write a script to note specific IP
Â addresses by appending the corresponding location informations. Â For

I suggest you take a look at Geo::IP [0]. I've used it quite a few
times in the past for similar tasks to what you're describing. I could
supply an example, but you'll learn more by trying yourself. If you
get stuck, post back, and we'll take a look.

Although, if your list is really custom, and you can't (or won't) use
Geo::IP instead, then there's probably still a module on the cpan to
help you out. A quick search suggests NetAddr::IP or Net::Netmask
might be good starting points.

Hope this helps. :)

[0] http://search.cpan.org/perldoc?Geo::IP


Firstly, thank you very much for your help.

Here,  I'll  describe the solution used by me now and some issues when
use this method:

Currently,  I  use  a complete binary IPdatabase named QQWry.Dat which
include  373374  IP  blocks oall over the world, you can download this
IPdatabase from the following url:

http://update.cz88.net/soft/qqwry.rar

Then  you  can  use the following perl script to read the IP addresses
and  the  corresponding  location informations to note the specific IP
addresses:

http://www.ieasy.org/download/qqwry.pl

The above qqwry.pl has the following content:

------------
sub ipwhere {
        my $ipbegin,$ipend,$ipData1,$ipData2,$DataSeek,$ipFlag;

It is usually better to declare variables in the smallest possible scopeinstead of all at the beginning of the program.

        my $ip=shift;
        my @ip=split(/\./,$ip);
        my $ipNum = $ip[0]*16777216+$ip[1]*65536+$ip[2]*256+$ip[3];

You should probably use the Socket module to convert IP addresses tonumbers.

        my $ipfile="./QQWry.Dat";
        open(FILE,"$ipfile");


You should *always* verify that the file opened correctly.

        open FILE, '<', $ipfile or die "Cannot open '$ipfile' $!";

        binmode(FILE);
        sysread(FILE,$ipbegin,4);
        sysread(FILE,$ipend,4);

Why are you using sysread() only here but not elsewhere? You should*always* verify that system functions operated correctly.


        read FILE, my $ipbegin, 4 or die "Cannot read from '$ipfile' $!";
        read FILE, my $ipend,   4 or die "Cannot read from '$ipfile' $!";

         $ipbegin=unpack("L",$ipbegin);
         $ipend=unpack("L",$ipend);
        my $ipAllNum = ($ipend-$ipbegin)/7+1;

        my $BeginNum=0;
        my $EndNum=$ipAllNum;

        Bgn:
        my $Middle= int(($EndNum+$BeginNum)/2);

        seek(FILE,$ipbegin+7*$Middle,0);

You should use the "SEEK_SET", "SEEK_CUR", and "SEEK_END" constants fromthe Fcntl module instead numerical values that may or may not be correct.

        read(FILE,$ipData1,4);


You should *always* verify that system functions operated correctly.

seek FILE, $ipbegin + 7 * $Middle, SEEK_SET or die "Cannot seekon '$ipfile' $!";;

        read FILE, $ipData1, 4 or die "Cannot read from '$ipfile' $!";;

        my $ip1num=unpack("L",$ipData1);
        if ($ip1num > $ipNum) {

Why convert $ip1num and $ipNum to numbers to compare them when you canjust compare them as strings?

                $EndNum=$Middle;
                goto Bgn;

Using goto is bad practice. You should probably redesign the algorithmto avoid the use of goto.

        }

        read(FILE,$DataSeek,3);


You should *always* verify that system functions operated correctly.

        $DataSeek=unpack("L",$DataSeek."\0");
        seek(FILE,$DataSeek,0);
        read(FILE,$ipData2,4);


You should *always* verify that system functions operated correctly.

        my $ip2num=unpack("L",$ipData2);
        if ($ip2num < $ipNum) {
                goto nd if ($Middle==$BeginNum);
                $BeginNum=$Middle;
                goto Bgn;
        }

        $/="\0";
        read(FILE,$ipFlag,1);
        if ($ipFlag eq "\1") {
                my $ipSeek;
                read(FILE,$ipSeek,3);
                $ipSeek = unpack("L",$ipSeek."\0");
                seek(FILE,$ipSeek,0);
                read(FILE,$ipFlag,1);
        }
        if ($ipFlag eq "\2") {
                my $AddrSeek;
                read(FILE,$AddrSeek,3);
                read(FILE,$ipFlag,1);
                if($ipFlag eq "\2") {
                        my $AddrSeek2;
                        read(FILE,$AddrSeek2,3);
                        $AddrSeek2 = unpack("L",$AddrSeek2."\0");
                        seek(FILE,$AddrSeek2,0);
                }
                else {
                        seek(FILE,-1,1);
                }
                $ipAddr2=<FILE>;
                $AddrSeek = unpack("L",$AddrSeek."\0");
                seek(FILE,$AddrSeek,0);
                $ipAddr1=<FILE>;
        }
        else {
                seek(FILE,-1,1);
                $ipAddr1=<FILE>;
                read(FILE,$ipFlag,1);
                if($ipFlag eq "\2") {
                        my $AddrSeek2;
                        read(FILE,$AddrSeek2,3);
                        $AddrSeek2 = unpack("L",$AddrSeek2."\0");
                        seek(FILE,$AddrSeek2,0);
                }
                else {
                        seek(FILE,-1,1);
                }
                $ipAddr2=<FILE>;
        }

        nd:
        chomp($ipAddr1,$ipAddr2);
        $/="\n";

Why are you modifying a global variable inside a subroutine? You shouldlocalize the changes to global variables.

        close(FILE);

$ipAddr2="" if($ipAddr2=~/http/i);

        my $ipaddr="$ipAddr1 $ipAddr2";
        $ipaddr =~ s/CZ88\.NET//isg;

The /s option effects whether . matches a newline or not but you are notusing . in your pattern.

        $ipaddr="æœªçŸ¥åœ°åŒº" if ($ipaddr=~/æœªçŸ¥|http/i || $ipaddr eq "");
        return $ipaddr;
}

sub osinfo {
   local $os="",$Agent;

Why are you using local? What effect do you think local() will have onthe $os variable? On the $Agent variable?

   $Agent = $ENV{'HTTP_USER_AGENT'};
   if (($Agent =~ /win/i)&&($Agent =~ /95/i)) {

What effect do you thing the /i option will have on the '9' or the '5'characters?

      $os="Windows 95";
   }
   elsif (($Agent =~ /win 9x/i)&&($Agent =~ /4.90/i)) {

What effect do you thing the /i option will have on the '4', '9' or the'0' characters? Did you want to match "any character" between the '4'and the '9' characters?

      $os="Windows ME";
   }
   elsif (($Agent =~ /win/i)&&($Agent =~ /98/i)) {
      $os="Windows 98";
   }
   elsif (($Agent =~ /win/i)&&($Agent =~ /nt 5\.0/i)) {
      $os="Windows 2000";
   }
   elsif (($Agent =~ /win/i)&&($Agent =~ /nt 5\.1/i)) {
      $os="Windows XP";
   }
   elsif (($Agent =~ /win/i)&&($Agent =~ /nt 5\.2/i)) {
      $os="Windows 2003";
   }
   elsif (($Agent =~ /win/i)&&($Agent =~ /nt/i)) {
      $os="Windows NT";
   }
   elsif (($Agent =~ /win/i)&&($Agent =~ /32/i)) {
      $os="Windows 32";
   }
   elsif ($Agent =~ /linux/i) {
      $os="Linux";
   }
   elsif ($Agent =~ /unix/i) {
      $os="Unix";
   }
   elsif (($Agent =~ /sun/i)&&($Agent =~ /os/i)) {
      $os="SunOS";
   }
   elsif (($Agent =~ /ibm/isg)&&($Agent =~ /os/isg)) {

The /s option effects whether . matches a newline or not but you are notusing . in your pattern.

      $os="IBM OS/2";
   }
   elsif (($Agent =~ /Mac/i)&&($Agent =~ /PC/i)) {
      $os="Macintosh";
   }
   elsif ($Agent =~ /FreeBSD/i) {
      $os="FreeBSD";
   }
   elsif ($Agent =~ /PowerPC/i) {
      $os="PowerPC";
   }
   elsif ($Agent =~ /AIX/i) {
      $os="AIX";
   }
   elsif ($Agent =~ /HPUX/i) {
      $os="HPUX";
   }
   elsif ($Agent =~ /NetBSD/i) {
      $os="NetBSD";
   }
   elsif ($Agent =~ /BSD/i) {
      $os="BSD";
   }
   elsif ($Agent =~ /OSF1/i) {
      $os="OSF1";
   }
   elsif ($Agent =~ /IRIX/i) {
      $os="IRIX";
   }
   elsif ($Agent =~ /google/i) {
      $os = "GoogleBot";
   }
   elsif ($Agent =~ /Yahoo/i) {
      $os = "YahooBot";
   }
  $os = "Unknown"  if ($os eq '');
  $os =~ s/[\a\f\n\e\0\r\t\)\(\*\+\?]//isg;

The /s option effects whether . matches a newline or not but you are notusing . in your pattern. The /i option only effects alphabeticcharacters but there are no alphabetic characters in your pattern. ')','(', '*', '+' and '?' are not special inside a character class so theydon't have to be escaped. Using tr/// would be more efficient:


  $os =~ tr/\a\f\n\e\0\r\t)(*+?//d;

  $os = substr($os, 0, 15) if (length($os) > 15);


Why make a copy of $os when you can just modify it in place:

  substr( $os, 15 ) = '' if length( $os ) > 15;

  return $os;
}
sub browseinfo {
       my $browser = "";
       my $browserver = "";
       my ($Agent, $Part, $browseinfo);
       $Agent = $ENV{"HTTP_USER_AGENT"};

       if ($Agent =~ /Lynx/i)
       {
               $browser = "Lynx";
       }
       elsif ($Agent =~ /MOSAIC/i)
       {
               $browser = "MOSAIC";
       }
       elsif ($Agent =~ /AOL/i)
       {
               $browser = "AOL";
       }
       elsif ($Agent =~ /Lynx/i)
       {
               $browser = "Lynx";
       }
       elsif ($Agent =~ /Opera/i)
       {
               $browser = "Opera";
       }
       elsif ($Agent =~ /JAVA/i)
       {
               $browser = "JAVA";
       }
       elsif ($Agent =~ /MacWeb/i)
       {
               $browser = "MacWeb";
       }
       elsif ($Agent =~ /WebExplorer/i)
       {
               $browser = "WebExplorer";
       }
       elsif ($Agent =~ /OmniWeb/i)
       {
               $browser = "OmniWeb";
       }
       elsif ($Agent =~ /Mozilla/i)
       {
               if ($Agent =~ "MSIE")
               {
                       if ($Agent =~ /MyIE(\d*)/)
                       {
                               $browserver = $1;
                               $browser = "MyIE";
                       }
                       else
                       {
                               $Part = (split(/\(/, $Agent))[1];
                               $Part = (split(/\;/,$Part))[1];
                               $browserver = (split(/ /,$Part))[2];

You should probably use a single regular expression instead of split()three times.

                               $browserver =~ s/([\d\.]+)/$1/isg;

Why are you are replacing a string with itself? This statement issuperfluous (as are the /i and /s options.)

                               $browser = "Internet Explorer";
                       }
               }
               elsif ($Agent =~ "Opera")
               {
                       $Part = (split(/\(/, $Agent))[1];
                       $browserver = (split(/\)/, $Part))[1];
                       $browserver = (split(/ /,$browserver))[2];
                       $browserver =~ s/([\d\.]+)/$1/isg;


Same as above.

                       $browser = "Opera";
               }
               else
               {
                       $Part = (split(/\(/, $Agent))[0];
                       $browserver = (split(/\//, $Part))[1];
                       $browserver = (split(/ /,$browserver))[0];
                       $browserver =~ s/([\d\.]+)/$1/isg;


Same as above.

                       $browser = "Netscape Navigator";
               }
       }
       elsif ($Agent =~ /google/i)
       {
               $browser = "GoogleBot";
       }
       elsif ($Agent =~ /Yahoo/i)
       {
               $browser = "YahooBot";
       }

       if ($browser ne '')
       {
               $browserver =~ s/[^0-9\.b]//isg;
               $browserver = &lbhz($browserver, 4) if (length($browserver) > 
10);

Where is the lbhz() subroutine defined? Why are you calling it with a'&' at the beginning?

               $browseinfo = "$browser $browserver";
       }
       else
       {
               $browseinfo = "Unknown";
       }
       $browseinfo =~ s/[\a\f\n\e\0\r\t\)\(\*\+\?]//isg;
  $browseinfo =~ s/[\a\f\n\e\0\r\t\)\(\*\+\?]//isg;
  $browseinfo = substr($browseinfo, 0, 28) if (length($browseinfo) > 28);


Why make a copy of $browseinfo when you can just modify it in place:

  substr( $browseinfo, 28 ) = '' if length( $browseinfo ) > 28;

       return $browseinfo;
}
1;
------------

Then  we  use  the  following  perl  script named PrintIPWhere.pl to invoke the 
qqwry.pl and
QQWry.Dat to do the trick.

-----------------
#!/usr/bin/perl
use strict;
require "qqwry.pl";
print ipwhere("${ARGV[0]}");


perldoc -q "What's wrong with always quoting "$vars"?"

print ipwhere( $ARGV[ 0 ] ), "\n";

print "\n";
-----------------

Below is a example for your information:

Suppose we have some IP addresses stored in IPfile like this:

168.105.234.109
65.24.0.0
0.0.0.0
141.0.0.0
140.255.0.0

Then  put  IPfile,  qqwry.pl,  PrintIPWhere.pl, and QQWry.Dat into the
same directory, and run the following commands will be OK:

$ for i in `cat IPfile`; do echo $i#`./PrintIPWhere.pl $i`;done
168.105.234.109#ç¾Žå›½ University of Hawaii-Hawaii Medical Network
65.24.0.0#ç¾Žå›½ Ohio State University
0.0.0.0#IANA
141.0.0.0#
140.255.0.0#æœªçŸ¥åœ°åŒº

My  issue  is that: if the IPfile is a huge one, say including several
thousands  entries  in  it, the above process will time consuming.  So
how  can I revise the above perl script with perl's multithread module
to improve the efficiency?

Thanks in advance.





John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity.               -- Damian Conway

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

Re: Want to write a script to note specific IP addresses.

Reply via email to