hongyi.z...@gmail.com wrote:
On Thursday, October 22, 2009 at 15:08, mrdanwal...@gmail.com wrote:
2009/10/19 Hongyi Zhao <hongyi.z...@gmail.com>:
I want to write a script to note specific IP
 addresses by appending the corresponding location informations.  For
I suggest you take a look at Geo::IP [0]. I've used it quite a few
times in the past for similar tasks to what you're describing. I could
supply an example, but you'll learn more by trying yourself. If you
get stuck, post back, and we'll take a look.
Although, if your list is really custom, and you can't (or won't) use
Geo::IP instead, then there's probably still a module on the cpan to
help you out. A quick search suggests NetAddr::IP or Net::Netmask
might be good starting points.
Hope this helps. :)
[0] http://search.cpan.org/perldoc?Geo::IP
Firstly, thank you very much for your help.
Here, I'll describe the solution used by me now and some issues when
use this method:
Currently, I use a complete binary IPdatabase named QQWry.Dat which
include 373374 IP blocks oall over the world, you can download this
IPdatabase from the following url:
http://update.cz88.net/soft/qqwry.rar
Then you can use the following perl script to read the IP addresses
and the corresponding location informations to note the specific IP
addresses:
http://www.ieasy.org/download/qqwry.pl
The above qqwry.pl has the following content:
------------
sub ipwhere {
my $ipbegin,$ipend,$ipData1,$ipData2,$DataSeek,$ipFlag;
It is usually better to declare variables in the smallest possible scope
instead of all at the beginning of the program.
my $ip=shift;
my @ip=split(/\./,$ip);
my $ipNum = $ip[0]*16777216+$ip[1]*65536+$ip[2]*256+$ip[3];
You should probably use the Socket module to convert IP addresses to
numbers.
my $ipfile="./QQWry.Dat";
open(FILE,"$ipfile");
You should *always* verify that the file opened correctly.
open FILE, '<', $ipfile or die "Cannot open '$ipfile' $!";
binmode(FILE);
sysread(FILE,$ipbegin,4);
sysread(FILE,$ipend,4);
Why are you using sysread() only here but not elsewhere? You should
*always* verify that system functions operated correctly.
read FILE, my $ipbegin, 4 or die "Cannot read from '$ipfile' $!";
read FILE, my $ipend, 4 or die "Cannot read from '$ipfile' $!";
$ipbegin=unpack("L",$ipbegin);
$ipend=unpack("L",$ipend);
my $ipAllNum = ($ipend-$ipbegin)/7+1;
my $BeginNum=0;
my $EndNum=$ipAllNum;
Bgn:
my $Middle= int(($EndNum+$BeginNum)/2);
seek(FILE,$ipbegin+7*$Middle,0);
You should use the "SEEK_SET", "SEEK_CUR", and "SEEK_END" constants from
the Fcntl module instead numerical values that may or may not be correct.
read(FILE,$ipData1,4);
You should *always* verify that system functions operated correctly.
seek FILE, $ipbegin + 7 * $Middle, SEEK_SET or die "Cannot seek
on '$ipfile' $!";;
read FILE, $ipData1, 4 or die "Cannot read from '$ipfile' $!";;
my $ip1num=unpack("L",$ipData1);
if ($ip1num > $ipNum) {
Why convert $ip1num and $ipNum to numbers to compare them when you can
just compare them as strings?
$EndNum=$Middle;
goto Bgn;
Using goto is bad practice. You should probably redesign the algorithm
to avoid the use of goto.
}
read(FILE,$DataSeek,3);
You should *always* verify that system functions operated correctly.
$DataSeek=unpack("L",$DataSeek."\0");
seek(FILE,$DataSeek,0);
read(FILE,$ipData2,4);
You should *always* verify that system functions operated correctly.
my $ip2num=unpack("L",$ipData2);
if ($ip2num < $ipNum) {
goto nd if ($Middle==$BeginNum);
$BeginNum=$Middle;
goto Bgn;
}
$/="\0";
read(FILE,$ipFlag,1);
if ($ipFlag eq "\1") {
my $ipSeek;
read(FILE,$ipSeek,3);
$ipSeek = unpack("L",$ipSeek."\0");
seek(FILE,$ipSeek,0);
read(FILE,$ipFlag,1);
}
if ($ipFlag eq "\2") {
my $AddrSeek;
read(FILE,$AddrSeek,3);
read(FILE,$ipFlag,1);
if($ipFlag eq "\2") {
my $AddrSeek2;
read(FILE,$AddrSeek2,3);
$AddrSeek2 = unpack("L",$AddrSeek2."\0");
seek(FILE,$AddrSeek2,0);
}
else {
seek(FILE,-1,1);
}
$ipAddr2=<FILE>;
$AddrSeek = unpack("L",$AddrSeek."\0");
seek(FILE,$AddrSeek,0);
$ipAddr1=<FILE>;
}
else {
seek(FILE,-1,1);
$ipAddr1=<FILE>;
read(FILE,$ipFlag,1);
if($ipFlag eq "\2") {
my $AddrSeek2;
read(FILE,$AddrSeek2,3);
$AddrSeek2 = unpack("L",$AddrSeek2."\0");
seek(FILE,$AddrSeek2,0);
}
else {
seek(FILE,-1,1);
}
$ipAddr2=<FILE>;
}
nd:
chomp($ipAddr1,$ipAddr2);
$/="\n";
Why are you modifying a global variable inside a subroutine? You should
localize the changes to global variables.
close(FILE);
$ipAddr2="" if($ipAddr2=~/http/i);
my $ipaddr="$ipAddr1 $ipAddr2";
$ipaddr =~ s/CZ88\.NET//isg;
The /s option effects whether . matches a newline or not but you are not
using . in your pattern.
$ipaddr="未知地区" if ($ipaddr=~/未知|http/i || $ipaddr eq "");
return $ipaddr;
}
sub osinfo {
local $os="",$Agent;
Why are you using local? What effect do you think local() will have on
the $os variable? On the $Agent variable?
$Agent = $ENV{'HTTP_USER_AGENT'};
if (($Agent =~ /win/i)&&($Agent =~ /95/i)) {
What effect do you thing the /i option will have on the '9' or the '5'
characters?
$os="Windows 95";
}
elsif (($Agent =~ /win 9x/i)&&($Agent =~ /4.90/i)) {
What effect do you thing the /i option will have on the '4', '9' or the
'0' characters? Did you want to match "any character" between the '4'
and the '9' characters?
$os="Windows ME";
}
elsif (($Agent =~ /win/i)&&($Agent =~ /98/i)) {
$os="Windows 98";
}
elsif (($Agent =~ /win/i)&&($Agent =~ /nt 5\.0/i)) {
$os="Windows 2000";
}
elsif (($Agent =~ /win/i)&&($Agent =~ /nt 5\.1/i)) {
$os="Windows XP";
}
elsif (($Agent =~ /win/i)&&($Agent =~ /nt 5\.2/i)) {
$os="Windows 2003";
}
elsif (($Agent =~ /win/i)&&($Agent =~ /nt/i)) {
$os="Windows NT";
}
elsif (($Agent =~ /win/i)&&($Agent =~ /32/i)) {
$os="Windows 32";
}
elsif ($Agent =~ /linux/i) {
$os="Linux";
}
elsif ($Agent =~ /unix/i) {
$os="Unix";
}
elsif (($Agent =~ /sun/i)&&($Agent =~ /os/i)) {
$os="SunOS";
}
elsif (($Agent =~ /ibm/isg)&&($Agent =~ /os/isg)) {
The /s option effects whether . matches a newline or not but you are not
using . in your pattern.
$os="IBM OS/2";
}
elsif (($Agent =~ /Mac/i)&&($Agent =~ /PC/i)) {
$os="Macintosh";
}
elsif ($Agent =~ /FreeBSD/i) {
$os="FreeBSD";
}
elsif ($Agent =~ /PowerPC/i) {
$os="PowerPC";
}
elsif ($Agent =~ /AIX/i) {
$os="AIX";
}
elsif ($Agent =~ /HPUX/i) {
$os="HPUX";
}
elsif ($Agent =~ /NetBSD/i) {
$os="NetBSD";
}
elsif ($Agent =~ /BSD/i) {
$os="BSD";
}
elsif ($Agent =~ /OSF1/i) {
$os="OSF1";
}
elsif ($Agent =~ /IRIX/i) {
$os="IRIX";
}
elsif ($Agent =~ /google/i) {
$os = "GoogleBot";
}
elsif ($Agent =~ /Yahoo/i) {
$os = "YahooBot";
}
$os = "Unknown" if ($os eq '');
$os =~ s/[\a\f\n\e\0\r\t\)\(\*\+\?]//isg;
The /s option effects whether . matches a newline or not but you are not
using . in your pattern. The /i option only effects alphabetic
characters but there are no alphabetic characters in your pattern. ')',
'(', '*', '+' and '?' are not special inside a character class so they
don't have to be escaped. Using tr/// would be more efficient:
$os =~ tr/\a\f\n\e\0\r\t)(*+?//d;
$os = substr($os, 0, 15) if (length($os) > 15);
Why make a copy of $os when you can just modify it in place:
substr( $os, 15 ) = '' if length( $os ) > 15;
return $os;
}
sub browseinfo {
my $browser = "";
my $browserver = "";
my ($Agent, $Part, $browseinfo);
$Agent = $ENV{"HTTP_USER_AGENT"};
if ($Agent =~ /Lynx/i)
{
$browser = "Lynx";
}
elsif ($Agent =~ /MOSAIC/i)
{
$browser = "MOSAIC";
}
elsif ($Agent =~ /AOL/i)
{
$browser = "AOL";
}
elsif ($Agent =~ /Lynx/i)
{
$browser = "Lynx";
}
elsif ($Agent =~ /Opera/i)
{
$browser = "Opera";
}
elsif ($Agent =~ /JAVA/i)
{
$browser = "JAVA";
}
elsif ($Agent =~ /MacWeb/i)
{
$browser = "MacWeb";
}
elsif ($Agent =~ /WebExplorer/i)
{
$browser = "WebExplorer";
}
elsif ($Agent =~ /OmniWeb/i)
{
$browser = "OmniWeb";
}
elsif ($Agent =~ /Mozilla/i)
{
if ($Agent =~ "MSIE")
{
if ($Agent =~ /MyIE(\d*)/)
{
$browserver = $1;
$browser = "MyIE";
}
else
{
$Part = (split(/\(/, $Agent))[1];
$Part = (split(/\;/,$Part))[1];
$browserver = (split(/ /,$Part))[2];
You should probably use a single regular expression instead of split()
three times.
$browserver =~ s/([\d\.]+)/$1/isg;
Why are you are replacing a string with itself? This statement is
superfluous (as are the /i and /s options.)
$browser = "Internet Explorer";
}
}
elsif ($Agent =~ "Opera")
{
$Part = (split(/\(/, $Agent))[1];
$browserver = (split(/\)/, $Part))[1];
$browserver = (split(/ /,$browserver))[2];
$browserver =~ s/([\d\.]+)/$1/isg;
Same as above.
$browser = "Opera";
}
else
{
$Part = (split(/\(/, $Agent))[0];
$browserver = (split(/\//, $Part))[1];
$browserver = (split(/ /,$browserver))[0];
$browserver =~ s/([\d\.]+)/$1/isg;
Same as above.
$browser = "Netscape Navigator";
}
}
elsif ($Agent =~ /google/i)
{
$browser = "GoogleBot";
}
elsif ($Agent =~ /Yahoo/i)
{
$browser = "YahooBot";
}
if ($browser ne '')
{
$browserver =~ s/[^0-9\.b]//isg;
$browserver = &lbhz($browserver, 4) if (length($browserver) >
10);
Where is the lbhz() subroutine defined? Why are you calling it with a
'&' at the beginning?
$browseinfo = "$browser $browserver";
}
else
{
$browseinfo = "Unknown";
}
$browseinfo =~ s/[\a\f\n\e\0\r\t\)\(\*\+\?]//isg;
$browseinfo =~ s/[\a\f\n\e\0\r\t\)\(\*\+\?]//isg;
$browseinfo = substr($browseinfo, 0, 28) if (length($browseinfo) > 28);
Why make a copy of $browseinfo when you can just modify it in place:
substr( $browseinfo, 28 ) = '' if length( $browseinfo ) > 28;
return $browseinfo;
}
1;
------------
Then we use the following perl script named PrintIPWhere.pl to invoke the
qqwry.pl and
QQWry.Dat to do the trick.
-----------------
#!/usr/bin/perl
use strict;
require "qqwry.pl";
print ipwhere("${ARGV[0]}");
perldoc -q "What's wrong with always quoting "$vars"?"
print ipwhere( $ARGV[ 0 ] ), "\n";
print "\n";
-----------------
Below is a example for your information:
Suppose we have some IP addresses stored in IPfile like this:
168.105.234.109
65.24.0.0
0.0.0.0
141.0.0.0
140.255.0.0
Then put IPfile, qqwry.pl, PrintIPWhere.pl, and QQWry.Dat into the
same directory, and run the following commands will be OK:
$ for i in `cat IPfile`; do echo $i#`./PrintIPWhere.pl $i`;done
168.105.234.109#美国 University of Hawaii-Hawaii Medical Network
65.24.0.0#美国 Ohio State University
0.0.0.0#IANA
141.0.0.0#
140.255.0.0#未知地区
My issue is that: if the IPfile is a huge one, say including several
thousands entries in it, the above process will time consuming. So
how can I revise the above perl script with perl's multithread module
to improve the efficiency?
Thanks in advance.
John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway
--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/