Does anyone have some good tricks for mapping IP addresses to ASN numbers in 
very large volumes?

This is probably more of a "how would you approach this problem?" question 
than a "can you help me tweak this query?" one.

I have a very large number of IP addresses (many millions) that appear in 
some security-related logs.  I need a fast way to map these back to 
their ASNs and country codes.  I have a table, generated from 
BGP routing files, listing netblocks and their corresponding ASNs.  
For small groups of IP addresses (hundreds or thousands) it has always been 
fast enough to simply do something like:

select count(*), asn_number
from logaddr a, ipv4 b
where a.ip <<= b.net_block
group by asn_number;

where logaddr holds the IP addresses encountered in the logs, and
where "ipv4" maps netblocks to the ASNs they belong to
(yeah, it is 525,000+ netblocks ...)
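
Roughly, the relevant columns look like this (a simplified sketch; the 
column types here are approximate):

create table ipv4 (
    net_block     cidr,        -- e.g. 2.148.0.0/14, built from BGP routing data
    asn_number    integer,
    country_code  char(2)
);

create table logaddr (
    ip  inet                   -- one row per address seen in the logs
);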

I'm trying to figure out if there is a better way to use the "cidr" netblock 
column to speed up the lookup and matching.
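
In particular, I'm wondering whether some kind of index over the netblock 
column could replace the per-row containment filter.  As far as I can tell 
that means either a GiST index with the inet_ops operator class (which I 
believe requires PostgreSQL 9.4 or later) or the ip4r extension on older 
releases.  A sketch of the GiST variant:

-- assumes PostgreSQL 9.4+, where GiST supports inet/cidr via inet_ops
create index ipv4_net_block_gist_idx
    on ipv4 using gist (net_block inet_ops);

-- a single-address probe that can use the index, picking the most
-- specific (longest-prefix) matching netblock
select asn_number, country_code
from ipv4
where '2.148.1.2'::inet <<= net_block
order by masklen(net_block) desc
limit 1;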

I've tried playing with going from an ASN to its matching IP addresses, and 
the other way around ... for example, here is the EXPLAIN output for looking 
at all the IP addresses in logaddr (currently about 9 million records) that 
fall under ASN 2119:


explain
select count(distinct ip), country_code, asn_number
from ipv4 a, logaddr b
where b.ip <<= a.net_block and a.asn_number = 2119
group by country_code, asn_number;


"GroupAggregate  (cost=36851922.32..38215407.51 rows=4 width=18)"
"  ->  Sort  (cost=36851922.32..37192793.61 rows=136348515 width=18)"
"        Sort Key: a.country_code, a.asn_number"
"        ->  Nested Loop  (cost=0.00..4448316.05 rows=136348515 width=18)"
"              Join Filter: ((b.ip)::inet <<= (a.net_block)::inet)"
"              ->  Seq Scan on logaddr b  (cost=0.00..347394.01 rows=9089901 
width=7)"
"              ->  Materialize  (cost=0.00..10466.66 rows=30 width=18)"
"                    ->  Seq Scan on ipv4 a  (cost=0.00..10466.51 rows=30 
width=18)"
"                          Filter: (asn_number = 2119)"


Just the first netblock from that ASN (2.148.0.0/14) occurs around 7,300 times 
in the logs, and performance is fine when I'm only looking for that one block 
... but I need a way to do "big aggregate queries" like "what were the most 
common ASNs in this whole log file?" or "how often does country XX occur in 
this log file?"



----------------------------------------------------------

Gary Warner
Director of Research in Computer Forensics
The University of Alabama at Birmingham
Center for Information Assurance and Joint Forensics Research
205.422.2113
g...@cis.uab.edu

-----------------------------------------------------------


