On Nov 30, 2012, at 11:09 AM, William Herrin <b...@herrin.us> wrote:
> On Fri, Nov 30, 2012 at 9:45 AM, Ray Soucy <r...@maine.edu> wrote: >> I'll see your disagree and raise you another ;-) >> >> I would say you almost never want to store addresses as character data >> unless the only thing you're using them for is logging (even then it's >> questionable). I run into people who do this all the time and it's a >> nightmare. >> >> It's easy to store a v6 address as a string, but when you want to select a >> range of IPv6 addresses from a database, not having them represented as >> integers means you can't do efficient numerical comparisons in your SQL >> statements, it also makes indexing your table slower; to put it simply, it >> doesn't scale well. > > Hi Ray, > > If you've stored them in the string format I suggested, the string > comparison *is* an efficient numerical comparison. On a CISC processor > it may even be implemented with a single instruction byte string > comparison. Go test. You may be surprised at the results. > > The one useful function you can't do directly from a string format is > apply an AND mask (netmask). More often than not this is irrelevant: > you don't want to load the data and then apply the mask, you want the > mask to constrain the data which you load from the database. You'd > need the database software to understand the address type and index it > with a radix tree, something it can do with neither a string format > nor your split 64-bit format. > Since non-contiguous masking is rare, this can, actually be pretty efficient for contiguous masking because you have a ¼ chance that the mask aligns with a character (the more I think about this, the more I think storing the address as a 32-character string without colons makes the most sense). If it's not aligned on a nibble boundary, then you can either do ranged comparisons as suggested below, or, you can do a two-step process like this: Let's say we want to look for addresses within 2001:db8::/29. This would mean we need to match all strings starting with 2001:0db8 through 2001:0dbf. We could easily grab everything that begins with '20010db%' and then select the masked values matching from the 8th column where (atoi(concat("0x",substr(addr,8,1))) & 0x8). Forgive me if I don't get the SQL syntax exactly right or have a wrong function name… I do more C than SQL. Both of these comparisons could be performed in a single select like: SELECT * FROM <table> WHERE ip6addr is like '20010db%' and \ (atoi(concat('0x', substr(ip6addr,8,1))) & 0x8) This should be relatively efficient because the more expensive second test will only be performed on records that first pass the relatively cheap match of the first 7 characters. Owen >