Steven D'Aprano <[EMAIL PROTECTED]> writes: > > unique_id = itertools.count(1234567890) > > Sweet! > > I really must make itertools second-nature. I always forget it.
Beware that count() output (like enumerate) is always <= sys.maxint. It raises an exception if it overflows. IMO this is a bug. > > def unique_id(): > > return os.urandom(10).encode('hex') > > Any time I see something using a random number to generate IDs, I worry > about collisions. Am I being paranoid? (But even paranoids write code > with bugs...) Well, the idea is to make the string long enough (I shouldn't have picked 10 bytes) to make the probability of collisions acceptably low. The probability is about exp(-k**2/(2*N)) where k is the number of id's you use and N is the number of possible ID's. So with os.urandom(20), if you use 1 billion (= approx 2**30) id's, the probability is about exp(-(2**60 / 2*2**160)) = 1/exp(2**101) which is extremely small. Using random strings is probably safer from collisions than maintain keep distinct initial counters across multiple runs of a program, on multiple computers, etc. The classic example is ethernet cards. Each one is supposed to have a unique hardware 48-bit MAC address, with routers etc. relying on the uniqueness. Some organization is supposed to assign a unique code to each manufacturer, and then the manufacturer uses that code as a prefix and assigns addresses in sequence within that space. That works fine until sales go up, manufacturers start opening multiple factories operating out of the same space, low-rent manufacturers start making up their own codes, etc. So companies that buy large numbers of cards often find multiple cards with the same address, causing various hassles. If they just assigned 128-bit MAC addresses at random it's very unlikely that there would ever be a collision. > Here's something which is a little less predictable than a straight > counter: It's still very easy to generate valid id's from that, or guess the next one with non-negligible probability. IMO it's almost never worth messing with a "little less predictable". If you're trying to prevent an actual opponent from guessing something, use full-strength cryptography. You could try something like this: import sha, os, itertools radix = 2**32 # keep this == 2**(some even number <= 80) secret_key = os.urandom(20) def prp(key, n): # pseudo-random permutation (8-round Feistel network) # transform n to a unique number < radix**2 assert 0 <= n < radix**2 def F(i,x): return int(sha.new('%s,%x,%x'%(key,i,x)).hexdigest(), 16) % radix a,b = divmod(n, radix) for i in xrange(8): a ^= F(i,b) a,b = b,a return radix*a + b unique_id = (prp(secret_key, i) for i in itertools.count()) It should be pretty secure and (unless I made an error) is a permutation from [0:radix**2] to [0:radix**2], similar to DES. It's invertible if you know the secret key (I'll leave that as an exercise). 8 rounds is probably overkill for radix 2**32. -- http://mail.python.org/mailman/listinfo/python-list