On Thu, Nov 19, 2015 at 9:27 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> You can't improve much.  A decimal digit carries log(10,2)=3.32 bits
> of information.  A reasonable character set for Twitter-style links
> might have 80 or so characters (upper/lower alphabetic, digits, and
> a dozen or so punctuation characters), or log(80,2)=

Where do I find out more about how to calculate information per digit?

Lots of nice little tricks you used below. Thanks for sharing.

> Here is my shortened version:
>
>   import string
>   from functools import reduce  # reduce is not a builtin in Python 3
>
>   # alphabet here is 83 chars
>   alphabet = string.ascii_lowercase + \
>      string.ascii_uppercase +'!"#$%&\'()*+,-./:;<=>?@[]^_`{|}~'
>   alphabet_size = len(alphabet)
>
>   decoderdict = dict((b,a) for a,b in enumerate(alphabet))
>
>   def encoder(integer):
>     a,b = divmod(integer, alphabet_size)
>     if a == 0: return alphabet[b]
>     return encoder(a) + alphabet[b]
>
>   def decoder(code):
>     return reduce(lambda n,d: n*alphabet_size + decoderdict[d], code, 0)
>
>   def test():
>     n = 92928729379271
>     short = encoder(n)
>     backagain = decoder(short)
>     nlen = len(str(n))
>     print (nlen, len(short), float(len(short))/nlen)
>     assert n==backagain, (n,short,backagain)
>
>   test()

Vincent Davis
720-301-3003
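
On the information-per-digit question, here is a minimal sketch of the arithmetic behind the quoted figures. It assumes every symbol in the alphabet is equally likely, in which case one symbol carries log2(alphabet size) bits. The helper names `bits_per_symbol` and `symbols_needed` are made up for illustration and do not come from the quoted code.

```python
import math

def bits_per_symbol(alphabet_size):
    """Bits carried by one symbol, assuming all symbols are equally likely."""
    return math.log2(alphabet_size)

def symbols_needed(n, alphabet_size):
    """Count the base-`alphabet_size` symbols needed to write a non-negative int."""
    count = 1
    while n >= alphabet_size:
        n //= alphabet_size
        count += 1
    return count

n = 92928729379271
# A decimal digit carries about 3.32 bits; a symbol from a larger alphabet carries more.
print(bits_per_symbol(10), bits_per_symbol(83))
# Lengths of the same integer in base 10 and in base 83.
print(symbols_needed(n, 10), symbols_needed(n, 83))
# Best achievable length ratio for large integers, roughly 0.52 for 83 symbols.
print(bits_per_symbol(10) / bits_per_symbol(83))
```

For the test integer this gives 14 decimal digits against 8 characters from an 83-symbol alphabet, which is close to the limit the ratio of the two logarithms allows.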