2009/4/29 Tim Chase <python.l...@tim.thechases.com>: >>> You omit some key details -- namely how do you know that >>> "1001" is 4 bits and not "00001001" (8-bits)? If it's a >>> string (as your current code shows), you can determine the >>> length. However, if they are actually ints, your code should work fine & >>> be O(1). >> >> Actually, what I have is a list of integer numbers >> [3,55,99,44], and by using Huffman coding or fixed length >> coding, I will know how the bits-length for each number. When >> I try to concatenate them (say 10,000 items in the list) all >> together, the speed is going do > > Ah, this makes more sense! > > I'd try creating a bitwriter object that takes fragments of bytes and writes > them sequentially to a file-like object. Something like this untested > class: > > class BitWriter: > def __init__(self, outstream): > self.out = outstream > self.datum = 0 # the data accrued > self.bits = 0 # the number of bits we've accrued > def write(self, byte_data, bit_count): > # maybe add some code to ensure > # that byte_data doesn't have any > # significant bits beyond bit_count > datum = (self.datum << bit_count) | byte_data > self.bits += bit_count > if self.bits >= 8: > overflow = self.bits - 8 > self.out.write(chr(datum >> overflow)) > #discard the stuff we wrote > datum -= (datum >> overflow) << overflow > self.bits -= 8 > def close(self): > if self.bits: # we have unflushed data > data = self.datum << (8 - self.bits) > self.out.write(chr(data)) > self.out.close() > > out = file('compressed.dat', 'wb') > bw = BitWriter(out) > for data, bit_count in source(): > out.write(data, bit_count) > out.close() > > As mentioned, it's 100% untested, I don't know what sort of performance it > has, nor the endian'ness of your data streams, so you might have to twiddle > bits in the opposite directions and test the performance.
The method looks great, I thought I got your idea: when the len(bits)>8, we output 8bits and discard them. I believe this will work efficiently, because I have implemented the algorithm by manipulating strings ('111001' character string, not bits) in the similar way, and worked quite fast.:), Thank you very much. You mentioned "twiddle bits in the opposite directions", that reminds me of another possible solutions for this problem. Say, I have bits '1101' and '110', instead of doing (1101<<3) |110, we could try the opposite way: firstly, inverse the '1101' to '1011', and '110' to '011' then, (011 << 4) | 1011, if there comes another bit '10100' inverse it again 10100 to 00101 add it to the long bits: (00101 << 7) | 0111011 so on and so forth. After all the process is done, we inverse the final long bits Because in this process you do not need to shift the whole long bits, but only shift the much shorter bits every time, so the time efficiency should be not bad. However, there is one issue I cannot not figure it out: How to inverse the bits efficiently? do you have any idea? Thank you very much:!:) Best regards, Li > > -tkc > > > > > > > > -- Li ------ Time is all we have and you may find one day you have less than you think -- http://mail.python.org/mailman/listinfo/python-list