You omit some key details -- namely how do you know that
"1001" is 4 bits and not "00001001" (8-bits)? If it's a
string (as your current code shows), you can determine the
length. However, if they are actually ints, your code
should work fine & be O(1).
Actually, what I have is a list of integer numbers
[3,55,99,44], and by using Huffman coding or fixed length
coding, I will know how the bits-length for each number. When
I try to concatenate them (say 10,000 items in the list) all
together, the speed is going do
Ah, this makes more sense!
I'd try creating a bitwriter object that takes fragments of bytes
and writes them sequentially to a file-like object. Something
like this untested class:
class BitWriter:
def __init__(self, outstream):
self.out = outstream
self.datum = 0 # the data accrued
self.bits = 0 # the number of bits we've accrued
def write(self, byte_data, bit_count):
# maybe add some code to ensure
# that byte_data doesn't have any
# significant bits beyond bit_count
datum = (self.datum << bit_count) | byte_data
self.bits += bit_count
if self.bits >= 8:
overflow = self.bits - 8
self.out.write(chr(datum >> overflow))
#discard the stuff we wrote
datum -= (datum >> overflow) << overflow
self.bits -= 8
def close(self):
if self.bits: # we have unflushed data
data = self.datum << (8 - self.bits)
self.out.write(chr(data))
self.out.close()
out = file('compressed.dat', 'wb')
bw = BitWriter(out)
for data, bit_count in source():
out.write(data, bit_count)
out.close()
As mentioned, it's 100% untested, I don't know what sort of
performance it has, nor the endian'ness of your data streams, so
you might have to twiddle bits in the opposite directions and
test the performance.
-tkc
--
http://mail.python.org/mailman/listinfo/python-list