You omit some key details -- namely how do you know that
"1001" is 4 bits and not "00001001" (8-bits)?  If it's a
string (as your current code shows), you can determine the
length. However, if they are actually ints, your code should work fine & be O(1).

Actually, what I have is a list of integer numbers
[3,55,99,44], and by using Huffman coding or fixed length
coding, I will know how the bits-length for each number. When
I try to concatenate them (say 10,000 items in the list) all
together, the speed is going do

Ah, this makes more sense!

I'd try creating a bitwriter object that takes fragments of bytes and writes them sequentially to a file-like object. Something like this untested class:

  class BitWriter:
    def __init__(self, outstream):
      self.out = outstream
      self.datum = 0  # the data accrued
      self.bits = 0   # the number of bits we've accrued
    def write(self, byte_data, bit_count):
      # maybe add some code to ensure
      # that byte_data doesn't have any
      # significant bits beyond bit_count
      datum = (self.datum << bit_count) | byte_data
      self.bits += bit_count
      if self.bits >= 8:
        overflow = self.bits - 8
        self.out.write(chr(datum >> overflow))
        #discard the stuff we wrote
        datum -= (datum >> overflow) << overflow
        self.bits -= 8
    def close(self):
      if self.bits: # we have unflushed data
        data = self.datum << (8 - self.bits)
        self.out.write(chr(data))
      self.out.close()

  out = file('compressed.dat', 'wb')
  bw = BitWriter(out)
  for data, bit_count in source():
    out.write(data, bit_count)
  out.close()

As mentioned, it's 100% untested, I don't know what sort of performance it has, nor the endian'ness of your data streams, so you might have to twiddle bits in the opposite directions and test the performance.

-tkc







--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to