On 2017-10-23 04:21, Steve D'Aprano <steve+pyt...@pearwood.info> wrote: > On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote: >> > If the probability of certain codes (either single codes, or sequences of > codes) are non-equal, then you can take advantage of that by encoding the > common cases into a short representation, and the uncommon and rare cases > into a longer representation. As you say: > > >> Otherwise, if ( 0, 0 ) is much more frequent, >> we can encode ( 0, 0 ) by "0" and >> >> ( 0, 1 ) by "101", >> ( 1, 0 ) by "110", and >> ( 1, 1 ) by "111". >> >> And we could then use /less/ than two bits on the >> average. > > That's incorrect. On average you use 2.5 bits. > > (1*1 bit + 3*3 bits divide by four possible outcomes, makes 2.5 bits.)
I disagree. If the distribution is not equal, then the average needs to take the different probabilities into account. Let's assume that (0, 0) has a probability of 90 %, (0, 1) a probability of 10 % and (1, 0) and (1, 1) a probability of 5 % each. Then the average length is 0.9 * 1 bit + 0.1 * 3 bits + 0.05 * 3 bits + 0.05 * 3 bits = 1.5 bits. hp -- _ | Peter J. Holzer | Fluch der elektronischen Textverarbeitung: |_|_) | | Man feilt solange an seinen Text um, bis | | | h...@hjp.at | die Satzbestandteile des Satzes nicht mehr __/ | http://www.hjp.at/ | zusammenpaĆt. -- Ralph Babel -- https://mail.python.org/mailman/listinfo/python-list