Hi, cc'ing Peter Wu in...
I have now completed the task of handling zlib, uncompressed and zeroed chunks in a DMG file using the approach we discussed earlier. Unfortunately, that approach does not work for bz2 chunks: we cannot restart decompression from a cached access point, because bz2 decompression checks for the magic 'BZh' before it starts decompressing. Since a cached point can land at an arbitrary location inside the compressed stream, and not necessarily at the start of a block, dmg_uncompress_bz2_do() fails with BZ_DATA_ERROR_MAGIC (-5) and our approach falls apart. This blog post explains the limitation as well:
https://blastedbio.blogspot.in/2011/11/random-access-to-bzip2.html

However, I found an interesting property of bz2 compressed data: every bz2 stream starts with a 4-byte header in which the first three bytes are the magic 'BZh', followed by a digit from 1 to 9. The digit gives the block size in multiples of 100 kB, so e.g. 'BZh3' means blocks of at most 300 kB, and the block size never exceeds 900 kB. (Strictly speaking, 'BZh' is the stream header; the blocks inside a stream begin with a different 48-bit magic, according to the same Wikipedia page below.) The Wikipedia page (https://en.wikipedia.org/wiki/Bzip2#File_format) states that a 900 kB block can expand to at most about 46 MiB in its uncompressed form. So we need not worry about QEMU allocating a wildly sized buffer at once: we already have a 64 MiB limit as of now, and we can stick to the approach of decompressing the whole block every time we enter it. That way we never cache an access point in the middle of a stream, and never fail with BZ_DATA_ERROR_MAGIC (-5).

I am still hesitant about this approach because I am not yet sure whether "blocks" and "chunks" are just two terms for the same thing (i.e. chunks == blocks) or whether chunks are made up of blocks (i.e. chunks == x * blocks). I approached Peter Wu (who worked on DMG a few years ago) about this and he is not sure either. (Peter, you may skip this part as I already explained it to you earlier :-) )

I did a naive test of my own: I downloaded one of the bz2 DMG images and read it with a hex editor. First, I manually calculated the distance between the offsets of two consecutive 'BZh' magics, which should be the length of the block starting at the first of them. Then I compared that with the size of the corresponding chunk (s->lengths[chunk]), which we obtain by parsing the mish blocks while opening the image in QEMU, and interestingly the two sizes were equal. I repeated this for quite a few chunks and the test held every time. Peter thinks we cannot rely on this test, so I would welcome more views on it.

To make the above concrete, I have put a few rough sketches below. They are only illustrations: the buffer and variable names are made up unless noted, and error handling is mostly omitted.
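First, the failure mode. If we hand libbz2 data that does not begin at a 'BZh' header, it bails out immediately; "mid_stream" here is a hypothetical pointer to a cached access point in the middle of a compressed chunk:

    #include <bzlib.h>
    #include <string.h>

    /* Sketch: try to resume decompression from a cached access point.
     * libbz2 refuses because the input does not begin with the 'BZh'
     * stream magic. */
    static int try_resume(char *mid_stream, unsigned int in_len,
                          char *out, unsigned int out_len)
    {
        bz_stream strm;
        int ret;

        memset(&strm, 0, sizeof(strm));
        ret = BZ2_bzDecompressInit(&strm, 0, 0);
        if (ret != BZ_OK) {
            return ret;
        }

        strm.next_in = mid_stream;
        strm.avail_in = in_len;
        strm.next_out = out;
        strm.avail_out = out_len;

        ret = BZ2_bzDecompress(&strm);  /* returns BZ_DATA_ERROR_MAGIC (-5) */
        BZ2_bzDecompressEnd(&strm);
        return ret;
    }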
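Second, the header check that gives us the size bound. This is just my reading of the Wikipedia description, not code from QEMU or libbz2:

    #include <stdint.h>

    /* Sketch: given the first 4 bytes of a bz2 stream ("BZh" + digit),
     * return the maximum amount of data per block, or -1 if the magic
     * does not match. bzip2 block sizes are multiples of 100000 bytes
     * ("100 kB"), so e.g. "BZh3" -> 300000. */
    static int64_t bz2_max_block_size(const uint8_t *hdr)
    {
        if (hdr[0] != 'B' || hdr[1] != 'Z' || hdr[2] != 'h' ||
            hdr[3] < '1' || hdr[3] > '9') {
            return -1;
        }
        return (int64_t)(hdr[3] - '0') * 100000;
    }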
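Third, what the proposed approach boils down to: decompress a whole bz2 chunk from its start on every access instead of resuming mid-stream. A one-shot sketch using libbz2's convenience helper; the 64 MiB ceiling mirrors our current limit, and the buffers are assumed to be allocated by the caller:

    #include <bzlib.h>

    #define MAX_UNCOMPRESSED (64 * 1024 * 1024)  /* current QEMU limit */

    /* Sketch: decompress an entire bz2 chunk in one shot. The worst
     * case for a single 900 kB block is ~46 MiB of output, so a
     * 64 MiB output buffer is always enough. */
    static int decompress_whole_chunk(char *in, unsigned int in_len,
                                      char *out, unsigned int *out_len)
    {
        *out_len = MAX_UNCOMPRESSED;
        /* small=0: use the normal (faster) algorithm, verbosity=0 */
        return BZ2_bzBuffToBuffDecompress(out, out_len, in, in_len, 0, 0);
    }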
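Finally, the hex-editor test could be automated along these lines. It scans a buffer for 'BZh' and prints the gap between consecutive hits, which is what I compared against s->lengths[chunk]. Note that a plain byte scan can hit false positives inside compressed data, which is one more reason the test is not conclusive:

    #include <stdio.h>
    #include <stdint.h>

    /* Sketch: print the offset of every 'BZh'+digit magic in buf and
     * the gap since the previous hit; those gaps are what I compared
     * against the chunk lengths from the mish block. */
    static void scan_bzh(const uint8_t *buf, size_t len)
    {
        size_t i, prev = 0;
        int first = 1;

        for (i = 0; i + 3 < len; i++) {
            if (buf[i] == 'B' && buf[i + 1] == 'Z' && buf[i + 2] == 'h' &&
                buf[i + 3] >= '1' && buf[i + 3] <= '9') {
                if (first) {
                    printf("BZh at %#zx\n", i);
                    first = 0;
                } else {
                    printf("BZh at %#zx (gap %zu)\n", i, i - prev);
                }
                prev = i;
            }
        }
    }

Ashijeet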