On Wed, Apr 20, 2016 at 01:55:42PM -0700, Kees Cook wrote:

...

> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 6236b9ec4b76..6b8f8728c1fa 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -440,6 +440,93 @@ setup_data:	.quad 0		# 64-bit physical pointer to
>
>  pref_address:	.quad LOAD_PHYSICAL_ADDR	# preferred load addr
>
> +#
> +# Getting to provably safe in-place decompression is hard. Worst case
> +# behaviours need be analyzed. Here let's take decompressing
> +# gzip-compressed kernel as example to illustrate it:
> +#
> +# The file layout of gzip compressed kernel is as follows. For more
> +# information, please refer to RFC 1951 and RFC 1952.
> +#
> +#	magic[2]
> +#	method[1]
> +#	flags[1]
> +#	timestamp[4]
> +#	extraflags[1]
> +#	os[1]
> +#	compressed data blocks[N]
> +#	crc[4] orig_len[4]
> +#
> +# resulting in 18 bytes of non compressed data overhead.
What's "non compressed data overhead"? Does that want to say:
"... resulting in 18 bytes overhead of uncompressed data." perhaps?

> +#
> +# Files divided into blocks
> +#	1 bit (last block flag)
> +#	2 bits (block type)
> +#
> +# 1 block occurs every 32K -1 bytes or when there 50% compression
> +# has been achieved. The smallest block type encoding is always used.
> +#
> +# stored:
> +#	32 bits length in bytes.
> +#
> +# fixed:
> +#	magic fixed tree.
> +#	symbols.
> +#
> +# dynamic:
> +#	dynamic tree encoding.
> +#	symbols.
> +#
> +#
> +# The buffer for decompression in place is the length of the uncompressed
> +# data, plus a small amount extra to keep the algorithm safe. The
> +# compressed data is placed at the end of the buffer. The output pointer
> +# is placed at the start of the buffer and the input pointer is placed
> +# where the compressed data starts. Problems will occur when the output
> +# pointer overruns the input pointer.
> +#
> +# The output pointer can only overrun the input pointer if the input
> +# pointer is moving faster than the output pointer. A condition only
> +# triggered by data whose compressed form is larger than the uncompressed
> +# form.
> +#
> +# The worst case at the block level is a growth of the compressed data
> +# of 5 bytes per 32767 bytes.
> +#
> +# The worst case internal to a compressed block is very hard to figure.
> +# The worst case can at least be bounded by having one bit that represents
> +# 32764 bytes and then all of the rest of the bytes representing the very
> +# very last byte.
> +#
> +# All of which is enough to compute an amount of extra data that is required
> +# to be safe. To avoid problems at the block level allocating 5 extra bytes
> +# per 32767 bytes of data is sufficient. To avoid problems internal to a
> +# block adding an extra 32767 bytes (the worst case uncompressed block size)
> +# is sufficient, to ensure that in the worst case the decompressed data for
> +# block will stop the byte before the compressed data for a block begins.
> +# To avoid problems with the compressed data's meta information an extra 18
> +# bytes are needed. Leading to the formula:
> +#
> +#	extra_bytes = (uncompressed_size >> 12) + 32768 + 18 + decompressor_size
> +#
> +# Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
> +# Adding 32768 instead of 32767 just makes for round numbers.
> +# Adding the decompressor_size is necessary as it musht live after all
> +# of the data as well. Last I measured the decompressor is about 14K.
> +# 10K of actual data and 4K of bss.

I guess reflow the paragraphs while at it, as well?

"Adding 8 bytes per 32K is a bit excessive but much easier to calculate.
Adding 32768 instead of 32767 just makes for round numbers. Adding the
decompressor_size is necessary as it musht live after all of the data as
well. Last I measured the decompressor is about 14K. 10K of actual data
and 4K of bss."

and so on...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)