On 2014-09-25 Henrique de Moraes Holschuh wrote:
> On Thu, 25 Sep 2014, Riku Voipio wrote:
> > On Wed, Sep 24, 2014 at 03:18:02PM -0300, Henrique de Moraes Holschuh wrote:
> > > OTOH, using -z9 on datasets smaller than the -z8 dictionary size
> > > *is* a waste of memory (I don't know about cpu time, and xz(1)
> > > doesn't say anything on that matter). The same goes for -z8 and
> > > datasets smaller than -z7 dictionary size, and so on.
> > >
> > > It is rather annoying that xz is not smart enough to detect this
> > > and downgrade the -z level when it is given a seekable file
> > > (which it can stat() and know the full size of beforehand).
> >
> > This wouldn't seem too hard to implement in xz - have you asked
> > upstream about it?
>
> No, I haven't. Feel free to do it!
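The downgrade being asked for could look something like the sketch below. This is a hypothetical helper, not anything xz actually does; the dictionary sizes per preset (6 -> 8 MiB, 7 -> 16 MiB, 8 -> 32 MiB, 9 -> 64 MiB) are the ones documented in xz(1).

```python
import os

# Dictionary sizes of the high presets, per xz(1).
PRESET_DICT_SIZE = {6: 8 << 20, 7: 16 << 20, 8: 32 << 20, 9: 64 << 20}

def effective_preset(requested: int, path: str) -> int:
    """Hypothetical downgrade logic: for a seekable input whose size
    can be stat()ed, a dictionary larger than the input is useless,
    so return the lowest preset (up to `requested`, 6-9) whose
    dictionary still covers the whole file."""
    size = os.stat(path).st_size
    for level in sorted(PRESET_DICT_SIZE):
        if level >= requested or size <= PRESET_DICT_SIZE[level]:
            return level
    return requested
```

With this rule, `-9` on a 1 KiB file would silently behave like `-6`, which is exactly the "same options, different output" wrinkle discussed below.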
This is a known issue. It's not too hard to fix if it is OK that the *same* xz binary creates different output with the same compression options depending on whether the input size is known or unknown. Most of the time it doesn't matter, but sometimes it can be at least mildly annoying. If the input size is unknown but the output is seekable, then one could even go back and rewrite the header after compression. The problem of different output from the same xz version remains, though. Maybe there could be an option to enable this, or an option to turn it off, depending on which behavior is the default. I don't promise anything now.

LZMA Utils created different output depending on whether the input size was known, but that was for a different reason. XZ Utils <= 4.999.9beta created different valid output on little and big endian systems. People did complain about that, so it was changed.

When compressing or decompressing multiple files with the same encoder or decoder instance, using the same dictionary size for all files avoids reallocating memory in liblzma, which can be good for performance with tiny files. It's not the most typical use case, though.

Using a uselessly high compression level wastes some encoder memory and makes the decoder allocate unneeded memory. However, as long as resource limits allow the allocation to succeed, the actual decompressor memory usage won't differ, and it's not a huge difference in encoding either. This is because kernels don't physically allocate large memory allocations until they actually get used, and that happens in steps, not the whole buffer at once. You can see this in "top" right after launching xz: VIRT doesn't change while RES keeps growing.

A uselessly high compression level doesn't affect encoding speed with preset levels 6-9 (compressing tiny files may be an exception, but then it only matters when compressing very many files). There's no effect on decoder speed.
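One can already do by hand what a size-aware xz might do automatically. A sketch using Python's lzma module (which wraps liblzma; the input data here is just an example): overriding dict_size in the filter chain caps the LZMA2 dictionary near the input size instead of preset 9's 64 MiB, and the result is still a valid .xz stream.

```python
import lzma

data = b"some input whose size is known in advance\n" * 1000

# Preset 9 alone would use a 64 MiB dictionary; overriding dict_size
# caps it near the input size (liblzma requires at least 4 KiB).
filters = [{
    "id": lzma.FILTER_LZMA2,
    "preset": 9,
    "dict_size": max(4096, len(data)),
}]

capped = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
plain = lzma.compress(data, format=lzma.FORMAT_XZ, preset=9)

# Both are valid .xz streams and decompress to the same data, but the
# streams differ (the stored dictionary size differs), which is the
# "different output with the same compression options" concern above.
assert lzma.decompress(capped) == data
assert lzma.decompress(plain) == data
```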
When compressing official Debian packages, I think one should first decide what to put in the "RAM (minimal)" column of the Debian system requirements, then choose the xz compression level based on decompressor memory usage, and use that level for all packages. (Maybe some big packages that won't run on a low-end system anyway could use a higher compression level if it improves compression.) For example, if 64 MiB of RAM is the minimum, then xz -8 (32 MiB dictionary) is the highest possibly acceptable level, but xz -7 (16 MiB) or even xz -6 (8 MiB) would be safer.

-- 
Lasse Collin | IRC: Larhzu @ IRCnet & Freenode

Archive: https://lists.debian.org/20140928235921.6e90b...@tukaani.org
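The selection rule sketched in that last paragraph, in code. The helper name is made up for illustration; the memory figures follow the decompressor-memory table in xz(1), roughly the dictionary size plus about 1 MiB of overhead.

```python
# Decompressor memory needed per preset, per the table in xz(1):
# roughly the dictionary size plus about 1 MiB of overhead.
DECOMPRESS_MEM = {6: 9 << 20, 7: 17 << 20, 8: 33 << 20, 9: 65 << 20}

def highest_safe_preset(ram_bytes: int) -> int:
    """Hypothetical helper: the highest preset (6-9) whose
    decompressor memory need fits in the given amount of RAM."""
    best = 6
    for level, mem in sorted(DECOMPRESS_MEM.items()):
        if mem <= ram_bytes:
            best = level
    return best

# With a 64 MiB minimum-RAM target, -8 (33 MiB needed) is the highest
# that fits at all; -9 would need 65 MiB, more than the whole machine.
assert highest_safe_preset(64 << 20) == 8
```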