================
@@ -906,6 +906,16 @@ CreateFileHandler(MemoryBuffer &FirstInput,
 }
 OffloadBundlerConfig::OffloadBundlerConfig() {
+  if (llvm::compression::zstd::isAvailable()) {
+    CompressionFormat = llvm::compression::Format::Zstd;
+    // Use a high zstd compress level by default for better size reduction.
----------------
Artem-B wrote:
I'd add more details here. While higher compression levels usually do improve
the compression ratio, in the typical use case it's an incremental improvement.
Here, we do it to achieve a dramatic increase in compression ratio by exploiting
the fact that we carry multiple sets of very similar large bitcode blobs, and
that we need a compression level high enough to fit one complete blob into the
compression window. At least that's the theory.
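A minimal standalone sketch of that theory against the raw zstd API (not the
bundler's code; the 8 MiB blob size and the level are made up for
illustration): a payload made of two identical blobs shrinks to roughly half
only once the window covers the match distance back to the first copy.
```cpp
#include <zstd.h>

#include <cstdio>
#include <cstdlib>
#include <vector>

// Compress Src at level 19 with an explicit windowLog override.
// Error checking omitted for brevity.
static size_t compressedSize(const std::vector<char> &Src, int WindowLog) {
  ZSTD_CCtx *Ctx = ZSTD_createCCtx();
  ZSTD_CCtx_setParameter(Ctx, ZSTD_c_compressionLevel, 19);
  ZSTD_CCtx_setParameter(Ctx, ZSTD_c_windowLog, WindowLog);
  std::vector<char> Dst(ZSTD_compressBound(Src.size()));
  size_t N =
      ZSTD_compress2(Ctx, Dst.data(), Dst.size(), Src.data(), Src.size());
  ZSTD_freeCCtx(Ctx);
  return N;
}

int main() {
  // One incompressible 8 MiB "blob", stored twice back to back -- a stand-in
  // for two near-identical GPU binaries in one bundle.
  std::vector<char> Blob(8 << 20);
  for (char &C : Blob)
    C = static_cast<char>(rand());
  std::vector<char> Payload = Blob;
  Payload.insert(Payload.end(), Blob.begin(), Blob.end());

  // Window (4 MiB) is smaller than the 8 MiB match distance: the second copy
  // is out of reach, so the output stays near 16 MiB.
  printf("windowLog=22: %zu bytes\n", compressedSize(Payload, 22));
  // Window (16 MiB) covers the match distance: the second copy collapses into
  // matches, and the output drops to roughly 8 MiB.
  printf("windowLog=24: %zu bytes\n", compressedSize(Payload, 24));
  return 0;
}
```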
Should we print a warning (or just document it?) when the compression ratio
ends up below what we'd expect? Considering that good compression starts at
`zstd-20`, I suspect that the compression ratio will drop back to ~2.5x if the
binary for one GPU doubles in size and no longer fits into the window. On top
of that, compression time will also increase, a lot. That will be a rather
unpleasant surprise for whoever runs into it.
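A hedged sketch of such a warning; the helper name and the diagnostic wording
are hypothetical, not part of the patch, and `ZSTD_getCParams` comes from
zstd's static-linking-only (experimental) API:
```cpp
#define ZSTD_STATIC_LINKING_ONLY // for ZSTD_getCParams
#include <zstd.h>

#include <cstdint>
#include <cstdio>

// Hypothetical check: warn when the bundle no longer fits into the match
// window zstd would pick for the requested level, i.e. exactly the case where
// the cross-blob matching that motivates the high default level is lost.
void warnIfBundleExceedsWindow(int Level, uint64_t UncompressedSize) {
  ZSTD_compressionParameters CP =
      ZSTD_getCParams(Level, UncompressedSize, /*dictSize=*/0);
  if (UncompressedSize > (1ull << CP.windowLog))
    fprintf(stderr,
            "warning: bundle size %llu exceeds the zstd match window "
            "(2^%u bytes) at level %d; compression ratio and compile time "
            "may regress sharply\n",
            (unsigned long long)UncompressedSize, CP.windowLog, Level);
}
```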
ZSTD's current compression parameters are set this way:
https://github.com/facebook/zstd/blob/dev/lib/compress/clevels.h#L47
```
{ 23, 24, 22,  7,  3,256, ZSTD_btultra2},  /* level 19 */
{ 25, 25, 23,  7,  3,256, ZSTD_btultra2},  /* level 20 */
```
The first three numbers are the log2 of (largest match distance, fully
searched segment, dispatch table), i.e. `windowLog`, `chainLog`, and `hashLog`.
2^25 = 32MB, which happens to be about the size of the single GPU binary in
your example. I'm pretty sure this explains why `zstd-20` works so well on it,
while `zstd-19` does not. It will work well for smaller binaries, but I'm
pretty sure it will regress for a slightly larger binary.
I think it may be worth experimenting with fine-tuning the compression
settings: instead of blindly setting `zstd-20`, consider the size of the
binary we need to deal with, and adjust only `windowLog`/`chainLog`
appropriately.
Or we could set the default to a lower compression level + a large
`windowLog`. This should still give us most of the compression benefits for
the binaries that fit into the window, but would avoid the performance cliff
if the binary is too large.
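Both alternatives, sketched against zstd's advanced API; the helper and the
particular level/cap choices are illustrative assumptions, not a tested
proposal:
```cpp
#include <zstd.h>

#include <cstdint>

// Hypothetical helper: derive windowLog from the input size instead of
// hardcoding level 20 for its side effect of a 32 MB window.
void configureForBundle(ZSTD_CCtx *Ctx, uint64_t InputSize, bool PreferSpeed) {
  // Smallest windowLog whose window covers the whole input, capped at 27
  // (128 MiB) to bound memory use; zstd itself allows larger windows.
  int WindowLog = 20;
  while ((1ull << WindowLog) < InputSize && WindowLog < 27)
    ++WindowLog;

  if (PreferSpeed) {
    // Alternative 2: a moderate level plus a forced large window -- keeps
    // most of the cross-blob matching without btultra2's compression cost.
    ZSTD_CCtx_setParameter(Ctx, ZSTD_c_compressionLevel, 12);
  } else {
    // Alternative 1: keep the high level, but size the window explicitly so
    // a growing binary doesn't silently fall off the windowLog cliff.
    ZSTD_CCtx_setParameter(Ctx, ZSTD_c_compressionLevel, 20);
  }
  ZSTD_CCtx_setParameter(Ctx, ZSTD_c_windowLog, WindowLog);
  // Long-distance matching is zstd's own mechanism for finding large,
  // far-apart matches at lower levels; plausibly worth enabling here too.
  ZSTD_CCtx_setParameter(Ctx, ZSTD_c_enableLongDistanceMatching, 1);
}
```
Long-distance matching in particular is designed for the "lower level, huge
window" trade-off, so it may be worth measuring on its own.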
I may be overcomplicating this, too. If someone does run into the problem,
they now have a way to work around it by tweaking the compression level.
https://github.com/llvm/llvm-project/pull/83605