atharvalade opened a new issue, #3104: URL: https://github.com/apache/iggy/issues/3104
## Description

The S3 sink connector (`iggy_connector_s3_sink`) added in #2976 currently writes uncompressed files to S3. For production workloads with high message throughput, compression is essential to reduce storage costs and upload times.

## Proposed Changes

Add a `compression` config option to the S3 sink connector supporting:

- **`none`** (default, current behavior)
- **`gzip`**: widely supported, good compatibility with downstream tools (Athena, Spark, etc.)
- **`zstd`**: better compression ratio and speed, growing ecosystem support

### Config Example

```toml
[plugin_config]
compression = "gzip"  # none | gzip | zstd
```

#### Implementation Notes

- Compress the finalized buffer bytes before uploading to S3 (after `finalize_buffer`, before `upload_with_retry`)
- Append the appropriate file extension (`.jsonl.gz`, `.jsonl.zst`, `.json.gz`, etc.). The path module already derives extensions from `OutputFormat`; this needs to account for compression
- Set the correct `Content-Encoding` header on the S3 `put_object` call
- Use `flate2` for gzip and the `zstd` crate for zstd. Check whether they're already in the workspace; otherwise add them
- Add unit tests for round-trip compress/decompress verification
- Update the connector README and `config.toml` example
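To make the extension and `Content-Encoding` notes concrete, here is a minimal sketch of how the `compression` option might be modeled. It uses only the standard library and does not touch the connector's real types: the `Compression` enum, its methods, and the `object_extension` helper are hypothetical names chosen for illustration, and the format extension is passed in as a plain string rather than derived from `OutputFormat`.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the proposed `compression` config values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Compression {
    /// Default: keep the current uncompressed behavior.
    #[default]
    None,
    Gzip,
    Zstd,
}

impl FromStr for Compression {
    type Err = String;

    /// Parses the config string ("none" | "gzip" | "zstd"), case-insensitively.
    fn from_str(value: &str) -> Result<Self, Self::Err> {
        match value.to_ascii_lowercase().as_str() {
            "none" => Ok(Self::None),
            "gzip" => Ok(Self::Gzip),
            "zstd" => Ok(Self::Zstd),
            other => Err(format!("unsupported compression: {other}")),
        }
    }
}

impl Compression {
    /// Suffix appended after the format extension (e.g. "jsonl" + ".gz").
    pub fn extension_suffix(self) -> &'static str {
        match self {
            Self::None => "",
            Self::Gzip => ".gz",
            Self::Zstd => ".zst",
        }
    }

    /// Value for the `Content-Encoding` header, or `None` when uncompressed.
    pub fn content_encoding(self) -> Option<&'static str> {
        match self {
            Self::None => None,
            Self::Gzip => Some("gzip"),
            Self::Zstd => Some("zstd"),
        }
    }
}

/// Hypothetical helper: combines a format extension such as "jsonl" or "json"
/// with the compression suffix to produce the final object extension.
pub fn object_extension(format_extension: &str, compression: Compression) -> String {
    format!("{format_extension}{}", compression.extension_suffix())
}
```

With this shape, `"gzip".parse::<Compression>()` yields the `Gzip` variant, `object_extension("jsonl", Compression::Zstd)` produces `jsonl.zst`, and the upload path could set the header only when `content_encoding()` returns `Some`. The actual compression step (via `flate2` or `zstd`) is intentionally left out of this sketch.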
