Hi everyone,

As we are adding Aliyun as a new vendor integration in the upcoming release, we are discussing which strategy to use for integrating the iceberg-aliyun package with the engine runtimes.

For some background, we discussed this topic when releasing the Nessie and AWS modules in https://github.com/apache/iceberg/issues/1887. In summary:

1. The iceberg-<vendor> package is always added to the engine runtimes, so users do not need to load it manually.
2. A 1MB threshold is used. If the total size of the vendor's dependencies is less than 1MB, they are included in the engine runtime. Otherwise, the vendor dependencies are marked as provided and are not bundled in the runtime jar.

However, Aliyun is proposing a different approach, which:

1. Does not include the vendor package in the engine runtimes.
2. Adds a separate iceberg-<vendor>-runtime package that bundles all the vendor dependencies, so users only need to specify one additional jar to use the vendor.

AWS did not choose the approach proposed by Aliyun because AWS users usually maintain their own version of the AWS SDK and want to upgrade it independently of the AWS SDK version used by Iceberg. Although it currently takes more effort for users to specify all the compile-only dependencies, in the very near future compute services such as AWS EMR will offer all the jars directly on the classpath to remove that need, and EMR will manage its AWS SDK upgrades independently. That said, the approach proposed by Aliyun seems to fit the use case of Aliyun users better. For more context, please read https://github.com/apache/iceberg/pull/3270 for the discussion between me and Openinx, and https://github.com/apache/iceberg/pull/3684 for the proposed approach.

I think we should consolidate the vendor integration strategy going forward. We could support both approaches, or choose just one. It would be great if people with similar experience or needs could share some insights.

Best,
Jack Ye