Hi everyone,

As we are adding Aliyun as a new vendor integration in the upcoming
release, we are discussing the strategy we should take to integrate the
iceberg-aliyun package with all the engine runtimes.

For some background, we discussed this topic when releasing the Nessie
and AWS modules in https://github.com/apache/iceberg/issues/1887. In
summary:

1. The iceberg-<vendor> package is always added to the engine runtimes so
that users do not need to load it manually.
2. Use 1MB as a threshold. If the total size of the vendor's dependencies
is less than 1MB, include them in the engine runtime. Otherwise, the vendor
dependencies are marked as provided and not bundled in the runtime jar.
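
To make the threshold rule in point 2 concrete, here is a minimal sketch.
The function name, the labels it returns, and the reading of 1MB as
1024 * 1024 bytes are assumptions for illustration only; none of this is
part of the Iceberg build.

```python
# Hypothetical helper illustrating the 1MB rule; not taken from the Iceberg build.
ONE_MB = 1024 * 1024  # assumption: 1MB is read as 1024 * 1024 bytes


def packaging_for(dependency_sizes_bytes):
    """Decide how a vendor's dependencies are packaged in an engine runtime.

    Returns "bundled" when the total size is below 1MB, otherwise "provided"
    (the user or the platform must supply the jars on the classpath).
    """
    total = sum(dependency_sizes_bytes)
    return "bundled" if total < ONE_MB else "provided"
```

Under this rule the iceberg-<vendor> module itself is always in the engine
runtime; only the treatment of its dependencies changes.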

However, Aliyun is proposing a different approach, which:
1. Does not include the vendor package in the engine runtime
2. Adds an iceberg-<vendor>-runtime package that bundles all the vendor
dependencies, so users only need to specify one additional jar to use
the vendor

AWS did not choose the approach proposed by Aliyun because AWS users
usually maintain their own version of the AWS SDK and would like to
upgrade it independently of the AWS SDK version used by Iceberg. Although
it currently takes more effort for users to specify all the compile-only
dependencies, compute vendor services like AWS EMR are going to offer all
the jars directly on the classpath in the very near future, which removes
that need, and EMR will manage its AWS SDK version upgrades independently.

But the approach proposed by Aliyun seems to fit the use case of Aliyun
users better. For more context, please read
https://github.com/apache/iceberg/pull/3270 for the discussion between me
and Openinx and https://github.com/apache/iceberg/pull/3684 for the
approach proposed.

I think we should consolidate the vendor integration strategy going
forward. We could support both approaches or choose just one. It would
be great if people with similar experience or needs could provide some
insights.

Best,
Jack Ye
