I support the idea of updating the docs to replace the Hadoop catalog
example, but I'm wondering why not use a REST Catalog example instead? I
saw Ajantha proposed adding Docker images for a REST Catalog adapter [1],
so we could potentially use that with a JDBC Catalog backed by a SQLite
file as a convenient quickstart example that shows a REST Catalog
configuration. I'm thinking the REST Catalog would be preferred over the
JDBC catalog as a best practice, since it's technology agnostic (on the
server side) and the protocol allows for more advanced functionality
(e.g., multi-table commits, credential vending, etc.).

[1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
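
For a sense of what that quickstart example might look like, here is a
rough sketch of a spark-sql session pointed at a REST catalog. The catalog
name "rest" and the localhost URI/port are placeholders for whatever the
Docker image would expose, and it assumes the Iceberg Spark runtime jar is
already on the classpath:

    spark-sql \
      --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
      --conf spark.sql.catalog.rest.type=rest \
      --conf spark.sql.catalog.rest.uri=http://localhost:8181

The nice part is that everything behind that URI (storage, metastore,
access control) stays on the server side, so the docs wouldn't need to
change as backends evolve.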

On Tue, Oct 8, 2024 at 1:18 PM Kevin Liu <kevin.jq....@gmail.com> wrote:

> Hi all,
>
> I wanted to bring up a suggestion regarding our current documentation. The
> existing examples for Iceberg often use the Hadoop catalog, as seen in:
>
>    - Adding a Catalog - Spark Quickstart [1]
>    - Adding Catalogs - Spark Getting Started [2]
>
> Since we generally advise against using Hadoop catalogs in production
> environments, I believe it would be beneficial to replace these examples
> with ones that use the JDBC catalog. The JDBC catalog, configured with a
> local SQLite database file, offers similar convenience but aligns better
> with production best practices.
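>
> To make that concrete, a rough sketch of what the JDBC-catalog quickstart
> could look like (the catalog name and file paths are placeholders, and it
> assumes the SQLite JDBC driver and the Iceberg Spark runtime jar are on
> the classpath):
>
>     spark-sql \
>       --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
>       --conf spark.sql.catalog.local.type=jdbc \
>       --conf spark.sql.catalog.local.uri=jdbc:sqlite:/tmp/iceberg_catalog.db \
>       --conf spark.sql.catalog.local.warehouse=/tmp/warehouse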
>
> I've created an issue [3] and a PR [4] to address this. Please take a
> look, and I'd love to hear your thoughts on whether this is a direction we
> want to pursue.
>
> Best,
> Kevin Liu
>
> [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
> [2]
> https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
> [3] https://github.com/apache/iceberg/issues/11284
> [4] https://github.com/apache/iceberg/pull/11285
>
>
