I support the idea of updating the docs to replace the Hadoop catalog example, but I'm wondering why not use a REST Catalog example instead? I saw Ajantha proposed adding Docker images for a REST Catalog adapter [1], so we could potentially use that with a JDBC Catalog backed by a local SQLite file as a convenient quickstart example that shows a REST Catalog configuration. I think the REST Catalog would be preferable to the JDBC catalog as a best practice, since it's technology agnostic (on the server side) and the protocol allows for more advanced functionality (e.g., multi-table commits, credential vending, etc.).
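For illustration, here's a rough sketch of what a REST Catalog quickstart could look like in PySpark. This is just an assumption of the shape of the example, not what the PR proposes: the catalog name, the localhost URI, and the runtime version are placeholders, and it assumes a REST catalog server (e.g. the proposed Docker image) is already running.

    from pyspark.sql import SparkSession

    # Rough sketch only (not tested): assumes a REST catalog server is already
    # running at http://localhost:8181, and that the iceberg-spark-runtime
    # version below matches your Spark/Scala build.
    spark = (
        SparkSession.builder
        .appName("iceberg-rest-quickstart")
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "rest")
        .config("spark.sql.catalog.demo.uri", "http://localhost:8181")
        .getOrCreate()
    )

    # Smoke test: create a namespace and a table through the REST catalog.
    spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.db")
    spark.sql("CREATE TABLE IF NOT EXISTS demo.db.sample (id bigint, data string) USING iceberg")

The nice property for the docs is that the client side is just a type and a URI; whatever backs the REST server (JDBC/SQLite, or anything else) doesn't change the example.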
[1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq

On Tue, Oct 8, 2024 at 1:18 PM Kevin Liu <kevin.jq....@gmail.com> wrote:

> Hi all,
>
> I wanted to bring up a suggestion regarding our current documentation. The
> existing examples for Iceberg often use the Hadoop catalog, as seen in:
>
> - Adding a Catalog - Spark Quickstart [1]
> - Adding Catalogs - Spark Getting Started [2]
>
> Since we generally advise against using Hadoop catalogs in production
> environments, I believe it would be beneficial to replace these examples
> with ones that use the JDBC catalog. The JDBC catalog, configured with a
> local SQLite database file, offers similar convenience but aligns better
> with production best practices.
>
> I've created an issue [3] and a PR [4] to address this. Please take a
> look, and I'd love to hear your thoughts on whether this is a direction we
> want to pursue.
>
> Best,
> Kevin Liu
>
> [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
> [2] https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
> [3] https://github.com/apache/iceberg/issues/11284
> [4] https://github.com/apache/iceberg/pull/11285
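For comparison, a rough sketch of the JDBC-catalog-over-SQLite setup Kevin describes above might look like the following in PySpark. Again, the catalog name, file paths, and artifact versions are placeholders I'm assuming for illustration, not what the PR actually uses.

    from pyspark.sql import SparkSession

    # Rough sketch only (not tested) of a JDBC catalog backed by a local SQLite
    # file; catalog name, paths, and versions are placeholders, and the SQLite
    # JDBC driver needs to be on the classpath alongside the Iceberg runtime.
    spark = (
        SparkSession.builder
        .appName("iceberg-jdbc-quickstart")
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,"
                "org.xerial:sqlite-jdbc:3.46.1.0")
        .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.local.type", "jdbc")
        .config("spark.sql.catalog.local.uri", "jdbc:sqlite:/tmp/iceberg_catalog.db")
        .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg_warehouse")
        .getOrCreate()
    )

    spark.sql("CREATE NAMESPACE IF NOT EXISTS local.db")
    spark.sql("CREATE TABLE IF NOT EXISTS local.db.sample (id bigint, data string) USING iceberg")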