Hi:

> Here's what I propose as a middle-ground.
>
> 1. We replace the Hadoop catalog example with a JDBC catalog backed by
>    an in-memory datastore. This allows users to get started without needing
>    additional infrastructure, which was one of the main benefits of the
>    Hadoop catalog.
> 2. We add a new section describing the REST catalog, its benefits, and
>    how to set one up. We can use the REST catalog adapter [1], with the
>    adapter using the JDBC catalog as its internal catalog.

+1 for this approach. For a quick-start example, I prefer whatever is easiest to set up. The REST catalog is important and deserves its own section.
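For readers who have not configured one, here is a minimal sketch of the Spark properties such a REST catalog section might show. The catalog name `rest` and the endpoint `http://localhost:8181` are assumptions for illustration; the thread does not specify them, and the helper functions are hypothetical, not part of Iceberg.

```python
def rest_catalog_conf(name, uri):
    """Spark properties that register an Iceberg REST catalog under `name`."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }


def as_conf_flags(conf):
    """Render properties as `--conf key=value` arguments for spark-sql or spark-shell."""
    return [f"--conf {key}={value}" for key, value in conf.items()]


# Assumed catalog name and endpoint; point the URI at wherever the REST service runs.
conf = rest_catalog_conf("rest", "http://localhost:8181")
print(" \\\n  ".join(as_conf_flags(conf)))
```

The printed flags can be appended to a `spark-sql` or `spark-shell` command line once a REST catalog service is reachable at the given URI.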
On Thu, Oct 17, 2024 at 2:41 AM Kevin Liu <kevin.jq....@gmail.com> wrote:

> Hey folks,
>
> Thanks for the discussions.
>
> It seems everyone is in favor of replacing the Hadoop catalog example, and
> the question now is whether to replace it with the JDBC catalog or the REST
> catalog.
>
> I originally proposed the JDBC catalog as a replacement primarily due to
> its ease of use. Users can quickly set up a JDBC catalog backed by an
> in-memory or file-based datastore without needing additional
> infrastructure. It also aligns with the quick-start ethos of "it just
> works." That said, I agree that an example of setting up the REST catalog
> should be part of the getting-started guide since it’s the catalog the
> community has aligned on.
>
> Here's what I propose as a middle-ground.
>
> 1. We replace the Hadoop catalog example with a JDBC catalog backed by
>    an in-memory datastore. This allows users to get started without needing
>    additional infrastructure, which was one of the main benefits of the
>    Hadoop catalog.
> 2. We add a new section describing the REST catalog, its benefits, and
>    how to set one up. We can use the REST catalog adapter [1], with the
>    adapter using the JDBC catalog as its internal catalog.
>
> This approach gives users a way to quickly prototype while also guiding
> them toward the REST catalog for production use cases.
>
> Looking forward to hearing more from you all.
>
> Best,
> Kevin Liu
>
> [1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>
> On Thu, Oct 10, 2024 at 3:44 AM Eduard Tudenhöfner
> <etudenhoef...@apache.org> wrote:
>
>> I would prefer to advocate for the REST catalog in those examples/docs
>> (similar to how the Spark quickstart example
>> <https://iceberg.apache.org/spark-quickstart/> uses the REST catalog).
>> The docs could then refer to the quickstart example to indicate what's
>> required in terms of services to be started before a user can spawn a spark
>> shell.
>>
>> On Thu, Oct 10, 2024 at 12:15 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi
>>>
>>> As we are talking about "documentation" (quick start/readme), I would
>>> rather propose to use the REST catalog here instead of JDBC.
>>>
>>> As it's the catalog we "promote", I think it would be valuable for
>>> users to start with the "right thing".
>>>
>>> The JDBC catalog is interesting for quick tests or a getting-started
>>> guide, but we know how it goes: it will be heavily used (see what
>>> happened with the HadoopCatalog, which was used in production when it
>>> should not have been :) ).
>>>
>>> Regards
>>> JB
>>>
>>> On Tue, Oct 8, 2024 at 12:18 PM Kevin Liu <kevin.jq....@gmail.com>
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I wanted to bring up a suggestion regarding our current documentation.
>>> > The existing examples for Iceberg often use the Hadoop catalog, as seen in:
>>> >
>>> > Adding a Catalog - Spark Quickstart [1]
>>> > Adding Catalogs - Spark Getting Started [2]
>>> >
>>> > Since we generally advise against using Hadoop catalogs in production
>>> > environments, I believe it would be beneficial to replace these examples
>>> > with ones that use the JDBC catalog. The JDBC catalog, configured with a
>>> > local SQLite database file, offers similar convenience but aligns better
>>> > with production best practices.
>>> >
>>> > I've created an issue [3] and a PR [4] to address this. Please take a
>>> > look, and I'd love to hear your thoughts on whether this is a direction
>>> > we want to pursue.
>>> >
>>> > Best,
>>> > Kevin Liu
>>> >
>>> > [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
>>> > [2] https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
>>> > [3] https://github.com/apache/iceberg/issues/11284
>>> > [4] https://github.com/apache/iceberg/pull/11285
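As a rough illustration of the SQLite-backed JDBC catalog described in the original message, the sketch below builds the Spark properties for such a setup. The catalog name `local`, the base directory, and the file name `iceberg_catalog.db` are assumptions chosen for the example, and the helper is hypothetical. It also assumes a SQLite JDBC driver is available on the Spark classpath.

```python
def jdbc_sqlite_catalog_conf(name, base_dir):
    """Spark properties for an Iceberg JDBC catalog stored in a local SQLite file."""
    base = base_dir.rstrip("/")
    prefix = f"spark.sql.catalog.{name}"
    db_file = f"{base}/iceberg_catalog.db"  # assumed file name for the catalog database
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "jdbc",
        # Catalog metadata pointers live in the SQLite file.
        f"{prefix}.uri": f"jdbc:sqlite:{db_file}",
        # Table data and metadata files are written under the warehouse path.
        f"{prefix}.warehouse": f"{base}/warehouse",
    }


# Assumed catalog name and directory; any writable local path should work.
for key, value in jdbc_sqlite_catalog_conf("local", "/tmp/iceberg").items():
    print(f"--conf {key}={value}")
```

Like the Hadoop catalog example it would replace, this needs no separate service: both the catalog database and the warehouse are plain local paths.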