Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

Renjie Liu Wed, 16 Oct 2024 18:31:52 -0700

Hi:

> Here's what I propose as a middle-ground.
>
>    1. We replace the Hadoop catalog example with a JDBC catalog backed by
>    an in-memory datastore. This allows users to get started without needing
>    additional infrastructure, which was one of the main benefits of the Hadoop
>    catalog.
>
>
>    2. We add a new section describing the REST catalog, its benefits, and
>    how to set one up. We can use the REST catalog adapter [1], with the
>    adapter using the JDBC catalog as its internal catalog.
>

+1 for this approach. For a quick-start example, I like things to be as easy
to set up as possible. The REST catalog is important and deserves its own
section.


On Thu, Oct 17, 2024 at 2:41 AM Kevin Liu <kevin.jq....@gmail.com> wrote:

> Hey folks,
>
>
> Thanks for the discussions.
>
>
> It seems everyone is in favor of replacing the Hadoop catalog example, and
> the question now is whether to replace it with the JDBC catalog or the REST
> catalog.
>
>
> I originally proposed the JDBC catalog as a replacement primarily due to
> its ease of use. Users can quickly set up a JDBC catalog backed by an
> in-memory or file-based datastore without needing additional
> infrastructure. It also aligns with the quick-start ethos of "it just
> works." That said, I agree that an example of setting up the REST catalog
> should be part of the getting-started guide since it’s the catalog the
> community has aligned on.
>
>
> Here's what I propose as a middle-ground.
>
>    1. We replace the Hadoop catalog example with a JDBC catalog backed by
>    an in-memory datastore. This allows users to get started without needing
>    additional infrastructure, which was one of the main benefits of the Hadoop
>    catalog.
>    2. We add a new section describing the REST catalog, its benefits, and
>    how to set one up. We can use the REST catalog adapter [1], with the
>    adapter using the JDBC catalog as its internal catalog.
>
>
> This approach gives users a way to quickly prototype while also guiding
> them toward the REST catalog for production use cases.
>
>
> Looking forward to hearing more from you all.
>
>
> Best,
>
> Kevin Liu
>
>
> [1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>
>
>
> On Thu, Oct 10, 2024 at 3:44 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>
>> I would prefer to advocate for the REST catalog in those examples/docs
>> (similar to how the Spark quickstart example
>> <https://iceberg.apache.org/spark-quickstart/> uses the REST catalog).
>> The docs could then refer to the quickstart example to indicate what's
>> required in terms of services to be started before a user can spawn a Spark
>> shell.
>>
>> On Thu, Oct 10, 2024 at 12:15 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>>> Hi
>>>
>>> As we are talking about "documentation" (quick start/readme), I would
>>> rather propose using the REST catalog here instead of JDBC.
>>>
>>> As it's the catalog we "promote", I think it would be valuable for
>>> users to start with the "right thing".
>>>
>>> The JDBC catalog is interesting for quick tests and getting-started guides,
>>> but we know how it goes: it will be heavily used (see what happened with
>>> the HadoopCatalog being used in production when it should not be :) ).
>>>
>>> Regards
>>> JB
>>>
>>> On Tue, Oct 8, 2024 at 12:18 PM Kevin Liu <kevin.jq....@gmail.com>
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I wanted to bring up a suggestion regarding our current documentation.
>>> > The existing examples for Iceberg often use the Hadoop catalog, as seen in:
>>> >
>>> > Adding a Catalog - Spark Quickstart [1]
>>> > Adding Catalogs - Spark Getting Started [2]
>>> >
>>> > Since we generally advise against using Hadoop catalogs in production
>>> > environments, I believe it would be beneficial to replace these examples
>>> > with ones that use the JDBC catalog. The JDBC catalog, configured with a
>>> > local SQLite database file, offers similar convenience but aligns better
>>> > with production best practices.
>>> >
>>> > I've created an issue [3] and a PR [4] to address this. Please take a
>>> > look, and I'd love to hear your thoughts on whether this is a direction we
>>> > want to pursue.
>>> >
>>> > Best,
>>> > Kevin Liu
>>> >
>>> > [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
>>> > [2] https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
>>> > [3] https://github.com/apache/iceberg/issues/11284
>>> > [4] https://github.com/apache/iceberg/pull/11285
>>> >
>>>
>>
