Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

Jean-Baptiste Onofré Thu, 17 Oct 2024 02:04:55 -0700

Hi Kevin

It sounds reasonable to me. I would just mention that the REST catalog
is the preferred one.


Regards
JB

On Wed, Oct 16, 2024 at 8:40 PM Kevin Liu wrote:
>
> Hey folks,
>
>
> Thanks for the discussions.
>
>
> It seems everyone is in favor of replacing the Hadoop catalog example, and 
> the question now is whether to replace it with the JDBC catalog or the REST 
> catalog.
>
>
> I originally proposed the JDBC catalog as a replacement primarily due to its 
> ease of use. Users can quickly set up a JDBC catalog backed by an in-memory 
> or file-based datastore without needing additional infrastructure. It also 
> aligns with the quick-start ethos of "it just works." That said, I agree that 
> an example of setting up the REST catalog should be part of the 
> getting-started guide since it’s the catalog the community has aligned on.
>
>
> Here's what I propose as a middle ground:
>
> 1. We replace the Hadoop catalog example with a JDBC catalog backed by an
>    in-memory datastore. This allows users to get started without needing
>    additional infrastructure, which was one of the main benefits of the
>    Hadoop catalog.
> 2. We add a new section describing the REST catalog, its benefits, and how
>    to set one up. We can use the REST catalog adapter [1], with the adapter
>    using the JDBC catalog as its internal catalog.
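
A minimal sketch of how the two setups in this proposal might differ, expressed as Spark catalog properties. The catalog name `local`, the warehouse path, and the REST endpoint `http://localhost:8181` are illustrative assumptions, not values from this thread. The JDBC variant also assumes a SQLite JDBC driver is available on the classpath.

```python
from typing import Dict, Optional

# Spark catalog implementation class shipped with Iceberg's Spark runtime.
SPARK_CATALOG_IMPL = "org.apache.iceberg.spark.SparkCatalog"


def catalog_conf(name: str, catalog_type: str, uri: str,
                 warehouse: Optional[str] = None) -> Dict[str, str]:
    """Build Spark properties for an Iceberg catalog called `name`."""
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        prefix: SPARK_CATALOG_IMPL,
        f"{prefix}.type": catalog_type,
        f"{prefix}.uri": uri,
    }
    if warehouse is not None:
        conf[f"{prefix}.warehouse"] = warehouse
    return conf


# Prototype: JDBC catalog on an in-memory SQLite database (assumed URI).
# An in-memory database may not be shared across pooled connections,
# so a file-based URI may behave more predictably.
prototype = catalog_conf("local", "jdbc", "jdbc:sqlite::memory:", "/tmp/warehouse")

# REST catalog: the endpoint is hypothetical and would point at whatever
# REST catalog service (for example, the adapter) is running.
production = catalog_conf("local", "rest", "http://localhost:8181")

# Properties that differ between the two setups.
changed = sorted(
    key for key in prototype.keys() | production.keys()
    if prototype.get(key) != production.get(key)
)
print(changed)
```

Under these assumptions, only the catalog type, URI, and warehouse settings change between the prototype and the REST setup, so the rest of a user's session configuration could stay the same.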
>
>
> This approach gives users a way to quickly prototype while also guiding them 
> toward the REST catalog for production use cases.
>
>
> Looking forward to hearing more from you all.
>
>
> Best,
>
> Kevin Liu
>
>
> [1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>
>
>
> On Thu, Oct 10, 2024 at 3:44 AM Eduard Tudenhöfner wrote:
>>
>> I would prefer to advocate for the REST catalog in those examples/docs 
>> (similar to how the Spark quickstart example uses the REST catalog). The 
>> docs could then refer to the quickstart example to indicate what's required 
>> in terms of services to be started before a user can spawn a spark shell.
>>
>> On Thu, Oct 10, 2024 at 12:15 PM Jean-Baptiste Onofré wrote:
>>>
>>> Hi
>>>
>>> As we are talking about "documentation" (quick start/readme), I would
>>> rather propose using the REST catalog here instead of JDBC.
>>>
>>> As it's the catalog we "promote", I think it would be valuable for
>>> users to start with the "right thing".
>>>
>>> The JDBC catalog is interesting for a quick test or getting-started guide,
>>> but we know how it goes: it will be heavily used (see what happened with
>>> the HadoopCatalog, which is used in production whereas it should not be :) ).
>>>
>>> Regards
>>> JB
>>>
>>> On Tue, Oct 8, 2024 at 12:18 PM Kevin Liu wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I wanted to bring up a suggestion regarding our current documentation. 
>>> > The existing examples for Iceberg often use the Hadoop catalog, as seen 
>>> > in:
>>> >
>>> > - Adding a Catalog - Spark Quickstart [1]
>>> > - Adding Catalogs - Spark Getting Started [2]
>>> >
>>> > Since we generally advise against using Hadoop catalogs in production 
>>> > environments, I believe it would be beneficial to replace these examples 
>>> > with ones that use the JDBC catalog. The JDBC catalog, configured with a 
>>> > local SQLite database file, offers similar convenience but aligns better 
>>> > with production best practices.
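
As a rough illustration of the setup described here, the sketch below renders `spark-shell` flags for a JDBC catalog backed by a local SQLite file. The catalog name `local`, the base directory `/tmp/iceberg-demo`, and the file name `iceberg_catalog.db` are assumptions chosen for the example. It also assumes the Iceberg Spark runtime and a SQLite JDBC driver are supplied separately (they are omitted here to avoid guessing versions).

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def spark_shell_args(name: str, base_dir: PurePosixPath) -> List[str]:
    """Render spark-shell arguments for a SQLite-backed JDBC catalog."""
    prefix = f"spark.sql.catalog.{name}"
    db_file = base_dir / "iceberg_catalog.db"  # hypothetical file name
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "jdbc",
        f"{prefix}.uri": f"jdbc:sqlite:{db_file}",
        f"{prefix}.warehouse": str(base_dir / "warehouse"),
    }
    args = ["spark-shell"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


args = spark_shell_args("local", PurePosixPath("/tmp/iceberg-demo"))
# Print a copy-pasteable command line.
print(shlex.join(args))
```

The only state this leaves behind is the SQLite file and the warehouse directory, which is what makes it comparable in convenience to the Hadoop catalog example.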
>>> >
>>> > I've created an issue [3] and a PR [4] to address this. Please take a 
>>> > look, and I'd love to hear your thoughts on whether this is a direction 
>>> > we want to pursue.
>>> >
>>> > Best,
>>> > Kevin Liu
>>> >
>>> > [1] https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
>>> > [2] https://iceberg.apache.org/docs/nightly/spark-getting-started/#adding-catalogs
>>> > [3] https://github.com/apache/iceberg/issues/11284
>>> > [4] https://github.com/apache/iceberg/pull/11285
>>> >
