Hi Qinhua,

+1 to what Ryan said. It would be great to have a PR so we can analyze the
features you list in detail. I have the following questions and comments:

> Namespace management and configuration
This is something we decided to implement in a second iteration, once
people have a need for it. If you would like to add namespace support, it
would be great to add it directly to the existing JdbcCatalog. I can review
in more detail when you have the PR out.

> Each namespace can be backed by a different S3 bucket. This allows fine
grained access control at the namespace level.
This is a common feature across all catalog implementations that support
namespaces. Typically a LocationUri is stored as a namespace property,
which can be used to override the default table location in that namespace.
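
To make the override concrete, here is a minimal sketch in plain Java with
no Iceberg dependencies. It is not the code of any existing catalog: the
class name, the method names, and the "location" property key are all
assumptions made for illustration. It only shows the rule that a namespace
level location, when present, replaces the warehouse default.

```java
import java.util.Map;

class NamespaceLocationSketch {
    // Hypothetical property key; each real catalog chooses its own name.
    static final String LOCATION_PROPERTY = "location";

    // Returns the default location for a new table in the given namespace.
    static String defaultTableLocation(
            String warehouseRoot, String namespace, String table,
            Map<String, String> namespaceProps) {
        String override = namespaceProps.get(LOCATION_PROPERTY);
        // Use the namespace override if set, else fall back to the warehouse.
        String base = (override != null && !override.isEmpty())
                ? override
                : stripTrailingSlash(warehouseRoot) + "/" + namespace;
        return stripTrailingSlash(base) + "/" + table;
    }

    // Drops a single trailing slash so paths join cleanly.
    static String stripTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }
}
```

With this rule, a namespace can point at its own bucket or at a prefix
inside a shared bucket, and the catalog code does not need to know which.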

> At namespace creation time, users can choose either 1) use a pre-existing
bucket; 2) let the Catalog create a new bucket.
I think we need to make this more generic, likely using the mechanism I
described above, which is what all other catalog implementations use. A
bucket is a resource with a maximum cap, so this would not scale well for
multi-tenant use cases where users have an unbounded number of logical
namespaces.

> Isolate logical TableIdentifiers from physical S3 locations.
> Support rename table within the same namespace without touching S3.
Iceberg tables have a UUID, so I think these are achievable directly with
the existing Iceberg catalog and storage design. We should also be able to
rename tables across namespaces. Could you describe in more detail what
feature is added here?
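
As a rough illustration of why a rename does not need to touch S3, here is
a minimal in-memory sketch. It assumes the catalog only keeps a pointer
from a table identifier to the current metadata file location; the class
and method names are hypothetical and this is not the JdbcCatalog code.

```java
import java.util.HashMap;
import java.util.Map;

class TablePointerSketch {
    // Key: "namespace.table"; value: location of the current metadata file.
    private final Map<String, String> pointers = new HashMap<>();

    void register(String identifier, String metadataLocation) {
        pointers.put(identifier, metadataLocation);
    }

    // Returns the metadata location, or null if the table is unknown.
    String metadataLocation(String identifier) {
        return pointers.get(identifier);
    }

    // Rename only moves the catalog entry; no file is read, copied, or moved.
    void rename(String from, String to) {
        if (!pointers.containsKey(from)) {
            throw new IllegalArgumentException("No such table: " + from);
        }
        if (pointers.containsKey(to)) {
            throw new IllegalStateException("Table already exists: " + to);
        }
        pointers.put(to, pointers.remove(from));
    }
}
```

Because the identifier is just a key, the same operation works whether the
new name is in the same namespace or a different one.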

> Support various kinds of databases
Is there anything that the current implementation does not support? I
don't think it uses any SQL dialect that is incompatible across databases,
but maybe I overlooked something here.

> Use Jooq <https://www.jooq.org/> to connect to the database and to ensure
SQL semantics.
I think we should evaluate the differences between this library and the
current implementation. I checked that the license of the library is fine.
Could you describe why it is a better choice than the existing
implementation?
We typically do not want too many third-party dependencies. Given that
JdbcCatalog is in the core library, we should be very careful when adding
new dependencies.

> Provide database initialization scripts for Postgres.
I think infrastructure setup like database initialization is out of scope
for the Iceberg library. We can add this as part of the JDBC documentation.

Best,
Jack Ye

On Fri, Aug 27, 2021 at 12:54 PM Ryan Blue <b...@tabular.io> wrote:

> Qinhua, thanks for sharing this. It sounds great to add more features to
> the JDBC catalog.
>
> Could you share a link to the implementation or a PR? I have lots more
> questions like how you implemented namespaces, but those can probably be
> answered by looking at the code if you're able to share it.
>
> Thanks!
>
> Ryan
>
> On Fri, Aug 27, 2021 at 11:48 AM Qinhua Yan <qinhua....@twosigma.com>
> wrote:
>
>> Hi there,
>>
>>
>>
>> We’d like to share our JdbcCatalog impl with the community and welcome
>> any discussion.
>>
>> We are aware of the existing JdbcCatalog impl
>> <https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java>,
>> however, it has some feature gaps and doesn’t work for our use case.
>> Therefore, we implemented a SQL-database backed Catalog with the following
>> enhancements.
>>
>> 1.       Namespace management and configuration
>>
>> ·       Each namespace can be backed by a different S3 bucket. This
>> allows fine grained access control at the namespace level.
>>
>> ·       At namespace creation time, users can choose either 1) use a
>> pre-existing bucket; 2) let the Catalog create a new bucket.
>>
>> ·       Isolate logical TableIdentifiers from physical S3 locations.
>>
>> ·       Support rename table within the same namespace without touching
>> S3.
>>
>> 2.       Support various kinds of databases
>>
>> ·       Use Jooq <https://www.jooq.org/>to connect to the database and
>> to ensure SQL semantics.
>>
>> ·       Easy to support different kinds of SQL without touching the core
>> Catalog code.
>>
>> ·       Provide database initialization scripts for Postgres.
>>
>>
>>
>> This Catalog implementation can be easily extended to support some
>> advanced features such as undelete tables and
>> namespace-backed-by-multiple-backends.
>>
>>
>>
>> Any comments and discussions are welcomed!
>>
>>
>>
>> Thank you!
>>
>> Qinhua Yan
>>
>
>
> --
> Ryan Blue
> Tabular
>
