Re: Iceberg catalog questions

Jack Ye Tue, 11 May 2021 12:14:23 -0700

For your subsequent questions:

2. Mapping a namespace name to a file path is only a convention, and it can
be overridden at both the namespace and the table level. The table root path
can be customized to be at any location, and we actually recommend doing that
for cloud storage use cases to reduce throttling.
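
To make the override order concrete, here is a minimal sketch in Python. It is
not Iceberg code: the function name and the namespace_location and
table_location parameters are hypothetical, and real catalogs differ in how
they build the default path.

```python
from typing import List, Optional


def resolve_table_location(
    warehouse: str,
    namespace: List[str],
    table: str,
    namespace_location: Optional[str] = None,  # hypothetical namespace-level override
    table_location: Optional[str] = None,      # hypothetical table-level override
) -> str:
    """Pick a table root path: table override, then namespace override, then convention."""
    if table_location:
        # An explicit table location wins and may point anywhere.
        return table_location.rstrip("/")
    if namespace_location:
        # The namespace override replaces the warehouse/namespace prefix.
        return f"{namespace_location.rstrip('/')}/{table}"
    # Convention only: <warehouse>/<namespace levels>/<table>
    return "/".join([warehouse.rstrip("/"), *namespace, table])


print(resolve_table_location("s3://warehouse", ["db"], "events"))
print(resolve_table_location("s3://warehouse", ["db"], "events",
                             table_location="s3://another-bucket/events"))
```

Spreading table roots across different buckets or prefixes this way is one
option for avoiding a single hot prefix.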

3. Access control has to be done across systems. For example, in the AWS
Glue + S3 use case, the caller must have the correct IAM permissions to
access both Glue and S3. The permission control capability really depends on
the platform you are operating on. It is a bit tricky for a relational
database, where you basically have to manage row-level access control, but
it is technically achievable.
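
As a rough illustration of why both systems matter, the toy check below only
grants access when the caller holds a grant in the catalog and a grant on the
table's storage prefix. It is plain Python with hypothetical names and data
structures, not an IAM or Iceberg API.

```python
def can_read_table(principal, table, catalog_grants, storage_grants, table_locations):
    """Return True only if the caller is allowed in the catalog AND in storage."""
    # Catalog side: is this principal allowed to look up the table entry?
    has_catalog = table in catalog_grants.get(principal, set())
    # Storage side: does any granted prefix cover the table's root path?
    location = table_locations[table]
    has_storage = any(
        location.startswith(prefix) for prefix in storage_grants.get(principal, set())
    )
    return has_catalog and has_storage


# Hypothetical grants: bob can see the catalog entry but has no storage access.
catalog_grants = {"alice": {"db.events"}, "bob": {"db.events"}}
storage_grants = {"alice": {"s3://bucket/db/"}}
table_locations = {"db.events": "s3://bucket/db/events/"}

print(can_read_table("alice", "db.events", catalog_grants, storage_grants, table_locations))
print(can_read_table("bob", "db.events", catalog_grants, storage_grants, table_locations))
```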

4. The behavior varies among catalog implementations. In most
implementations, CatalogUtil.dropTableData is called to clean up files when
purge is enabled. In that case, it cleans up the metadata file, the manifest
lists, the manifests and the data files. That means that if purge is not
enabled, those files are still there, and you can recover the table if you
can rebuild the table pointer in the catalog. But this is a manual action;
there is no Iceberg API support for it. In addition, if your storage has a
file retention feature and you can recover the files, you can recover the
dropped table version, but that is a storage-level feature, not an Iceberg
feature.
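
The difference between the two drop modes can be modelled with a toy catalog.
This is a sketch in plain Python, not the Iceberg API: ToyCatalog,
restore_pointer and the dict-based storage are hypothetical stand-ins, and the
metadata file is simplified to a list of the files it references.

```python
class ToyCatalog:
    """In-memory stand-in: table name -> path of the current metadata file."""

    def __init__(self, storage):
        self.pointers = {}
        # dict of path -> content; stands in for object storage
        self.storage = storage

    def drop_table(self, name, purge=False):
        # Dropping always removes the catalog pointer.
        metadata_path = self.pointers.pop(name)
        if purge:
            # With purge, also delete the referenced files and the metadata file.
            for path in self.storage[metadata_path]:
                self.storage.pop(path, None)
            del self.storage[metadata_path]
        return metadata_path

    def restore_pointer(self, name, metadata_path):
        # Manual recovery: only possible while the metadata file still exists.
        if metadata_path not in self.storage:
            raise FileNotFoundError(metadata_path)
        self.pointers[name] = metadata_path


META = "s3://bucket/db/t/metadata/v1.metadata.json"
storage = {
    META: ["s3://bucket/db/t/metadata/snap-1.avro", "s3://bucket/db/t/data/f1.parquet"],
    "s3://bucket/db/t/metadata/snap-1.avro": b"",
    "s3://bucket/db/t/data/f1.parquet": b"",
}
catalog = ToyCatalog(storage)
catalog.pointers["db.t"] = META

dropped = catalog.drop_table("db.t", purge=False)  # files stay in storage
catalog.restore_pointer("db.t", dropped)           # pointer rebuilt by hand
```

After a drop with purge=True in this model, restore_pointer fails because the
metadata file is gone, which is the case where only a storage-level retention
feature could help.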

-Jack

On Tue, May 11, 2021 at 11:55 AM Jack Ye <[email protected]> wrote:

> Yes there is one, but unfortunately we lost attention after some time:
> https://github.com/apache/iceberg/pull/1870
>
> I think the PR is close to being merged, with quite a few rounds of review
> already; we should add it as a milestone of 0.12.
>
> -Jack
>
> On Tue, May 11, 2021 at 11:46 AM Mayur Srivastava <
> [email protected]> wrote:
>
>> Hi,
>>
>> I’m looking to use/implement a PostgreSQL based Iceberg catalog. I’m
>> wondering if one already exists and also have a few questions. I would
>> really appreciate any help I can get with the questions.
>>
>> 1.      Does Iceberg have a catalog that is compatible with PostgreSQL
>> (or any storage backend that is compatible with PostgreSQL)?
>>
>> a.      If there are similar implementations, could someone share their
>> experience with the database schema used for the catalog? E.g. does a
>> namespace map to a database in the catalog backend?
>>
>> b.      Is there an existing abstract base class that I can use to
>> implement the catalog that talks to PostgreSQL?
>>
>> 2.      Mapping catalog namespace with S3 bucket: does someone have a
>> recommendation of managing catalog namespace along with AWS S3 (or GCS)
>> buckets? For example, when a top level namespace is created in the catalog,
>> do users map it to a bucket or a sub-directory structure on S3? (this may
>> be useful for setting the similar access control for both catalog namespace
>> and the S3 bucket.)
>>
>> 3.      Table access permission management: since metadata is stored in
>> two separate systems (table metadata in S3 and namespace/table location in
>> catalog), how are table access permissions kept in sync in these storage
>> systems? E.g. if a catalog is used with GCS, how are the
>> namespace/bucket/table access permissions kept in sync?
>>
>> 4.      Undeleting or recovering a dropped table: does the catalog
>> support undelete operation? If the underlying S3 data is not purged, can
>> the catalog be used to recover the dropped table?
>>
>>
>>
>> Thanks,
>>
>> Mayur
>>
>>
>>
>
