Re: Storing catalog directly on object store

Alex Merced Wed, 27 Nov 2024 08:42:22 -0800

Ignore the last email; just re-read the proposal earlier in the email chain.


On Wed, Nov 27, 2024 at 11:37 AM Alex Merced <alex.mer...@dremio.com> wrote:

> This is just a quick thought to put out there: If there will be a new
> reimagining of a file system catalog, would it be worth adding a
> multi-table layer on top?
>
> *As a rough example:*
>
> - At the top is a JSON file that simply maps each table name to the
> directory where its VERSION-HINT would be found (so this file is only
> updated when tables are created or dropped)
> - The engine then finds the directory and uses the VERSION-HINT as usual to
> discover the metadata and plan the scan
>
> This way, you have a listing of all your tables, so you don't have to
> re-register each table with each tool, but you can still avoid running a
> full service on top for basic use cases.
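
The lookup described above might be sketched as follows. This is only an illustration under stated assumptions: the index file name (`catalog.json`), its `{"tables": {...}}` shape, and the function name `resolve_metadata` are hypothetical, and the `metadata/version-hint.text` plus `vN.metadata.json` layout is assumed to resemble what HadoopCatalog writes. A local directory stands in for the object store.

```python
import json
from pathlib import Path


def resolve_metadata(index_path: Path, table_name: str) -> Path:
    """Return the current metadata file for table_name.

    Assumed (hypothetical) layout:
      index file : {"tables": {"db.events": "warehouse/db/events"}}
      table dir  : <dir>/metadata/version-hint.text holds an integer N
      metadata   : <dir>/metadata/vN.metadata.json
    """
    # Step 1: the top-level JSON file maps table names to directories.
    index = json.loads(index_path.read_text())
    table_dir = index_path.parent / index["tables"][table_name]

    # Step 2: the version hint in that directory names the current version.
    hint = (table_dir / "metadata" / "version-hint.text").read_text()
    version = int(hint.strip())

    # Step 3: the engine reads this metadata file to plan the scan.
    return table_dir / "metadata" / f"v{version}.metadata.json"
```

Because commits only touch the table's own directory, the index file in this sketch changes only when a table is created or dropped.
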
>
> *Governance in this Type of Catalog:*
>
> - You can group different tables into different JSON files/catalogs
> - Then file access controls on the JSON file can be used as a simple way
> to control user access to groups of tables
>
>
> On Wed, Nov 27, 2024 at 8:27 AM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> I think one major issue with the current HadoopCatalog is that there's no
>> way to manage tables by name. If we add a metadata layer on top of it, we
>> will need to handle more consistency challenges.
>>
>> Manu
>>
>> On Wed, Nov 27, 2024 at 8:03 PM Gabor Kaszab <gaborkas...@apache.org>
>> wrote:
>>
>>> Hi All,
>>>
>>> Xuanwo, I recall the reasoning against HadoopCatalog was the other way
>>> around: even though it is safe to use on HDFS, it is unsafe on object
>>> storage. I believe this functionality gap in object stores seems to be
>>> going away, so for me HadoopCatalog would make even more sense now than
>>> before. The name might be misleading, though, as it's not just for Hadoop.
>>>
>>> Regards,
>>> Gabor
>>>
>>>
>>> On Wed, Nov 27, 2024 at 9:02 AM Xuanwo <xua...@apache.org> wrote:
>>>
>>>> Hi
>>>>
>>>> I believe we still need to deprecate HadoopCatalog since the operation
>>>> is still not safe on Hadoop. As raised by Jack Ye before, I suggest we
>>>> consider having a StorageCatalog or ObjectStorageCatalog that can only be
>>>> used with storage services supporting conditional writes. That would be a
>>>> good approach.
>>>>
>>>> On Wed, Nov 27, 2024, at 15:47, Nikhil Benesch wrote:
>>>> > Makes sense! I'd be eager to chat more about this but I'm afraid I
>>>> > won't be at re:Invent. Maybe we plan to circle back after re:Invent,
>>>> > once we see what AWS announces?
>>>> >
>>>> > On Tue, Nov 26, 2024 at 2:58 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>> >>
>>>> >> Hi Nikhil
>>>> >>
>>>> >> Thanks for your message, very interesting.
>>>> >>
>>>> >> I think it would be great to involve the Polaris project here as well,
>>>> >> as a REST Catalog implementation.
>>>> >> The Polaris community is discussing storage/backend right now, so this
>>>> >> would be the perfect time to consider leveraging S3 conditional
>>>> >> writes (as a plugin at first, for instance).
>>>> >>
>>>> >> I would be happy to connect and learn more about your perspective on
>>>> >> that.
>>>> >>
>>>> >> Thanks,
>>>> >> Regards
>>>> >> JB
>>>> >>
>>>> >> PS: I will be at AWS re:Invent next week, so maybe we can connect there.
>>>> >>
>>>> >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch <nikhil.bene...@gmail.com> wrote:
>>>> >> >
>>>> >> > Hi all,
>>>> >> >
>>>> >> > With Amazon S3 announcing support for the If-Match header yesterday [0],
>>>> >> > all the major object store implementations now support a compare-and-swap
>>>> >> > operation.
>>>> >> >
>>>> >> > As far as I can tell, this opens up the possibility of storing Iceberg
>>>> >> > catalogs directly on object storage, without the need for a separate
>>>> >> > metastore, and without violating any of Iceberg's ACID guarantees.
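
To make the compare-and-swap idea concrete, here is a minimal sketch of a catalog commit built on a conditional write. Everything in it is an assumption for illustration: `InMemoryStore` is a toy stand-in for an object store (it is not an S3 client), `put_if_match` imitates a PUT with an If-Match precondition, and the single-object catalog layout and the names `commit` and `CommitConflict` are hypothetical.

```python
import json


class PreconditionFailed(Exception):
    """The stored ETag did not match the expected one (like HTTP 412)."""


class CommitConflict(Exception):
    """The table pointer moved; the caller must rebase and try again."""


class InMemoryStore:
    """Toy object store with conditional writes (an assumption, not S3)."""

    def __init__(self):
        self._objects = {}  # key -> (body, etag)
        self._counter = 0

    def get(self, key):
        return self._objects[key]

    def put_if_match(self, key, body, expected_etag):
        # expected_etag=None means "create only if the key is absent".
        current = self._objects.get(key)
        current_etag = current[1] if current else None
        if current_etag != expected_etag:
            raise PreconditionFailed(key)
        self._counter += 1
        etag = str(self._counter)
        self._objects[key] = (body, etag)
        return etag


def commit(store, key, table, base_location, new_location, max_retries=5):
    """Swap table's metadata pointer from base_location to new_location."""
    for _ in range(max_retries):
        body, etag = store.get(key)
        catalog = json.loads(body)
        if catalog.get(table) != base_location:
            # Someone else committed to this table first.
            raise CommitConflict(table)
        catalog[table] = new_location
        try:
            return store.put_if_match(key, json.dumps(catalog).encode(), etag)
        except PreconditionFailed:
            continue  # the catalog object changed; re-read and re-check
    raise RuntimeError("too many concurrent catalog updates")
```

In this sketch a failed precondition only triggers a re-read; the commit is abandoned with `CommitConflict` when the table's own pointer has moved, since the writer would then need to rebase its changes on the new metadata.
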
>>>> >> >
>>>> >> > It seems the immediate next step is to build an independent Java or REST
>>>> >> > catalog backend to prove this concept out. Long term, though, the ideal
>>>> >> > would be to have such a catalog backend be a first-class citizen in the
>>>> >> > Iceberg project.
>>>> >> >
>>>> >> > Is anyone else in the Iceberg community barking up this tree? I'm a
>>>> >> > long-term Iceberg enthusiast, but new to the community. I'd very much
>>>> >> > appreciate any pointers to current or past discussions on the topic. So
>>>> >> > far all I've been able to turn up is some light chatter from myself and
>>>> >> > others on Bluesky and Hacker News ([1][2][3]).
>>>> >> >
>>>> >> > Cheers,
>>>> >> > Nikhil
>>>> >> >
>>>> >> > [0]: https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
>>>> >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
>>>> >> > [2]: https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
>>>> >> > [3]: https://news.ycombinator.com/item?id=42240370
>>>>
>>>> --
>>>> Xuanwo
>>>>
>>>> https://xuanwo.io/
>>>>
>>>
>
> --
>
> *Alex Merced <https://bio.alexmerced.com/data> *
> *Senior Tech Evangelist, Dremio **Dremio.com*
> <https://www.dremio.com/?utm_medium=email&utm_source=signature&utm_term=na&utm_content=email-signature&utm_campaign=email-signature>*/
> **Follow Us on LinkedIn!* <https://www.linkedin.com/company/dremio>
> *Resources for Getting Hands-on with Apache Iceberg/Dremio*
> <https://medium.com/data-engineering-with-dremio/a-deep-intro-to-apache-iceberg-and-resources-for-learning-more-be51535cff74>
>


