Ignore the last email; I just re-read the proposal earlier in the email chain.
On Wed, Nov 27, 2024 at 11:37 AM Alex Merced <alex.mer...@dremio.com> wrote:
> This is just a quick thought to put out there: if there is going to be a new
> reimagining of a file system catalog, would it be worth adding a
> multi-table layer on top?
>
> *As a rough example:*
>
> - At the TOP is a JSON file that is just a mapping of the table name to
>   the directory where VERSION-HINT would be found (this is so the file is
>   only updated when tables are created or dropped)
> - Then the engine finds the directory and uses the VERSION-HINT like normal to
>   discover metadata and plan the scan
>
> This way, you have a listing of all your tables, so you don't have to
> re-register each table with each tool but can still avoid having to run a
> full service on top for basic applications.
>
> *Governance in this Type of Catalog:*
>
> - You can group different tables into different JSON files/catalogs
> - Then file access controls on the JSON file can be used as a simple way
>   to control user access to groups of tables
>
>
> On Wed, Nov 27, 2024 at 8:27 AM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> I think one major issue with the current HadoopCatalog is that there's no way
>> to manage tables by name. If we add a metadata layer on top of it, we
>> need to handle more consistency challenges.
>>
>> Manu
>>
>> On Wed, Nov 27, 2024 at 8:03 PM Gabor Kaszab <gaborkas...@apache.org>
>> wrote:
>>
>>> Hi All,
>>>
>>> Xuanwo, I recall the reasoning against HadoopCatalog was the other way
>>> around: even though it is safe to use on HDFS, it is unsafe on object
>>> storage. I believe this functionality gap in object stores is going
>>> away, so for me HadoopCatalog would make even more sense now than
>>> before. The name might not be straightforward, as it's not just for Hadoop.
>>>
>>> Regards,
>>> Gabor
>>>
>>>
>>> On Wed, Nov 27, 2024 at 9:02 AM Xuanwo <xua...@apache.org> wrote:
>>>
>>>> Hi
>>>>
>>>> I believe we still need to deprecate HadoopCatalog since the operation
>>>> is still not safe on Hadoop. As raised by Jack Ye before, I suggest we
>>>> consider having a StorageCatalog or ObjectStorageCatalog that can only be
>>>> used with storage services supporting conditional writes. That would be a
>>>> good approach.
>>>>
>>>> On Wed, Nov 27, 2024, at 15:47, Nikhil Benesch wrote:
>>>> > Makes sense! I'd be eager to chat more about this, but I'm afraid I won't be at
>>>> > re:Invent. Maybe we can circle back after re:Invent, once we see what AWS
>>>> > announces?
>>>> >
>>>> > On Tue, Nov 26, 2024 at 2:58 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>> >>
>>>> >> Hi Nikhil
>>>> >>
>>>> >> Thanks for your message, very interesting.
>>>> >>
>>>> >> I think it would be great to involve the Polaris project here as well,
>>>> >> as a REST Catalog implementation.
>>>> >> The Polaris community is discussing storage/backend right now, so it
>>>> >> would be the perfect time to consider leveraging S3 conditional
>>>> >> writes (as a plugin first, for instance).
>>>> >>
>>>> >> I would be happy to connect and learn more about your perspective on that.
>>>> >>
>>>> >> Thanks,
>>>> >> Regards
>>>> >> JB
>>>> >>
>>>> >> PS: I will be at AWS re:Invent next week, so maybe we can connect there.
>>>> >>
>>>> >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch <nikhil.bene...@gmail.com> wrote:
>>>> >> >
>>>> >> > Hi all,
>>>> >> >
>>>> >> > With Amazon S3 announcing support for the If-Match header yesterday [0], all the
>>>> >> > major object store implementations now support a compare-and-swap operation.
>>>> >> >
>>>> >> > As far as I can tell, this opens up the possibility of storing Iceberg
>>>> >> > catalogs directly on object storage, without the need for a separate metastore,
>>>> >> > and without violating any of Iceberg's ACID guarantees.
>>>> >> >
>>>> >> > It seems the immediate next step is to build an independent Java or REST catalog
>>>> >> > backend to prove this concept out. Long term, though, the ideal would be to
>>>> >> > have such a catalog backend be a first-class citizen in the Iceberg project.
>>>> >> >
>>>> >> > Is anyone else in the Iceberg community barking up this tree? I'm a long-term
>>>> >> > Iceberg enthusiast, but new to the community. I'd very much appreciate any
>>>> >> > pointers to current or past discussions on the topic. So far all I've been
>>>> >> > able to turn up is some light chatter from myself and others on Bluesky and
>>>> >> > Hacker News ([1][2][3]).
>>>> >> >
>>>> >> > Cheers,
>>>> >> > Nikhil
>>>> >> >
>>>> >> > [0]: https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
>>>> >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
>>>> >> > [2]: https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
>>>> >> > [3]: https://news.ycombinator.com/item?id=42240370
>>>>
>>>> --
>>>> Xuanwo
>>>>
>>>> https://xuanwo.io/
>>>>
>>>
>
> --
>
> *Alex Merced <https://bio.alexmerced.com/data> *
> *Senior Tech Evangelist, Dremio **Dremio.com*
> <https://www.dremio.com/?utm_medium=email&utm_source=signature&utm_term=na&utm_content=email-signature&utm_campaign=email-signature>*/
> **Follow Us on LinkedIn!* <https://www.linkedin.com/company/dremio>
> *Resources for Getting Hands-on with Apache Iceberg/Dremio*
> <https://medium.com/data-engineering-with-dremio/a-deep-intro-to-apache-iceberg-and-resources-for-learning-more-be51535cff74>
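Nikhil's observation that a compare-and-swap primitive is enough to keep a catalog on object storage, and Xuanwo's idea of a catalog that only works on stores with conditional writes, both come down to an optimistic commit loop. The Python sketch below illustrates that loop under stated assumptions: `InMemoryObjectStore` is a hypothetical stand-in rather than a real client library (its `expected_etag` argument plays the role of S3's `If-Match` header, and a plain `put` plays the role of `If-None-Match: *`), and the pointer object `warehouse/db/orders/metadata/current.json` is a hypothetical name for wherever a table's current metadata location would live.

```python
import json
import threading


class PreconditionFailed(Exception):
    """A conditional write lost the race (the store refused the precondition)."""


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store that supports conditional PUTs.

    put(key, body)                      -> create only, like `If-None-Match: *`
    put(key, body, expected_etag=etag)  -> overwrite only if the ETag still matches, like `If-Match`
    """

    def __init__(self):
        self._objects = {}             # key -> (body, etag)
        self._generation = 0           # source of fresh ETags, so an overwrite never reuses one
        self._lock = threading.Lock()  # makes check-and-write atomic, as the real store would

    def get(self, key):
        with self._lock:
            return self._objects[key]  # (body, etag); KeyError if the key is absent

    def put(self, key, body, expected_etag=None):
        with self._lock:
            current = self._objects.get(key)
            if expected_etag is None:
                if current is not None:
                    raise PreconditionFailed(key)
            elif current is None or current[1] != expected_etag:
                raise PreconditionFailed(key)
            self._generation += 1
            etag = str(self._generation)
            self._objects[key] = (body, etag)
            return etag


def commit(store, key, update, max_retries=5):
    """Optimistic commit: read the pointer, apply `update`, swap only if nobody else did."""
    for _ in range(max_retries):
        body, etag = store.get(key)
        new_state = update(json.loads(body))
        try:
            store.put(key, json.dumps(new_state).encode(), expected_etag=etag)
            return new_state
        except PreconditionFailed:
            continue  # another writer committed first: re-read and retry on top of it
    raise RuntimeError(f"gave up on {key} after {max_retries} conflicting attempts")


# Hypothetical pointer object holding a table's current metadata location.
POINTER = "warehouse/db/orders/metadata/current.json"


def bump(state):
    """Toy table update: advance to the next metadata version."""
    n = state["version"] + 1
    return {"version": n, "metadata": f"v{n}.metadata.json"}


store = InMemoryObjectStore()
store.put(POINTER, json.dumps({"version": 1, "metadata": "v1.metadata.json"}).encode())

_, stale_etag = store.get(POINTER)  # writer A reads the pointer...
commit(store, POINTER, bump)        # ...but writer B commits version 2 first
try:
    store.put(POINTER, b"{}", expected_etag=stale_etag)  # A's blind swap is refused
except PreconditionFailed:
    print("stale writer rejected")
print(commit(store, POINTER, bump))  # prints {'version': 3, 'metadata': 'v3.metadata.json'}
```

The point of the design is that a writer holding a stale ETag cannot silently overwrite a newer commit; it has to re-read and rebuild its change on top of the latest state. Against real S3, the same loop would use the ETag returned by a read and treat a failed precondition as the retry signal.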
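Alex's multi-table layer can be sketched with nothing but files. Assumptions, all illustrative: each index is a flat JSON object mapping table names to table directories; each table uses the HadoopCatalog-style layout where `metadata/version-hint.text` holds the current version N and the metadata lives in `metadata/vN.metadata.json`; and the function names are hypothetical. The index is rewritten only when a table is created, while ordinary commits touch only the table's own version hint.

```python
import json
import os
import tempfile
from pathlib import Path


def load_catalog(catalog_file: Path) -> dict:
    """Read one catalog index: a JSON object mapping table name -> table directory."""
    return json.loads(catalog_file.read_text())


def register_table(catalog_file: Path, table: str, table_dir: Path) -> None:
    """Add a table to the index; only needed when a table is created.

    os.replace swaps the new index in atomically on one filesystem, so readers never
    see a half-written file. It does NOT stop two concurrent creators from each
    rewriting the index and losing the other's entry.
    """
    tables = load_catalog(catalog_file) if catalog_file.exists() else {}
    if table in tables:
        raise ValueError(f"table already exists: {table}")
    tables[table] = str(table_dir)
    tmp = catalog_file.with_name(catalog_file.name + ".tmp")
    tmp.write_text(json.dumps(tables, indent=2))
    os.replace(tmp, catalog_file)


def current_metadata_path(catalog_file: Path, table: str) -> Path:
    """Resolve a table name the way an engine would under this proposal.

    1. Look the name up in the catalog index to find the table directory.
    2. Read metadata/version-hint.text there to get the current version N.
    3. Return metadata/vN.metadata.json (HadoopCatalog-style naming, assumed).
    """
    tables = load_catalog(catalog_file)
    if table not in tables:
        raise KeyError(f"{table} is not listed in {catalog_file.name}")
    metadata_dir = Path(tables[table]) / "metadata"
    version = int((metadata_dir / "version-hint.text").read_text().strip())
    return metadata_dir / f"v{version}.metadata.json"


with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    orders_dir = root / "warehouse" / "orders"
    (orders_dir / "metadata").mkdir(parents=True)
    (orders_dir / "metadata" / "version-hint.text").write_text("3")

    sales = root / "sales.json"  # one index file per group of tables
    register_table(sales, "orders", orders_dir)
    print(current_metadata_path(sales, "orders").relative_to(root))
```

Keeping one index file per group (here `sales.json`) is what lets plain file permissions act as the access control Alex describes, although permissions on the index alone do not stop someone who already knows a table's directory. And, per Manu's point about consistency, concurrent creates against one index on object storage would still need a conditional write such as `If-Match` to avoid lost entries.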