Just a quick thought to put out there: if the file system catalog is going to be reimagined, would it be worth adding a multi-table layer on top?
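A minimal sketch of how an engine might resolve a table through such a layer, assuming a top-level JSON file that maps table names to table directories. The file name `catalog.json`, the function names, and the `metadata/version-hint.text` and `v<N>.metadata.json` naming are assumptions for illustration, modeled on the HadoopCatalog layout; this is not an existing Iceberg API, and it uses the local file system rather than an object store.

```python
import json
from pathlib import Path


def load_table_index(index_path: Path) -> dict[str, str]:
    """Read the top-level JSON file mapping table name -> table directory."""
    with index_path.open() as f:
        return json.load(f)


def resolve_metadata_file(index_path: Path, table_name: str) -> Path:
    """Resolve a table name to its current metadata file via the version hint."""
    index = load_table_index(index_path)
    if table_name not in index:
        raise KeyError(f"table not registered in catalog file: {table_name}")
    table_dir = Path(index[table_name])
    # Assumed layout: the hint file holds the current version number as text.
    hint = (table_dir / "metadata" / "version-hint.text").read_text().strip()
    # Assumed naming: metadata files are called v<N>.metadata.json.
    return table_dir / "metadata" / f"v{int(hint)}.metadata.json"
```

With a `catalog.json` containing `{"orders": "/warehouse/orders"}`, a call such as `resolve_metadata_file(Path("catalog.json"), "orders")` would read the hint under `/warehouse/orders/metadata/` and return the path of the matching metadata file. The index file itself is only rewritten when a table is created or dropped.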
*As a rough example:*
- At the top is a JSON file that is just a mapping of the table name to the directory where VERSION-HINT would be found (this is so the file is only updated when tables are created or dropped)
- Then the engine finds the directory and uses the VERSION-HINT like normal to discover metadata and plan the scan

This way, you have a listing of all your tables, so you don't have to re-register each table with each tool, but you can still avoid having to run a full service on top for basic applications.

*Governance in this Type of Catalog:*
- You can group different tables into different JSON files/catalogs
- Then file access controls on the JSON file can be used as a simple way to control user access to groups of tables

On Wed, Nov 27, 2024 at 8:27 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

> I think one major issue with current HadoopCatalog is that there's no way
> to manage tables by name. If adding one metadata layer on top of it, we
> need to handle more consistency challenges.
>
> Manu
>
> On Wed, Nov 27, 2024 at 8:03 PM Gabor Kaszab <gaborkas...@apache.org>
> wrote:
>
>> Hi All,
>>
>> Xuanwo, I recall the reasoning against HadoopCatalog was the other way
>> around: even though it is safe to use on HDFS, it is unsafe on object
>> storage. I believe that this gap of functionalities of object stores seems
>> to go away, so for me HadoopCatalog would even make more sense now than
>> before. The name might not be straightforward as it's not just for Hadoop.
>>
>> Regards,
>> Gabor
>>
>>
>> On Wed, Nov 27, 2024 at 9:02 AM Xuanwo <xua...@apache.org> wrote:
>>
>>> Hi
>>>
>>> I believe we still need to deprecate HadoopCatalog since the operation
>>> is still not safe on Hadoop. As raised by Jack Ye before, I suggest we
>>> consider having a StorageCatalog or ObjectStorageCatalog that can only be
>>> used with storage services supporting conditional writes. That would be a
>>> good approach.
>>>
>>> On Wed, Nov 27, 2024, at 15:47, Nikhil Benesch wrote:
>>> > Makes sense! I'd be eager to chat more about this but I'm afraid I won't be at
>>> > re:Invent. Maybe we plan to circle back after re:Invent, once we see what AWS
>>> > announces?
>>> >
>>> > On Tue, Nov 26, 2024 at 2:58 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>> >>
>>> >> Hi Nikhil
>>> >>
>>> >> Thanks for your message, very interesting.
>>> >>
>>> >> I think it would be great to involve the Polaris project here as well,
>>> >> as a REST Catalog implementation.
>>> >> The Polaris community is discussing storage/backend right now, so it
>>> >> would be the perfect timing to consider leveraging S3 conditional
>>> >> writes (as a plugin for instance first).
>>> >>
>>> >> I would be happy to connect and know more about your perspective about that.
>>> >>
>>> >> Thanks,
>>> >> Regards
>>> >> JB
>>> >>
>>> >> PS: I will be at AWS re:Invent next week, so maybe we can connect there.
>>> >>
>>> >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch <nikhil.bene...@gmail.com> wrote:
>>> >> >
>>> >> > Hi all,
>>> >> >
>>> >> > With Amazon S3 announcing support for the If-Match header yesterday [0], all the
>>> >> > major object store implementations now support a compare-and-swap operation.
>>> >> >
>>> >> > As far as I can tell, this opens up the possibility of storing Iceberg
>>> >> > catalogs directly on object storage, without the need for a separate metastore,
>>> >> > and without violating any of Iceberg's ACID guarantees.
>>> >> >
>>> >> > It seems the immediate next step is to build an independent Java or REST catalog
>>> >> > backend to prove this concept out. Long term, though, the ideal would be to
>>> >> > have such a catalog backend be a first class citizen in the Iceberg project.
>>> >> >
>>> >> > Is anyone else in the Iceberg community barking up this tree? I'm a long term
>>> >> > Iceberg enthusiast, but new to the community.
>>> >> > I'd very much appreciate any
>>> >> > pointers to current or past discussions on the topic. So far all I've been
>>> >> > able to turn up is some light chatter from myself and others on Bluesky and
>>> >> > Hacker News ([1][2][3]).
>>> >> >
>>> >> > Cheers,
>>> >> > Nikhil
>>> >> >
>>> >> > [0]: https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
>>> >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
>>> >> > [2]: https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
>>> >> > [3]: https://news.ycombinator.com/item?id=42240370
>>>
>>> --
>>> Xuanwo
>>>
>>> https://xuanwo.io/
>>>
>>

--
*Alex Merced <https://bio.alexmerced.com/data>*
*Senior Tech Evangelist, Dremio*