Appreciate the thoughtful comments!
On Thu, Jul 18, 2024 at 10:29 AM Jack Ye <yezhao...@gmail.com> wrote:

> Thank you for bringing this up, Ryan. I have also been in the camp of saying HadoopCatalog is not recommended, but after thinking about this more deeply last night, I now have mixed feelings about this topic. Just to comment on the reasons you listed first:
>
> * For reasons 1 & 2, it looks like the root cause is that people try to use HadoopCatalog outside native HDFS because there are HDFS connectors to other storage systems, like S3AFileSystem. However, the norm for such usage has been that those connectors do not strictly follow HDFS semantics, and it is assumed that people acknowledge the implications of such usage and accept the risk. For example, S3AFileSystem was there even before S3 was strongly consistent, but people have been using it to write files.
>
> * For reason 3, there are multiple catalogs that do not support all operations (e.g. Glue for atomic table rename) and people still widely use them.
>
> * For reason 4, I see that more as a missing feature. More features could definitely be developed in that catalog implementation.
>
> So the key question to me is: how can we prevent people from using HadoopCatalog outside native HDFS? We know HadoopCatalog is popular because it is a storage-only solution. For object storage specifically, HadoopCatalog is not suitable for 2 reasons:
>
> (1) file writes do not enforce mutual exclusion, and thus cannot enforce Iceberg's optimistic concurrency requirement (a.k.a. cannot do an atomic compare-and-swap)
>
> (2) a directory-based design is not preferred in object storage and will result in bad performance.
>
> However, now that I look at these 2 issues, they are getting outdated.
>
> (1) object storage is starting to enforce file mutual exclusion. GCS supports a file generation number [1] that increments monotonically, and can use x-goog-if-generation-match [2] to perform an atomic swap. A similar feature [3] exists in Azure Blob Storage.
> I cannot speak for the S3 team roadmap. But Amazon S3 is clearly falling behind in this domain, and with market competition, it is very clear that similar features will come in the reasonably near future.
>
> (2) directory buckets are becoming the norm. Amazon S3 announced directory buckets at re:Invent 2023 [4], which do not have the same performance limitation even if you have very nested folders and many objects in a folder. GCS also has a similar feature launched in preview [5] right now. Azure has also had this feature since 2021 [6].
>
> With these new developments in the industry, a storage-only Iceberg catalog becomes very attractive. It is simple, with only one service dependency. It can safely perform atomic compare-and-swap. It is performant without the need to worry about folder and file organization. If you want to add additional features for things like access control, there are also integrations like access grants [7] that can be used to do it in a very scalable way.
>
> I know the direction in the community so far is to go with the REST catalog, and I am personally a big advocate for that. However, that requires either building a full REST catalog, or choosing a catalog vendor that supports REST. There are many capabilities that REST would unlock, but those are visions which I expect will take many years for the community to drive consensus on and build. If I am the CTO of a small company and I just want an Iceberg data lake(house) right now, do I choose REST, or do I choose (or even just build) a storage-only Iceberg catalog? I feel I would actually choose the latter.
>
> Going back to the discussion points, my current take on this topic is:
>
> (1) +1 for clarifying in the spec that HadoopCatalog should only work with HDFS.
>
> (2) +1 if we want to block non-HDFS use cases in HadoopCatalog by default (e.g. fail if using S3A), but we should allow a feature flag to unblock the usage so that people can use it after understanding the implications and risks, just like how people use S3A today.
>
> (3) +0 for removing HadoopCatalog from the core library. It could be in a different module like iceberg-hdfs if that is more suitable.
>
> (4) -1 for moving HadoopCatalog to tests, because HDFS is still a valid use case for Iceberg. After measures 1-3 above, people who actually have an HDFS use case should be able to continue to innovate and optimize the HadoopCatalog implementation. Although "HDFS is becoming much less common", looking at GitHub issues and discussion forums, it still has a pretty big user base.
>
> (5) In general, I propose we separate the discussion of HadoopCatalog from that of a "storage-only catalog" that also deals with other object stores when evaluating it. With these latest industry developments, we should evaluate the direction for building a storage-only Iceberg catalog and see if the community has an interest in that. I could help raise a thread about it after this discussion is closed.
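
To make the generation-match idea in point (1) concrete, here is a minimal sketch of an optimistic commit built on a conditional write. It is a toy in-memory stand-in, not the GCS or Azure client API: `ToyObjectStore`, `write_if_generation_matches`, `commit`, and the pointer key are all hypothetical names made up for illustration.

```python
import threading


class PreconditionFailed(Exception):
    """Raised when the caller's expected generation is stale."""


class ToyObjectStore:
    """In-memory stand-in for a store that supports generation-match writes."""

    def __init__(self):
        self._objects = {}  # key -> (generation, data)
        self._lock = threading.Lock()

    def read(self, key):
        # Returns (generation, data); generation 0 means "does not exist".
        with self._lock:
            return self._objects.get(key, (0, None))

    def write_if_generation_matches(self, key, data, expected_generation):
        # The check and the write happen under one lock, mimicking the
        # server-side atomicity a real conditional write would provide.
        with self._lock:
            current, _ = self._objects.get(key, (0, None))
            if current != expected_generation:
                raise PreconditionFailed(
                    f"expected generation {expected_generation}, found {current}"
                )
            self._objects[key] = (current + 1, data)
            return current + 1


def commit(store, pointer_key, new_metadata_location, max_retries=3):
    """Swap the table pointer only if nobody else committed first."""
    for _ in range(max_retries):
        generation, _ = store.read(pointer_key)
        # A real commit would re-validate its changes against the latest
        # table metadata at this point before trying again.
        try:
            return store.write_if_generation_matches(
                pointer_key, new_metadata_location, generation
            )
        except PreconditionFailed:
            continue
    raise RuntimeError("commit failed after retries")
```

The point of the sketch is only that a writer holding a stale generation gets rejected instead of silently overwriting a concurrent commit, which is the mutual exclusion a plain object PUT does not give you.
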
>
> Best,
> Jack Ye
>
> [1] https://cloud.google.com/storage/docs/object-versioning#file_restoration_behavior
> [2] https://cloud.google.com/storage/docs/xml-api/reference-headers#xgoogifgenerationmatch
> [3] https://learn.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations
> [4] https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html
> [5] https://cloud.google.com/storage/docs/buckets#enable-hns
> [6] https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
> [7] https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants.html
>
> On Thu, Jul 18, 2024 at 7:16 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>
>> +1 on deprecating now and removing them from the codebase with Iceberg 2.0
>>
>> On Thu, Jul 18, 2024 at 10:40 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>
>>> +1 on deprecating the `File System Tables` from the spec and `HadoopCatalog`, `HadoopTableOperations` in code for now, and removing them permanently in the 2.0 release.
>>>
>>> For testing we can use `InMemoryCatalog`, as others mentioned.
>>>
>>> I am not sure about moving them to tests or keeping them only for HDFS, because that would confuse existing users of the Hadoop catalog.
>>>
>>> I wanted to have it deprecated 2 years ago <https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1647950504955309>, and I remember that we discussed it in the sync at that time and left it as it is. Also, when a user brought this up in Slack <https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720075009593789?thread_ts=1719993403.208859&cid=C03LG1D563F> recently about the lock manager and refactoring the HadoopTableOperations, I asked them to open this discussion on the mailing list, so that we can conclude it once and for all.
>>>
>>> - Ajantha
>>>
>>> On Thu, Jul 18, 2024 at 12:49 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>
>>>> Hey Ryan and others,
>>>>
>>>> Thanks for bringing this up. I would be in favor of removing the HadoopTableOperations, mostly because of the reasons that you already mentioned, but also because it is not fully in line with the first principles of Iceberg (being object store native), as it uses file listing.
>>>>
>>>> I think we should deprecate the HadoopTables to raise the attention of their users. I would be reluctant to move it to tests just to use it for testing purposes; I'd rather remove it and replace its use in tests with the InMemoryCatalog.
>>>>
>>>> Regarding the StaticTable, this is an easy way to have a read-only table by directly pointing to the metadata. This also lives in Java under StaticTableOperations <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/StaticTableOperations.java>. It isn't a full-blown catalog where you can list {tables,schemas}, update tables, etc. As ZENOTME pointed out already, it is all up to the user; for example, there is no listing of directories to determine which tables are in the catalog.
>>>>
>>>>> is there a probability that the strategy used by HadoopCatalog is not compatible with the table managed by other catalogs?
>>>>
>>>> Yes, they are different. The section of the spec on File System Tables <https://github.com/apache/iceberg/blob/main/format/spec.md#file-system-tables> is what the HadoopTable implementation uses, whereas the other catalogs follow the Metastore Tables <https://github.com/apache/iceberg/blob/main/format/spec.md#metastore-tables> section.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Thu, Jul 18, 2024 at 7:19 AM NOTME ZE <st810918...@gmail.com> wrote:
>>>>
>>>>> According to our requirements, this function is for users who want to read Iceberg tables without relying on any catalog. I think the StaticTable may be more flexible and clearer in semantics. For StaticTable, it's the user's responsibility to decide which metadata of the table to read. But for a read-only HadoopCatalog, the metadata may be decided by the catalog. Is there a probability that the strategy used by HadoopCatalog is not compatible with the table managed by other catalogs?
>>>>>
>>>>> On Thu, Jul 18, 2024 at 11:39 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>>>>>
>>>>>> I think there are two ways to do this:
>>>>>> 1. As Xuanwo said, we refactor HadoopCatalog to be read-only, and throw an unsupported operation exception for other operations that manipulate tables.
>>>>>> 2. Totally deprecate HadoopCatalog, and add StaticTable as we did in pyiceberg or iceberg-rust.
>>>>>>
>>>>>> On Thu, Jul 18, 2024 at 11:26 AM Xuanwo <xua...@apache.org> wrote:
>>>>>>
>>>>>>> Hi, Renjie
>>>>>>>
>>>>>>> Are you suggesting that we refactor HadoopCatalog as a FileSystemCatalog to enable direct reading from file systems like HDFS, S3, and Azure Blob Storage? This catalog would be read-only and would not support write operations.
>>>>>>>
>>>>>>> On Thu, Jul 18, 2024, at 10:23, Renjie Liu wrote:
>>>>>>>
>>>>>>> Hi, Ryan:
>>>>>>>
>>>>>>> Thanks for raising this. I agree that HadoopCatalog is dangerous for manipulating tables/catalogs given the limitations of different file systems. But I see that there are some users who want to read Iceberg tables without relying on any catalog. This is also the motivating use case of StaticTable in pyiceberg and iceberg-rust. Is there something similar in the Java implementation?
>>>>>>>
>>>>>>> On Thu, Jul 18, 2024 at 7:01 AM Ryan Blue <b...@apache.org> wrote:
>>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> There has been some recent discussion about improving HadoopTableOperations and the catalog based on those tables, but we've discouraged using file-system-only tables (or "hadoop" tables) for years now because of major problems:
>>>>>>> * It is only safe to use hadoop tables with HDFS; most local file systems, S3, and other common object stores are unsafe
>>>>>>> * Despite not providing atomicity guarantees outside of HDFS, people use the tables in unsafe situations
>>>>>>> * HadoopCatalog cannot implement atomic operations for rename and drop table, which are commonly used in data engineering
>>>>>>> * Alternative file names (for instance when using metadata file compression) also break guarantees
>>>>>>>
>>>>>>> While these tables are useful for testing in non-production scenarios, I think it's misleading to have them in the core module because there's an appearance that they are a reasonable choice. I propose we deprecate the HadoopTableOperations and HadoopCatalog implementations and move them to tests the next time we can make breaking API changes (2.0).
>>>>>>>
>>>>>>> I think we should also consider similar fixes to the table spec. It currently describes how HadoopTableOperations works, which does not work in object stores or local file systems. HDFS is becoming much less common and I propose that we note that the strategy in the spec should ONLY be used with HDFS.
>>>>>>>
>>>>>>> What do other people think?
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>>
>>>>>>> Xuanwo
>>>>>>>
>>>>>>> https://xuanwo.io/
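
On blocking non-HDFS use by default with an opt-in flag (Jack's point (2), which lines up with Ryan's "ONLY be used with HDFS"), a rough sketch of the check might look like the following. The property name `hadoop-catalog.allow-unsafe-filesystem` and the function `check_warehouse_location` are hypothetical names for illustration, not existing Iceberg options, and treating only the `hdfs` scheme as safe is an assumption taken from the discussion above.

```python
from urllib.parse import urlparse

# Assumption from the thread: only native HDFS gives the needed guarantees.
SAFE_SCHEMES = {"hdfs"}

# Hypothetical opt-in property; not an existing Iceberg configuration key.
ALLOW_UNSAFE_FLAG = "hadoop-catalog.allow-unsafe-filesystem"


def check_warehouse_location(location, properties=None):
    """Reject non-HDFS warehouse locations unless the user opted in."""
    properties = properties or {}
    scheme = urlparse(location).scheme.lower()
    if scheme in SAFE_SCHEMES:
        return
    # Accept "true"/"True"/True as opting in; anything else keeps the block.
    if str(properties.get(ALLOW_UNSAFE_FLAG, "false")).lower() == "true":
        return
    raise ValueError(
        f"Warehouse location {location!r} uses scheme {scheme or '(none)'!r}, "
        f"which is not safe for file-system tables; "
        f"set {ALLOW_UNSAFE_FLAG}=true to accept the risk"
    )
```

For example, `check_warehouse_location("s3a://bucket/warehouse")` raises, while passing `{"hadoop-catalog.allow-unsafe-filesystem": "true"}` lets the same location through, which mirrors how people knowingly use S3A today.
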