There are ways to use object store or file system features to do this, but there are a lot of variations. Building implementations and trying to standardize each one is a lot of work. And then you still get a catalog that doesn't support important features.
I don't think that this is a good direction to build for the Iceberg project. But I also have no objection to someone doing it in a different project that uses the Iceberg metadata format. On Tue, Jul 23, 2024 at 5:57 PM lisoda <lis...@yeah.net> wrote: > > Sir, regarding this point, we have some experience. In my view, as long as > the file system supports atomic single-file writing, where the file becomes > immediately visible upon the client's successful write operation, that is > sufficient. We can do without the rename operation as long as the file > system guarantees this feature. Of course, if the object storage system > supports mutex operations, we can also uniformly use the rename operation > for committing. We can theoretically avoid the situation of providing a > large number of commit strategies for different file systems. > ---- Replied Message ---- > From Jack Ye<yezhao...@gmail.com> <yezhao...@gmail.com> > Date 07/24/2024 02:52 > To dev@iceberg.apache.org > Cc > Subject Re: Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests > in 2.0 > If we come up with a new storage-only catalog implementation that could > solve those limitations and also leverage the new features being developed > in object storage, would that be a potential alternative strategy? so > HadoopCatalog users has a way to move forward with still a storage-only > catalog that can run on HDFS, and we can fully deprecate HadoopCatalog. > > -Jack > > On Tue, Jul 23, 2024 at 10:00 AM Ryan Blue <b...@databricks.com.invalid> > wrote: > >> I don't think we would want to put this in a module with other catalog >> implementations. It has serious limitations and is actively discouraged, >> while the other catalog implementations still have value as either REST >> back-end catalogs or as regular catalogs for many users. >> >> On Tue, Jul 23, 2024 at 9:11 AM Jack Ye <yezhao...@gmail.com> wrote: >> >>> For some additional information, we also have some Iceberg HDFS users on >>> EMR. Those are mainly users that have long-running Hadoop and HBase >>> installations. They typically refresh their installation every 1-2 years. >>> From my understanding, they use S3 for data storage, but metadata is kept >>> in the local HDFS cluster, thus HadoopCatalog works well for them. >>> >>> I remember we discussed moving all catalog implementations in the main >>> repo right now to a separated iceberg-catalogs repo. Could we do this move >>> as a part of that effort? >>> >>> -Jack >>> >>> On Tue, Jul 23, 2024 at 8:46 AM Ryan Blue <b...@databricks.com.invalid> >>> wrote: >>> >>>> Thanks for the context, lisoda. I agree that it's good to understand >>>> the issues you're facing with the HadoopCatalog. One follow up question >>>> that I have is what the underlying storage is. Are you using HDFS for those >>>> 30,000 customers? >>>> >>>> I think you're right that there is a challenge to migrating. Because >>>> there is no catalog requirement, it's hard to make sure you have all of the >>>> writers migrated. I think that means we do need to have a plan or >>>> recommendation for people currently using this catalog in production, but >>>> it also puts more pressure on us to deprecate this catalog and avoid more >>>> people having this problem. >>>> >>>> I think it's a good idea to make the spec change, which we have >>>> agreement for and to ensure that the FS catalog and table operations are >>>> properly deprecated to show that they should not be used. I'm not sure >>>> whether there is support in the community for moving the implementation >>>> into a new iceberg-hadoop module, but at a minimum we can't just remove >>>> this right away. I think that a separate iceberg-hadoop module would make >>>> the most sense. >>>> >>>> On Thu, Jul 18, 2024 at 11:09 PM lisoda <lis...@yeah.net> wrote: >>>> >>>>> Hi team. >>>>> I am not a pmc member, just a regular user. Instead of discussing >>>>> whether hadoopcatalog needs to continue to exist, I'd like to share a more >>>>> practical issue. >>>>> >>>>> We currently serve over 30,000 customers, all of whom use Iceberg >>>>> to store their foundational data, and all business analyses are conducted >>>>> based on Iceberg. However, all the Iceberg tables are hadoop_catalog. At >>>>> least, this has been the case since I started working with our production >>>>> environment system. >>>>> >>>>> In recent days, I've attempted to migrate hadoop_catalog to >>>>> jdbc-catalog, but I failed. We store 2PB of data, and replacing the >>>>> current >>>>> catalogues has become an almost impossible task. Users not only create >>>>> hadoop_catalog tables through Spark, they also continuously use >>>>> third-party >>>>> OLAP systems/FLINK and other means to write data into Iceberg in the form >>>>> of hadoop_catalog. Given this situation, we can only continue to fix >>>>> hadoop_catalog and provide services to customers. >>>>> >>>>> I understand that the community wants to make a big push into >>>>> rest-catalog, and I agree with the direction the community is going.But >>>>> considering >>>>> that there might be a significant number of users facing similar issues, >>>>> can we at least retain a module similar to iceberg-hadoop to extend >>>>> hadoop_catalog? If it is removed, we won't be able to continue providing >>>>> services to customers. So, if possible, please consider this option. >>>>> >>>>> Thank you all. >>>>> >>>>> Kind regards, >>>>> lisoda >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> At 2024-07-19 01:28:18, "Jack Ye" <yezhao...@gmail.com> wrote: >>>>> >>>>> Thank you for bringing this up Ryan. I have been also in the camp of >>>>> saying HadoopCatalog is not recommended, but after thinking about this >>>>> more >>>>> deeply last night, I now have mixed feelings about this topic. Just to >>>>> comment on the reasons you listed first: >>>>> >>>>> * For reason 1 & 2, it looks like the root cause is that people try to >>>>> use HadoopCatalog outside native HDFS because there are HDFS connectors to >>>>> other storages like S3AFileSystem. However, the norm for such usage has >>>>> been that those connectors do not strictly follow HDFS semantics, and it >>>>> is >>>>> assumed that people acknowledge the implication of such usage and accept >>>>> the risk. For example, S3AFileSystem was there even before S3 was strongly >>>>> consistent, but people have been using that to write files. >>>>> >>>>> * For reason 3, there are multiple catalogs that do not support all >>>>> operations (e.g. Glue for atomic table rename) and people still widely use >>>>> it. >>>>> >>>>> * For reason 4, I see that more as a missing feature. More features >>>>> could definitely be developed in that catalog implementation. >>>>> >>>>> So the key question to me is, how can we prevent people from using >>>>> HadoopCatalog outside native HDFS. We know HadoopCatalog is popular >>>>> because >>>>> it is a storage only solution. For object storages specifically, >>>>> HadoopCatalog is not suitable for 2 reasons: >>>>> >>>>> (1) file write does not enforce mutual exclusion, thus cannot enforce >>>>> Iceberg optimistic concurrency requirement (a.k.a. cannot do atomic and >>>>> swap) >>>>> >>>>> (2) directory-based design is not preferred in object storage and will >>>>> result in bad performance. >>>>> >>>>> However, now I look at these 2 issues, they are getting outdated. >>>>> >>>>> (1) object storage is starting to enforce file mutual exclusion. GCS >>>>> supports file generation number [1] that increments monotonically, and can >>>>> use x-goog-if-generation-match [2] to perform atomic swap. Similar feature >>>>> [3] exists in Azure Blob Storage. I cannot speak for the S3 team roadmap. >>>>> But Amazon S3 is clearly falling behind in this domain, and with market >>>>> competition, it is very clear that similar features will come in >>>>> reasonably >>>>> near future. >>>>> >>>>> (2) directory bucket is becoming the norm. Amazon S3 announced >>>>> directory bucket in 2023 re:invent [4], which does not have the same >>>>> performance limitation even if you have very nested folders and many >>>>> objects in a folder. GCS also has a similar feature launched in preview >>>>> [5] >>>>> right now. Azure also already has this feature since 2021 [6]. >>>>> >>>>> With these new developments in the industry, a storage-only Iceberg >>>>> catalog becomes very attractive. It is simple with only one service >>>>> dependency. It can safely perform atomic compare-and-swap. It is >>>>> performant >>>>> without the need to worry about folder and file organization. If you want >>>>> to add additional features for things like access control, there are also >>>>> integrations like access grant [7] that can be integrated to do it in a >>>>> very scalable way. >>>>> >>>>> I know the direction in the community so far is to go with the REST >>>>> catalog, and I am personally a big advocate for that. However, that >>>>> requires either building a full REST catalog, or choosing a catalog vendor >>>>> that supports REST. There are many capabilities that REST would unlock, >>>>> but >>>>> those are visions which I expect will take many years down the road for >>>>> the >>>>> community to continue to drive consensus and build those features. If I am >>>>> the CTO of a small company and I just want an Iceberg data lake(house) >>>>> right now, do I choose REST, or do I choose (or even just build) a >>>>> storage-only Iceberg catalog? I feel I would actually choose the later. >>>>> >>>>> Going back to the discussion points, my current take of this topic is >>>>> that: >>>>> >>>>> (1) +1 for clarifying that HadoopCatalog should only work with HDFS in >>>>> the spec. >>>>> >>>>> (2) +1 if we want to block non-HDFS use cases in HadoopCatalog by >>>>> default (e.g. fail if using S3A), but we should allow a feature flag to >>>>> unblock the usage so that people can use it after understanding the >>>>> implications and risks, just like how people use S3A today. >>>>> >>>>> (3) +0 for removing HadoopCatalog from the core library. It could be >>>>> in a different module like iceberg-hdfs if that is more suitable. >>>>> >>>>> (4) -1 for moving HadoopCatalog to tests, because HDFS is still a >>>>> valid use case for Iceberg. After the measures 1-3 above, people actually >>>>> having a HDFS use case should be able to continue to innovate and optimize >>>>> the HadoopCatalog implementation. Although "HDFS is becoming much less >>>>> common", looking at GitHub issues and discussion forums, it still has a >>>>> pretty big user base. >>>>> >>>>> (5) In general, I propose we separate the discussion of HadoopCatalog >>>>> from a "storage only catalog" that also deals with other object stages >>>>> when >>>>> evaluating it. With these latest industry developments, we should evaluate >>>>> the direction for building a storage only Iceberg catalog and see if the >>>>> community has an interest in that. I could help raise a thread about it >>>>> after this discussion is closed. >>>>> >>>>> Best, >>>>> Jack Ye >>>>> >>>>> [1] >>>>> https://cloud.google.com/storage/docs/object-versioning#file_restoration_behavior >>>>> [2] >>>>> https://cloud.google.com/storage/docs/xml-api/reference-headers#xgoogifgenerationmatch >>>>> [3] >>>>> https://learn.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations >>>>> [4] >>>>> https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html >>>>> [5] https://cloud.google.com/storage/docs/buckets#enable-hns >>>>> [6] >>>>> https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace >>>>> [7] >>>>> https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants.html >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Jul 18, 2024 at 7:16 AM Eduard Tudenhöfner < >>>>> etudenhoef...@apache.org> wrote: >>>>> >>>>>> +1 on deprecating now and removing them from the codebase with >>>>>> Iceberg 2.0 >>>>>> >>>>>> On Thu, Jul 18, 2024 at 10:40 AM Ajantha Bhat <ajanthab...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> +1 on deprecating the `File System Tables` from spec and >>>>>>> `HadoopCatalog`, `HadoopTableOperations` in code for now >>>>>>> and removing them permanently during 2.0 release. >>>>>>> >>>>>>> For testing we can use `InMemoryCatalog` as others mentioned. >>>>>>> >>>>>>> I am not sure about moving to test or keeping them only for HDFS. >>>>>>> Because, it leads to confusion to existing users of Hadoop catalog. >>>>>>> >>>>>>> I wanted to have it deprecated 2 years ago >>>>>>> <https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1647950504955309> >>>>>>> and I remember that we discussed it in sync that time and left it as it >>>>>>> is. >>>>>>> Also, when the user brought this up in slack >>>>>>> <https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720075009593789?thread_ts=1719993403.208859&cid=C03LG1D563F> >>>>>>> recently about lockmanager and refactoring the HadoopTableOperations, >>>>>>> I have asked to open this discussion on the mailing list. So, that >>>>>>> we can conclude it once and for all. >>>>>>> >>>>>>> - Ajantha >>>>>>> >>>>>>> On Thu, Jul 18, 2024 at 12:49 PM Fokko Driesprong <fo...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> Hey Ryan and others, >>>>>>>> >>>>>>>> Thanks for bringing this up. I would be in favor of removing the >>>>>>>> HadoopTableOperations, mostly because of the reasons that you already >>>>>>>> mentioned, but also about the fact that it is not fully in line with >>>>>>>> the >>>>>>>> first principles of Iceberg (being object store native) as it uses >>>>>>>> file-listing. >>>>>>>> >>>>>>>> I think we should deprecate the HadoopTables to raise the attention >>>>>>>> of their users. I would be reluctant to move it to test to just use it >>>>>>>> for >>>>>>>> testing purposes, I'd rather remove it and replace its use in tests >>>>>>>> with >>>>>>>> the InMemoryCatalog. >>>>>>>> >>>>>>>> Regarding the StaticTable, this is an easy way to have a read-only >>>>>>>> table by directly pointing to the metadata. This also lives in Java >>>>>>>> under >>>>>>>> StaticTableOperations >>>>>>>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/StaticTableOperations.java>. >>>>>>>> It isn't a full-blown catalog where you can list {tables,schemas}, >>>>>>>> update tables, etc. As ZENOTME pointed out already, it is all up to the >>>>>>>> user, for example, there is no listing of directories to determine >>>>>>>> which >>>>>>>> tables are in the catalog. >>>>>>>> >>>>>>>> is there a probability that the strategy used by HadoopCatalog is >>>>>>>>> not compatible with the table managed by other catalogs? >>>>>>>> >>>>>>>> >>>>>>>> Yes, so they are different, you can see in the spec the section on File >>>>>>>> System tables >>>>>>>> <https://github.com/apache/iceberg/blob/main/format/spec.md#file-system-tables>, >>>>>>>> is used by the HadoopTable implementation. Whereas the other catalogs >>>>>>>> follow the Metastore Tables >>>>>>>> <https://github.com/apache/iceberg/blob/main/format/spec.md#metastore-tables> >>>>>>>> . >>>>>>>> >>>>>>>> Kind regards, >>>>>>>> Fokko >>>>>>>> >>>>>>>> Op do 18 jul 2024 om 07:19 schreef NOTME ZE <st810918...@gmail.com >>>>>>>> >: >>>>>>>> >>>>>>>>> According to our requirements, this function is for some users who >>>>>>>>> want to read iceberg tables without relying on any catalogs, I think >>>>>>>>> the >>>>>>>>> StaticTable may be more flexible and clear in semantics. For >>>>>>>>> StaticTable, >>>>>>>>> it's the user's responsibility to decide which metadata of the table >>>>>>>>> to >>>>>>>>> read. But for read-only HadoopCatalog, the metadata may be decided by >>>>>>>>> Catalog, is there a probability that the strategy used by >>>>>>>>> HadoopCatalog is >>>>>>>>> not compatible with the table managed by other catalogs? >>>>>>>>> >>>>>>>>> Renjie Liu <liurenjie2...@gmail.com> 于2024年7月18日周四 11:39写道: >>>>>>>>> >>>>>>>>>> I think there are two ways to do this: >>>>>>>>>> 1. As Xuanwo said, we refactor HadoopCatalog to be read only, and >>>>>>>>>> throw unsupported operation exception for other operations that >>>>>>>>>> manipulate >>>>>>>>>> tables. >>>>>>>>>> 2. Totally deprecate HadoopCatalog, and add StaticTable as we did >>>>>>>>>> in pyiceberg or iceberg-rust. >>>>>>>>>> >>>>>>>>>> On Thu, Jul 18, 2024 at 11:26 AM Xuanwo <xua...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, Renjie >>>>>>>>>>> >>>>>>>>>>> Are you suggesting that we refactor HadoopCatalog as a >>>>>>>>>>> FileSystemCatalog to enable direct reading from file systems like >>>>>>>>>>> HDFS, S3, >>>>>>>>>>> and Azure Blob Storage? This catalog will be read-only that don't >>>>>>>>>>> support >>>>>>>>>>> write operations. >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 18, 2024, at 10:23, Renjie Liu wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, Ryan: >>>>>>>>>>> >>>>>>>>>>> Thanks for raising this. I agree that HadoopCatalog is dangerous >>>>>>>>>>> in manipulating tables/catalogs given limitations of different file >>>>>>>>>>> systems. But I see that there are some users who want to read >>>>>>>>>>> iceberg >>>>>>>>>>> tables without relying on any catalogs, this is also the >>>>>>>>>>> motivational use >>>>>>>>>>> case of StaticTable in pyiceberg and iceberg-rust, is there similar >>>>>>>>>>> things >>>>>>>>>>> in java implementation? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 18, 2024 at 7:01 AM Ryan Blue <b...@apache.org> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hey everyone, >>>>>>>>>>> >>>>>>>>>>> There has been some recent discussion about improving >>>>>>>>>>> HadoopTableOperations and the catalog based on those tables, but >>>>>>>>>>> we've >>>>>>>>>>> discouraged using file system only table (or "hadoop" tables) for >>>>>>>>>>> years now >>>>>>>>>>> because of major problems: >>>>>>>>>>> * It is only safe to use hadoop tables with HDFS; most local >>>>>>>>>>> file systems, S3, and other common object stores are unsafe >>>>>>>>>>> * Despite not providing atomicity guarantees outside of HDFS, >>>>>>>>>>> people use the tables in unsafe situations >>>>>>>>>>> * HadoopCatalog cannot implement atomic operations for rename >>>>>>>>>>> and drop table, which are commonly used in data engineering >>>>>>>>>>> * Alternative file names (for instance when using metadata file >>>>>>>>>>> compression) also break guarantees >>>>>>>>>>> >>>>>>>>>>> While these tables are useful for testing in non-production >>>>>>>>>>> scenarios, I think it's misleading to have them in the core module >>>>>>>>>>> because >>>>>>>>>>> there's an appearance that they are a reasonable choice. I propose >>>>>>>>>>> we >>>>>>>>>>> deprecate the HadoopTableOperations and HadoopCatalog >>>>>>>>>>> implementations and >>>>>>>>>>> move them to tests the next time we can make breaking API changes >>>>>>>>>>> (2.0). >>>>>>>>>>> >>>>>>>>>>> I think we should also consider similar fixes to the table spec. >>>>>>>>>>> It currently describes how HadoopTableOperations works, which does >>>>>>>>>>> not work >>>>>>>>>>> in object stores or local file systems. HDFS is becoming much less >>>>>>>>>>> common >>>>>>>>>>> and I propose that we note that the strategy in the spec should >>>>>>>>>>> ONLY be >>>>>>>>>>> used with HDFS. >>>>>>>>>>> >>>>>>>>>>> What do other people think? >>>>>>>>>>> >>>>>>>>>>> Ryan >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Ryan Blue >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Xuanwo >>>>>>>>>>> >>>>>>>>>>> https://xuanwo.io/ >>>>>>>>>>> >>>>>>>>>>> >>>> >>>> -- >>>> Ryan Blue >>>> Databricks >>>> >>> >> >> -- >> Ryan Blue >> Databricks >> > -- Ryan Blue Databricks