Hey everyone, Lisoda,

> In recent days, I've attempted to migrate hadoop_catalog to jdbc-catalog, but I failed.

Was this because the JDBC (or SQL) catalog didn't work, or because the migration was not feasible? If it's the former, I'd invite you to raise an issue on GitHub so we can see what's happening.

Besides the HadoopCatalog there is also the SQL-Catalog, as mentioned above. This is available in Java and PyIceberg, and is in flight for Rust. While for the HadoopCatalog correctness depends on the guarantees of the underlying storage, with the SQLCatalog we can also move forward and implement features like multi-table transactions. PyIceberg relies heavily on the SQLCatalog with an in-memory database (SQLite) for integration tests.

Since there is no consensus, I believe clarifying the spec and moving the HadoopCatalog to a separate package are the first two steps.

Kind regards,
Fokko

On Tue, Jul 30, 2024 at 09:43, Gabor Kaszab <gaborkas...@apache.org> wrote:

> Hey Iceberg Community,
>
> Sorry for being late to this conversation. I just wanted to share that I'm against deprecating HadoopCatalog or moving it to tests. Currently Impala relies heavily on HadoopCatalog for its own tests, and I personally find HadoopCatalog pretty handy when I just want to do some cross-engine experiments where my data is already on HDFS: I write a table with engineA, see if engineB can read it, and I don't want to bother with setting up any services to serve as an Iceberg catalog (HMS, for instance).
>
> I believe that even though we don't consider HadoopCatalog a production-grade solution as it is now, it has its benefits for lightweight experimentation.
>
> - I'm +1 for keeping HadoopCatalog.
> - We should emphasize that HDFS is the desired storage for HadoopCatalog (can we enforce this in the code?).
> - Apparently, there is a part of this community that is open to adding enhancements to HadoopCatalog to bring it closer to production grade (lisoda).
> I don't think we should block these contributions.
> - If we say that REST Catalog is preferred over HadoopCatalog, I think the Iceberg project should offer its own open-source solution available for everyone.
>
> Regards,
> Gabor
>
> On Thu, Jul 25, 2024 at 9:04 PM Ryan Blue <b...@databricks.com.invalid> wrote:
>
>> There are ways to use object store or file system features to do this, but there are a lot of variations. Building implementations and trying to standardize each one is a lot of work. And then you still get a catalog that doesn't support important features.
>>
>> I don't think that this is a good direction to build for the Iceberg project. But I also have no objection to someone doing it in a different project that uses the Iceberg metadata format.
>>
>> On Tue, Jul 23, 2024 at 5:57 PM lisoda <lis...@yeah.net> wrote:
>>
>>> Sir, regarding this point, we have some experience. In my view, as long as the file system supports atomic single-file writes, where the file becomes immediately visible upon the client's successful write operation, that is sufficient. We can do without the rename operation as long as the file system guarantees this feature. Of course, if the object storage system supports mutex operations, we can also uniformly use the rename operation for committing. We can theoretically avoid having to provide a large number of commit strategies for different file systems.
>>>
>>> ---- Replied Message ----
>>> From: Jack Ye <yezhao...@gmail.com>
>>> Date: 07/24/2024 02:52
>>> To: dev@iceberg.apache.org
>>> Subject: Re: Re: [DISCUSS] Deprecate HadoopTableOperations, move to tests in 2.0
>>>
>>> If we come up with a new storage-only catalog implementation that could solve those limitations and also leverage the new features being developed in object storage, would that be a potential alternative strategy?
>>> So HadoopCatalog users would have a way to move forward with a still storage-only catalog that can run on HDFS, and we could fully deprecate HadoopCatalog.
>>>
>>> -Jack
>>>
>>> On Tue, Jul 23, 2024 at 10:00 AM Ryan Blue <b...@databricks.com.invalid> wrote:
>>>
>>>> I don't think we would want to put this in a module with other catalog implementations. It has serious limitations and is actively discouraged, while the other catalog implementations still have value as either REST back-end catalogs or as regular catalogs for many users.
>>>>
>>>> On Tue, Jul 23, 2024 at 9:11 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> For some additional information, we also have some Iceberg HDFS users on EMR. Those are mainly users that have long-running Hadoop and HBase installations. They typically refresh their installation every 1-2 years. From my understanding, they use S3 for data storage, but metadata is kept in the local HDFS cluster, thus HadoopCatalog works well for them.
>>>>>
>>>>> I remember we discussed moving all catalog implementations currently in the main repo to a separate iceberg-catalogs repo. Could we do this move as a part of that effort?
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Tue, Jul 23, 2024 at 8:46 AM Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>
>>>>>> Thanks for the context, lisoda. I agree that it's good to understand the issues you're facing with the HadoopCatalog. One follow-up question that I have is what the underlying storage is. Are you using HDFS for those 30,000 customers?
>>>>>>
>>>>>> I think you're right that there is a challenge to migrating. Because there is no catalog requirement, it's hard to make sure you have all of the writers migrated.
>>>>>> I think that means we do need to have a plan or recommendation for people currently using this catalog in production, but it also puts more pressure on us to deprecate this catalog and avoid more people having this problem.
>>>>>>
>>>>>> I think it's a good idea to make the spec change, which we have agreement on, and to ensure that the FS catalog and table operations are properly deprecated to show that they should not be used. I'm not sure whether there is support in the community for moving the implementation into a new iceberg-hadoop module, but at a minimum we can't just remove this right away. I think that a separate iceberg-hadoop module would make the most sense.
>>>>>>
>>>>>> On Thu, Jul 18, 2024 at 11:09 PM lisoda <lis...@yeah.net> wrote:
>>>>>>
>>>>>>> Hi team.
>>>>>>> I am not a PMC member, just a regular user. Instead of discussing whether HadoopCatalog needs to continue to exist, I'd like to share a more practical issue.
>>>>>>>
>>>>>>> We currently serve over 30,000 customers, all of whom use Iceberg to store their foundational data, and all business analyses are conducted based on Iceberg. However, all the Iceberg tables use hadoop_catalog. At least, this has been the case since I started working with our production environment system.
>>>>>>>
>>>>>>> In recent days, I've attempted to migrate hadoop_catalog to jdbc-catalog, but I failed. We store 2PB of data, and replacing the current catalogs has become an almost impossible task. Users not only create hadoop_catalog tables through Spark, they also continuously use third-party OLAP systems/Flink and other means to write data into Iceberg in the form of hadoop_catalog. Given this situation, we can only continue to fix hadoop_catalog and provide services to customers.
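[Editor's sketch: the atomicity that a jdbc-catalog provides, and that hadoop_catalog lacks, boils down to one conditional UPDATE on the row holding the table's current metadata pointer. The schema below is illustrative only, not the actual iceberg-jdbc table layout, shown here with an in-memory SQLite database:]

```python
import sqlite3

# One row per table: the catalog stores only a pointer to the current
# metadata file; data and metadata files themselves stay on HDFS/S3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iceberg_tables ("
    " table_name TEXT PRIMARY KEY,"
    " metadata_location TEXT)"
)
conn.execute(
    "INSERT INTO iceberg_tables VALUES "
    "('db.events', 's3://bucket/db/events/metadata/v1.json')"
)

def commit(table, expected, new):
    """Atomic compare-and-swap: succeeds only if no one else committed first."""
    cur = conn.execute(
        "UPDATE iceberg_tables SET metadata_location = ?"
        " WHERE table_name = ? AND metadata_location = ?",
        (new, table, expected),
    )
    return cur.rowcount == 1  # 0 rows updated -> a concurrent commit won

# The first writer wins; a second writer holding the stale pointer fails
# instead of silently clobbering the table.
print(commit("db.events", "s3://bucket/db/events/metadata/v1.json",
             "s3://bucket/db/events/metadata/v2.json"))  # True
print(commit("db.events", "s3://bucket/db/events/metadata/v1.json",
             "s3://bucket/db/events/metadata/v2b.json"))  # False
```

The same conditional-update trick is what makes the SQLCatalog safe on any storage: the database, not the file system, arbitrates concurrent commits.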
>>>>>>> I understand that the community wants to make a big push into rest-catalog, and I agree with the direction the community is going. But considering that there might be a significant number of users facing similar issues, can we at least retain a module similar to iceberg-hadoop to extend hadoop_catalog? If it is removed, we won't be able to continue providing services to customers. So, if possible, please consider this option.
>>>>>>>
>>>>>>> Thank you all.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> lisoda
>>>>>>>
>>>>>>> At 2024-07-19 01:28:18, "Jack Ye" <yezhao...@gmail.com> wrote:
>>>>>>>
>>>>>>> Thank you for bringing this up, Ryan. I have also been in the camp of saying HadoopCatalog is not recommended, but after thinking about this more deeply last night, I now have mixed feelings about this topic. Just to comment on the reasons you listed first:
>>>>>>>
>>>>>>> * For reasons 1 & 2, it looks like the root cause is that people try to use HadoopCatalog outside native HDFS because there are HDFS connectors to other storages like S3AFileSystem. However, the norm for such usage has been that those connectors do not strictly follow HDFS semantics, and it is assumed that people acknowledge the implications of such usage and accept the risk. For example, S3AFileSystem was there even before S3 was strongly consistent, but people have been using it to write files.
>>>>>>>
>>>>>>> * For reason 3, there are multiple catalogs that do not support all operations (e.g. Glue for atomic table rename) and people still widely use them.
>>>>>>>
>>>>>>> * For reason 4, I see that more as a missing feature. More features could definitely be developed in that catalog implementation.
>>>>>>> So the key question to me is: how can we prevent people from using HadoopCatalog outside native HDFS? We know HadoopCatalog is popular because it is a storage-only solution. For object storages specifically, HadoopCatalog is not suitable for 2 reasons:
>>>>>>>
>>>>>>> (1) file writes do not enforce mutual exclusion, thus cannot enforce Iceberg's optimistic concurrency requirement (a.k.a. cannot do an atomic compare-and-swap);
>>>>>>>
>>>>>>> (2) the directory-based design is not preferred in object storage and will result in bad performance.
>>>>>>>
>>>>>>> However, now that I look at these 2 issues, they are getting outdated.
>>>>>>>
>>>>>>> (1) Object storage is starting to enforce file mutual exclusion. GCS supports a file generation number [1] that increments monotonically, and can use x-goog-if-generation-match [2] to perform an atomic swap. A similar feature [3] exists in Azure Blob Storage. I cannot speak for the S3 team roadmap, but Amazon S3 is clearly falling behind in this domain, and with market competition it is very clear that similar features will come in the reasonably near future.
>>>>>>>
>>>>>>> (2) Directory buckets are becoming the norm. Amazon S3 announced directory buckets at 2023 re:Invent [4], which do not have the same performance limitation even if you have very nested folders and many objects in a folder. GCS also has a similar feature launched in preview [5] right now. Azure has already had this feature since 2021 [6].
>>>>>>>
>>>>>>> With these new developments in the industry, a storage-only Iceberg catalog becomes very attractive. It is simple, with only one service dependency. It can safely perform an atomic compare-and-swap. It is performant, without the need to worry about folder and file organization.
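[Editor's sketch: the generation-match commit described above can be illustrated with a tiny in-memory stand-in for a bucket. This is not a real GCS client; on the real service the precondition is the `x-goog-if-generation-match` request header, and the server rejects the write with HTTP 412 when it fails:]

```python
class FakeObjectStore:
    """In-memory stand-in for a bucket with conditional writes keyed on a
    monotonically increasing per-object generation number (the behavior
    GCS exposes via x-goog-if-generation-match)."""

    def __init__(self):
        self._objects = {}  # key -> (generation, data); generation 0 = absent

    def get(self, key):
        return self._objects.get(key, (0, None))

    def put_if_generation_match(self, key, data, expected_generation):
        generation, _ = self._objects.get(key, (0, None))
        if generation != expected_generation:
            return False  # precondition failed: someone committed in between
        self._objects[key] = (generation + 1, data)
        return True

store = FakeObjectStore()
gen, _ = store.get("metadata/current")  # gen == 0: object does not exist yet
assert store.put_if_generation_match("metadata/current", "v1.json", gen)
# A writer still holding the stale generation loses the race cleanly,
# which is exactly the atomic swap Iceberg's commit protocol needs:
assert not store.put_if_generation_match("metadata/current", "v2.json", gen)
```

With a primitive like this, a storage-only catalog can detect concurrent commits without any rename trick or external lock service.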
>>>>>>> If you want to add additional features for things like access control, there are also integrations like access grants [7] that can be used to do it in a very scalable way.
>>>>>>>
>>>>>>> I know the direction in the community so far is to go with the REST catalog, and I am personally a big advocate for that. However, that requires either building a full REST catalog, or choosing a catalog vendor that supports REST. There are many capabilities that REST would unlock, but those are visions which I expect will take many years down the road for the community to continue to drive consensus and build. If I am the CTO of a small company and I just want an Iceberg data lake(house) right now, do I choose REST, or do I choose (or even just build) a storage-only Iceberg catalog? I feel I would actually choose the latter.
>>>>>>>
>>>>>>> Going back to the discussion points, my current take on this topic is that:
>>>>>>>
>>>>>>> (1) +1 for clarifying in the spec that HadoopCatalog should only work with HDFS.
>>>>>>>
>>>>>>> (2) +1 if we want to block non-HDFS use cases in HadoopCatalog by default (e.g. fail if using S3A), but we should allow a feature flag to unblock the usage so that people can use it after understanding the implications and risks, just like how people use S3A today.
>>>>>>>
>>>>>>> (3) +0 for removing HadoopCatalog from the core library. It could be in a different module like iceberg-hdfs if that is more suitable.
>>>>>>>
>>>>>>> (4) -1 for moving HadoopCatalog to tests, because HDFS is still a valid use case for Iceberg. After measures 1-3 above, people actually having an HDFS use case should be able to continue to innovate and optimize the HadoopCatalog implementation.
>>>>>>> Although "HDFS is becoming much less common", looking at GitHub issues and discussion forums, it still has a pretty big user base.
>>>>>>>
>>>>>>> (5) In general, I propose we separate the discussion of HadoopCatalog from that of a "storage-only catalog" that also deals with other object storages. With these latest industry developments, we should evaluate the direction of building a storage-only Iceberg catalog and see if the community has an interest in that. I could help raise a thread about it after this discussion is closed.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jack Ye
>>>>>>>
>>>>>>> [1] https://cloud.google.com/storage/docs/object-versioning#file_restoration_behavior
>>>>>>> [2] https://cloud.google.com/storage/docs/xml-api/reference-headers#xgoogifgenerationmatch
>>>>>>> [3] https://learn.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations
>>>>>>> [4] https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html
>>>>>>> [5] https://cloud.google.com/storage/docs/buckets#enable-hns
>>>>>>> [6] https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
>>>>>>> [7] https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants.html
>>>>>>>
>>>>>>> On Thu, Jul 18, 2024 at 7:16 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>
>>>>>>>> +1 on deprecating now and removing them from the codebase with Iceberg 2.0.
>>>>>>>>
>>>>>>>> On Thu, Jul 18, 2024 at 10:40 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 on deprecating the `File System Tables` from the spec and `HadoopCatalog`, `HadoopTableOperations` in code for now, and removing them permanently in the 2.0 release.
>>>>>>>>> For testing we can use `InMemoryCatalog` as others mentioned.
>>>>>>>>>
>>>>>>>>> I am not sure about moving them to tests or keeping them only for HDFS, because that leads to confusion for existing users of the Hadoop catalog.
>>>>>>>>>
>>>>>>>>> I wanted to have it deprecated 2 years ago <https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1647950504955309>, and I remember that we discussed it in a sync at the time and left it as it is. Also, when a user brought this up in Slack <https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720075009593789?thread_ts=1719993403.208859&cid=C03LG1D563F> recently, about the LockManager and refactoring the HadoopTableOperations, I asked them to open this discussion on the mailing list, so that we can conclude it once and for all.
>>>>>>>>>
>>>>>>>>> - Ajantha
>>>>>>>>>
>>>>>>>>> On Thu, Jul 18, 2024 at 12:49 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Ryan and others,
>>>>>>>>>>
>>>>>>>>>> Thanks for bringing this up. I would be in favor of removing the HadoopTableOperations, mostly because of the reasons that you already mentioned, but also because it is not fully in line with the first principles of Iceberg (being object-store native), as it uses file listing.
>>>>>>>>>>
>>>>>>>>>> I think we should deprecate the HadoopTables to raise the attention of their users. I would be reluctant to move it to test just to use it for testing purposes; I'd rather remove it and replace its use in tests with the InMemoryCatalog.
>>>>>>>>>>
>>>>>>>>>> Regarding the StaticTable, this is an easy way to have a read-only table by directly pointing to the metadata.
>>>>>>>>>> This also lives in Java under StaticTableOperations <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/StaticTableOperations.java>. It isn't a full-blown catalog where you can list {tables,schemas}, update tables, etc. As ZENOTME pointed out already, it is all up to the user; for example, there is no listing of directories to determine which tables are in the catalog.
>>>>>>>>>>
>>>>>>>>>>> is there a probability that the strategy used by HadoopCatalog is not compatible with the table managed by other catalogs?
>>>>>>>>>>
>>>>>>>>>> Yes, they are different. You can see in the spec the section on File System tables <https://github.com/apache/iceberg/blob/main/format/spec.md#file-system-tables>, which is used by the HadoopTable implementation, whereas the other catalogs follow the Metastore Tables <https://github.com/apache/iceberg/blob/main/format/spec.md#metastore-tables> section.
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Fokko
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 18, 2024 at 07:19, NOTME ZE <st810918...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> According to our requirements, this function is for some users who want to read Iceberg tables without relying on any catalogs. I think the StaticTable may be more flexible and clearer in semantics. For StaticTable, it's the user's responsibility to decide which metadata of the table to read. But for a read-only HadoopCatalog, the metadata may be decided by the Catalog; is there a probability that the strategy used by HadoopCatalog is not compatible with the table managed by other catalogs?
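[Editor's sketch: a StaticTable amounts to the user loading one metadata.json they point at directly, with no catalog in the loop. The snippet below illustrates that idea with the standard library only; the metadata shape is heavily stripped down and is not the full Iceberg table-metadata schema:]

```python
import json
import os
import tempfile

# A minimal stand-in for a table metadata file; real Iceberg metadata
# also carries schemas, partition specs, snapshot logs, properties, etc.
metadata = {
    "format-version": 2,
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "s3://bucket/t/snap-1.avro"},
        {"snapshot-id": 2, "manifest-list": "s3://bucket/t/snap-2.avro"},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "v2.metadata.json")
with open(path, "w") as f:
    json.dump(metadata, f)

def load_static_table(metadata_path):
    """Read-only load: the caller, not a catalog, chooses the metadata file."""
    with open(metadata_path) as f:
        md = json.load(f)
    current = next(s for s in md["snapshots"]
                   if s["snapshot-id"] == md["current-snapshot-id"])
    return current["manifest-list"]

print(load_static_table(path))  # s3://bucket/t/snap-2.avro
```

This is why StaticTable sidesteps the compatibility question above: there is no catalog-specific resolution strategy, only the metadata file the user named.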
>>>>>>>>>>> On Thu, Jul 18, 2024 at 11:39, Renjie Liu <liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I think there are two ways to do this:
>>>>>>>>>>>> 1. As Xuanwo said, we refactor HadoopCatalog to be read-only, and throw an unsupported operation exception for other operations that manipulate tables.
>>>>>>>>>>>> 2. Totally deprecate HadoopCatalog, and add StaticTable as we did in pyiceberg and iceberg-rust.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 18, 2024 at 11:26 AM Xuanwo <xua...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, Renjie
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you suggesting that we refactor HadoopCatalog as a FileSystemCatalog to enable direct reading from file systems like HDFS, S3, and Azure Blob Storage? This catalog would be read-only and would not support write operations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 18, 2024, at 10:23, Renjie Liu wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, Ryan:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for raising this. I agree that HadoopCatalog is dangerous for manipulating tables/catalogs given the limitations of different file systems. But I see that there are some users who want to read Iceberg tables without relying on any catalogs; this is also the motivating use case for StaticTable in pyiceberg and iceberg-rust. Is there something similar in the Java implementation?
>>>>>>>>>>>>> On Thu, Jul 18, 2024 at 7:01 AM Ryan Blue <b...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> There has been some recent discussion about improving HadoopTableOperations and the catalog based on those tables, but we've discouraged using file-system-only tables (or "hadoop" tables) for years now because of major problems:
>>>>>>>>>>>>> * It is only safe to use hadoop tables with HDFS; most local file systems, S3, and other common object stores are unsafe
>>>>>>>>>>>>> * Despite not providing atomicity guarantees outside of HDFS, people use the tables in unsafe situations
>>>>>>>>>>>>> * HadoopCatalog cannot implement atomic operations for rename and drop table, which are commonly used in data engineering
>>>>>>>>>>>>> * Alternative file names (for instance when using metadata file compression) also break guarantees
>>>>>>>>>>>>>
>>>>>>>>>>>>> While these tables are useful for testing in non-production scenarios, I think it's misleading to have them in the core module because there's an appearance that they are a reasonable choice. I propose we deprecate the HadoopTableOperations and HadoopCatalog implementations and move them to tests the next time we can make breaking API changes (2.0).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we should also consider similar fixes to the table spec. It currently describes how HadoopTableOperations works, which does not work in object stores or local file systems. HDFS is becoming much less common, and I propose that we note that the strategy in the spec should ONLY be used with HDFS.
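[Editor's sketch: the spec's file-system-table commit depends on a rename that fails when the destination already exists, which HDFS provides but plain POSIX rename and most object stores do not. The demonstration below uses local files standing in for a table's metadata directory; exclusive-create (`open(..., "x")`) plays the role of HDFS's fail-if-exists rename:]

```python
import os
import tempfile

d = tempfile.mkdtemp()
committed = os.path.join(d, "v2.metadata.json")

def commit_exclusive(path, data):
    """HDFS-style commit: creating the next version fails if another
    writer got there first, so the loser can retry against new state."""
    try:
        with open(path, "x") as f:  # 'x' = exclusive create
            f.write(data)
        return True
    except FileExistsError:
        return False

assert commit_exclusive(committed, "writer A")      # first commit wins
assert not commit_exclusive(committed, "writer B")  # loser detects conflict

# An overwriting rename, by contrast, reports success for both writers:
# os.replace() silently clobbers, so writer A's committed update is lost.
tmp = os.path.join(d, "tmp.json")
with open(tmp, "w") as f:
    f.write("writer B")
os.replace(tmp, committed)  # no error raised; writer A's commit is gone
with open(committed) as f:
    assert f.read() == "writer B"
```

This lost-update behavior on overwriting renames is the concrete failure mode behind "only safe with HDFS" above.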
>>>>>>>>>>>>> What do other people think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>>
>>>>>>>>>>>>> Xuanwo
>>>>>>>>>>>>> https://xuanwo.io/
>>
>> --
>> Ryan Blue
>> Databricks