Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Péter Váry
+1 from my side too. I wanted to make sure that the community is aware of this change, which will introduce a behavioral difference compared to other catalogs. This is why I asked Steve to start this thread. On Thu, Nov 28, 2024, 02:10 Szehon Ho wrote: > Yea, I think that part is definitely kept

Re: [DISCUSS] Hive Support

2024-11-27 Thread Péter Váry
Given that the Hive folks are also leaning towards keeping the hive-runtime code in the Hive repo, I think we should move forward as Cheng Pan suggested:
- Upgrade to Hive 4
- Remove hive-runtime code and tests
- Make sure that a nightly build is available, so Hive folks could run integration tests, an

Re: [DISCUSS] Apache Iceberg Summit 2025 - Selection Committee

2024-11-27 Thread Eduard Tudenhöfner
Thanks for organizing this and I'd like to volunteer to help out where I can. On Wed, Nov 27, 2024 at 9:16 AM Christian Thiel wrote: > Hey JB, > > happy to help any way I can. Thanks for organizing this! > > Best, > Christian > > On 27. Nov 2024, at 07:52, Fokko Driesprong wrote: > > Hey JB, >

Re: [DISCUSS] Enforce table properties at catalog level

2024-11-27 Thread Pucheng Yang
I think the naming of the property should be fixed, as it only applies to new table creation. On Wed, Nov 27, 2024 at 2:21 AM Manu Zhang wrote: > Hi all, > > Currently, we can *enforce default table properties* at catalog level > with configs like > spark.sql.catalog.*catalog-name*.table-ove

Re: Storing catalog directly on object store

2024-11-27 Thread Steve Loughran
There's a PR up from Amazon to add this to the S3A connector (https://github.com/apache/hadoop/pull/7011), targeting a 3.4.2 release early next year, though they've not updated the PR as requested yet. 1. It doesn't give you the same semantics as a POSIX create-no-overwrite call; you only get t

Re: [DISCUSS] Hive Support

2024-11-27 Thread Ayush Saxena
> Let me know if the above doesn't make any sense, though! To be honest, it doesn’t. The email feels accusatory, unfairly blaming the Hive community for wrongdoing while portraying the Iceberg folks as "worse" and insinuating misconduct on their part. This kind of tone does nothing to foster conse

Re: [DISCUSS] iceberg rust 0.4.0 and iceberg pyiceberg_core 0.1.0 release

2024-11-27 Thread Sung Yun
Hi folks, it's been some time since we've done an Iceberg Rust release, and we've finally set up the GitHub Actions workflow[1] that will allow us to build and publish an abi3-compatible wheel to PyPI. If we are still +1 for the release (both iceberg-rust and pyiceberg_core), I think it'll be awesom

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread rdb...@gmail.com
I'd support changing the behavior if we still have a way to match the intent, which is to return true if the table exists in Hive and is an Iceberg table. On Wed, Nov 27, 2024 at 11:26 AM Szehon Ho wrote: > Hm I think the thread got a bit sidetracked by the other question. > > The initial propos

Re: [ACTION REQUIRED] Removal of v3 artifact actions on December 5th

2024-11-27 Thread Sung Yun
Hi JB and Kevin, thank you for jumping on this chore. Here's one more PR to bump the version in iceberg-rust: https://github.com/apache/iceberg-rust/pull/725 I assume this didn't show up in the grep.app search since it was recently merged. On 2024/11/26 22:22:36 Kevin Liu wrote: > We merged th

Re: Storing catalog directly on object store

2024-11-27 Thread Alex Merced
This is just a quick thought to put out there: If there will be a new reimagining of a file system catalog, would it be worth adding a multi-table layer on top? *As a rough example:* - At the TOP is a JSON file that is just a mapping of the table name to the directory where VERSION-HINT would be

Re: Storing catalog directly on object store

2024-11-27 Thread Alex Merced
Ignore the last email; I just re-read the proposal earlier in the email chain. On Wed, Nov 27, 2024 at 11:37 AM Alex Merced wrote: > This is just a quick thought to put out there: If there will be a new > reimagining of a file system catalog, would it be worth adding a > multi-table layer on top? >

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread rdb...@gmail.com
What kind of corruption are you referring to? I would expect corruption to result in an exception when loading the table, but that the table should still exist. The problem is likely that we determine if a table exists by attempting to load it. We could fix that by not attempting to load the table.

Re: [DISCUSS] Hive Support

2024-11-27 Thread Fokko Driesprong
Hey Cheng, Thanks for the suggestion. The nightly snapshots are available: https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-core/, which might help when working on features that are not released yet (e.g., nanosecond timestamps). Besides that, we should run RCs agains

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Szehon Ho
Hm I think the thread got a bit sidetracked by the other question. The initial proposal by Steve is a performance improvement for HiveCatalog's tableExists(). Currently it loads both Hive and Iceberg table metadata, and if successful returns true. The proposal is to load from Hive only, and retu
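A minimal sketch of the proposed Hive-only check, assuming (as the Java HiveCatalog does) that Iceberg tables are marked in the Hive Metastore with a `table_type=ICEBERG` parameter. The in-memory `Metastore` dict and `table_exists_hive_only` are hypothetical stand-ins, not the HiveCatalog code; the point is that the Iceberg metadata file is never read.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the Hive Metastore: "db.table" -> HMS table parameters.
Metastore = Dict[str, Dict[str, str]]


def table_exists_hive_only(metastore: Metastore, db: str, table: str) -> bool:
    """True if the table is registered in Hive and marked as an Iceberg table.

    Only Hive-side parameters are consulted; the Iceberg metadata file is never
    read, so a missing or unreadable metadata file does not affect the answer.
    """
    params: Optional[Dict[str, str]] = metastore.get(f"{db}.{table}")
    if params is None:
        return False  # not registered in Hive at all
    # Assumption: Iceberg tables carry table_type=ICEBERG (compared case-insensitively).
    return params.get("table_type", "").upper() == "ICEBERG"


hms: Metastore = {
    "db.events": {
        "table_type": "ICEBERG",
        "metadata_location": "s3://bucket/db/events/metadata/v3.metadata.json",
    },
    "db.legacy": {"EXTERNAL": "TRUE"},  # a plain Hive table
}
print(table_exists_hive_only(hms, "db", "events"))   # True
print(table_exists_hive_only(hms, "db", "legacy"))   # False: in Hive, but not Iceberg
print(table_exists_hive_only(hms, "db", "missing"))  # False
```

This keeps the intent rdblue describes (exists in Hive and is an Iceberg table) while skipping the metadata load that makes the current check expensive.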

Re: [Discuss] Document Snapshot Summary Optional Fields for Standardization

2024-11-27 Thread Szehon Ho
This makes sense to me generally, I've tried a few times to search in the spec to find a list of possible snapshot summary properties, and was a bit surprised to not find them there. So I think this would be a nice addition. I'm curious if there's any historical reason it's not been included in t

Re: [DISCUSS] Deprecate embedded manifests

2024-11-27 Thread Fokko Driesprong
I'd say emit deprecation warnings for a reasonable amount of time (at least until v2.0 of the Java implementation), as shown in the PR, and then remove the code path at some point. If you still have snapshots around with ma
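The "warn first, remove later" approach could look roughly like this. It is a sketch under stated assumptions: snapshots are plain dicts using the spec's `manifest-list` and v1 `manifests` field names, and `read_manifest_list` is a hypothetical callback for reading the Avro manifest list. It is not the Java or PyIceberg implementation.

```python
import warnings
from typing import Any, Callable, Dict, List


def snapshot_manifests(
    snapshot: Dict[str, Any], read_manifest_list: Callable[[str], List[str]]
) -> List[str]:
    """Return manifest paths for a snapshot, warning when the v1 embedded form is used."""
    if "manifest-list" in snapshot:
        # Preferred path: manifests are listed in a separate manifest-list file.
        return read_manifest_list(snapshot["manifest-list"])
    if "manifests" in snapshot:
        # Deprecated path: manifest locations embedded directly in the snapshot.
        warnings.warn(
            f"Snapshot {snapshot.get('snapshot-id')} embeds its manifests; "
            "embedded manifests are deprecated and support may be removed",
            DeprecationWarning,
            stacklevel=2,
        )
        return list(snapshot["manifests"])
    raise ValueError("snapshot has neither 'manifest-list' nor 'manifests'")
```

Readers of old v1 tables keep working during the deprecation window; only the warning signals that the code path is going away.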

Re: [ACTION REQUIRED] Removal of v3 artifact actions on December 5th

2024-11-27 Thread Kevin Liu
Thanks, Sung. I assumed grep.app would continuously index all GitHub repos, but it seems to be missing a few. For completeness, I went through GitHub's search feature, using `org:apache` with both `upload-artifact@v3` and `download-artifact@v3`. * https://github.com/search?q=org%3Aapache%20upload-a
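For a single checkout, a local check that complements the GitHub search could be sketched as follows. It assumes workflows live under `.github/workflows/` and only catches literal `@v3` tag references (not SHA pins); `find_v3_artifact_actions` is a hypothetical helper, not an existing tool.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches actions/upload-artifact@v3 and actions/download-artifact@v3 (also v3.x tags).
V3_ARTIFACT = re.compile(r"actions/(?:upload|download)-artifact@v3\b")


def find_v3_artifact_actions(repo_root: str) -> List[Tuple[str, int, str]]:
    """Return (workflow file, line number, line) for every v3 artifact action reference."""
    hits: List[Tuple[str, int, str]] = []
    workflows = Path(repo_root) / ".github" / "workflows"
    if not workflows.is_dir():
        return hits
    for path in sorted(workflows.glob("*.y*ml")):  # .yml and .yaml
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if V3_ARTIFACT.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits
```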

Re: [DISCUSS] iceberg rust 0.4.0 and iceberg pyiceberg_core 0.1.0 release

2024-11-27 Thread Kevin Liu
Thanks for driving this, Sung! I'm +1 to release both iceberg-rust and pyiceberg_core. It's very exciting to see pyiceberg_core and its integration with PyIceberg. It makes sense to decouple pyiceberg_core from iceberg-rust since the two "projects" are on different tracks. We'd want to release pyic

Re: [DISCUSS] Hive Support

2024-11-27 Thread Cheng Pan
> That said, it would be helpful if they continue running > tests against the latest stable Hive releases to ensure that any > changes don’t unintentionally break something for Hive, which would be > beyond our control. > I believe we should continue maintaining a Hive Iceberg runtime test suite

Re: Storing catalog directly on object store

2024-11-27 Thread rdb...@gmail.com
> We deprecated this recently and we don't have to deprecate it if object stores support atomic operations like this. I disagree because this misses many of the reasons for deprecation. It isn't just that S3 didn't support a `putIfAbsent` operation. Other object stores did and there are still seve

Re: [Discuss] Document Snapshot Summary Optional Fields for Standardization

2024-11-27 Thread Kevin Liu
Thanks for driving this, Honah! It's important to have a consistent naming scheme so that we don't need to worry about edge cases when using multiple engines, and possibly having to deal with migrations. Also, since users can store arbitrary key/value pairs in the summary property, it's good to docu

Re: [DISCUSS] Enforce table properties at catalog level

2024-11-27 Thread rdb...@gmail.com
Manu, this is something that you can easily build into a REST catalog implementation. I think that's probably the best way to solve it, rather than trying to implement this behavior across all of the catalogs in the project, right? On Wed, Nov 27, 2024 at 8:47 AM Pucheng Yang wrote: > I think th

Re: [DISCUSS] iceberg rust 0.4.0 and iceberg pyiceberg_core 0.1.0 release

2024-11-27 Thread Fokko Driesprong
Hey Sung, All for it, and happy to help as well. I'll add it to the agenda for tomorrow's Rust sync. We'll make sure to publish the notes since it falls on a US holiday. Kind regards, Fokko On Wed, 27 Nov 2024 at 19:30, Kevin L

Re: [DISCUSS] Hive Support

2024-11-27 Thread rdb...@gmail.com
I think that we should remove Hive 2 and Hive 3. We already agreed to remove Hive 2, but Hive 3 is no longer compatible with the project, is already EOL, and will not see a release that makes it compatible again. Anyone using the existing Hive 3 support should be able to continue usi

Re: [DISCUSS] Deprecate embedded manifests

2024-11-27 Thread rdb...@gmail.com
I think it's reasonable to mark it deprecated in the spec, especially because we don't allow it in v2. But I'm not sure how that would allow us to remove code paths associated with it. If it is allowed by an older and supported version of the spec, then how can we safely remove the code paths that

Re: [VOTE] Release Apache PyIceberg 0.8.1rc1

2024-11-27 Thread Sung Yun
Hi Kevin, Yes, that approach sounds good to me as well. And thanks for the explanation! Sung On Wed, Nov 27, 2024 at 8:17 PM Kevin Liu wrote: > Hey Sung, > > Good point. For context, I accidentally generated and uploaded to PyPi a > version with `0.8.1` instead of `0.8.1rc1`. Fokko helped me y

Re: [VOTE] Release Apache PyIceberg 0.8.1rc1

2024-11-27 Thread Kevin Liu
Hey Sung, Good point. For context, I accidentally generated and uploaded a version to PyPI labeled `0.8.1` instead of `0.8.1rc1`. Fokko helped me yank that version. https://pypi.org/project/pyiceberg/0.8.1/ If this RC passes, we can un-yank and reuse the currently uploaded version. Otherwise, I can

[VOTE] Release Apache PyIceberg 0.8.1rc1

2024-11-27 Thread Kevin Liu
Hi Everyone, I propose that we release the following RC as the official PyIceberg 0.8.1 release. The commit ID is a051584a3684392d2db6556449eb299145d47d15 * This corresponds to the tag: pyiceberg-0.8.1rc1 (17124779c5294cb928f3807ed539f427f9b4bd2e) * https://github.com/apache/iceberg-python/relea

Re: [DISCUSS] Hive Support

2024-11-27 Thread Ajantha Bhat
+1 to removing support for both Hive 2 and Hive 3 in the latest Iceberg release, as they have reached EOL. Hive 4 natively manages its Iceberg integration, similar to how Trino handles its Iceberg integration. Therefore, in my opinion, it would be better for engines to manage the integration aspect, allowi

Re: [VOTE] Release Apache PyIceberg 0.8.1rc1

2024-11-27 Thread Sung Yun
Hi Kevin, Thank you so much for working on this release! I noticed this morning that PyIceberg 0.8.1 was released and yanked[1]. Similar to how we handled it when this happened last time, I think this means we would now need to move on to the next version and publis

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Szehon Ho
Yea, I think that part is definitely kept. Thanks Szehon On Wed, Nov 27, 2024 at 12:02 PM rdb...@gmail.com wrote: > I'd support changing the behavior if we still have a way to match the > intent, which is to return true if the table exists in Hive and is an > Iceberg table. > > On Wed, Nov 27,

Re: [DISCUSS] Apache Iceberg Summit 2025 - Selection Committee

2024-11-27 Thread Christian Thiel
Hey JB, happy to help any way I can. Thanks for organizing this! Best, Christian On 27. Nov 2024, at 07:52, Fokko Driesprong wrote: Hey JB, Thanks for organizing this. Happy to help! Kind regards, Fokko On Wed, 27 Nov 2024 at 06:23, karuppayya <karuppayya1...@gmail.com> wrote: Hi JB,

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Manu Zhang
> > The current behavior's intent is not to check whether the metadata is > valid, it is to detect whether the table is an Iceberg table. Is there a way to detect this from HiveCatalog without loading the table? On Wed, Nov 27, 2024 at 2:01 PM Péter Váry wrote: > I think we have an agreement,

Re: Storing catalog directly on object store

2024-11-27 Thread Xuanwo
Hi, I believe we still need to deprecate HadoopCatalog since the operation is still not safe on Hadoop. As raised by Jack Ye before, I suggest we consider having a StorageCatalog or ObjectStorageCatalog that can only be used with storage services supporting conditional writes. That would be a go
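A minimal sketch of how such a catalog could commit using only a conditional write, assuming a store with an atomic put-if-absent. `InMemoryObjectStore`, `commit`, and the `v<N>.metadata.json` layout (loosely modeled on HadoopCatalog's versioned metadata files) are illustrative, not an existing Iceberg or cloud SDK API.

```python
import threading
from typing import Dict


class InMemoryObjectStore:
    """Hypothetical object store offering an atomic put-if-absent (conditional write)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key: str, data: bytes) -> bool:
        # The check and the write happen under one lock, standing in for the
        # server-side atomicity a real conditional write would provide.
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = data
            return True

    def get(self, key: str) -> bytes:
        with self._lock:
            return self._objects[key]


def commit(store: InMemoryObjectStore, table_path: str, base_version: int, metadata: bytes) -> bool:
    """Try to publish version base_version + 1. False means another writer got there first."""
    key = f"{table_path}/metadata/v{base_version + 1}.metadata.json"
    return store.put_if_absent(key, metadata)


store = InMemoryObjectStore()
# Two writers both read version 1 and try to commit version 2.
print(commit(store, "warehouse/db/events", 1, b'{"writer": "a"}'))  # True
print(commit(store, "warehouse/db/events", 1, b'{"writer": "b"}'))  # False: must refresh and retry
```

The losing writer has to re-read the latest metadata, re-apply its changes, and retry against the next version number. On a store without a real atomic create-if-absent, the same scheme silently allows two writers to overwrite each other, which is the safety concern raised in this thread.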

[DISCUSS] Enforce table properties at catalog level

2024-11-27 Thread Manu Zhang
Hi all, Currently, we can *enforce default table properties* at catalog level with configs like spark.sql.catalog.*catalog-name*.table-override.*propertyKey*[1]. It prevents users from overriding those properties when creating a table. However, users can still override later through altering the
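To make the described behavior concrete, here is a sketch of how overrides apply at creation time. It assumes only what the message states: keys follow `spark.sql.catalog.<catalog-name>.table-override.<propertyKey>`, and overrides win over user-supplied properties on CREATE TABLE. The helper names are hypothetical and this is not Iceberg's actual code.

```python
from typing import Dict


def catalog_overrides(spark_conf: Dict[str, str], catalog: str) -> Dict[str, str]:
    """Collect spark.sql.catalog.<catalog>.table-override.<key> entries as {key: value}."""
    prefix = f"spark.sql.catalog.{catalog}.table-override."
    return {k[len(prefix):]: v for k, v in spark_conf.items() if k.startswith(prefix)}


def properties_for_create(user_props: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    # Overrides are applied last, so they win over anything passed at CREATE TABLE.
    merged = dict(user_props)
    merged.update(overrides)
    return merged


conf = {"spark.sql.catalog.prod.table-override.write.format.default": "parquet"}
props = properties_for_create(
    {"write.format.default": "orc", "owner": "etl"}, catalog_overrides(conf, "prod")
)
print(props)  # {'write.format.default': 'parquet', 'owner': 'etl'}
# Nothing here runs on ALTER TABLE ... SET TBLPROPERTIES, which is the gap described in this message.
```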

Re: Storing catalog directly on object store

2024-11-27 Thread Gabor Kaszab
Hi All, Xuanwo, I recall the reasoning against HadoopCatalog was the other way around: even though it is safe to use on HDFS, it is unsafe on object storage. I believe this functionality gap in object stores is going away, so to me HadoopCatalog would make even more sense now than be

Re: [DISCUSS] Hive Support

2024-11-27 Thread Gabor Kaszab
Hi All, As I see it, the general opinion is to not keep the Hive code in the Iceberg repo, but to maintain a set of tests that verify the actual Iceberg code against the latest Hive release. To me it would seem a bit odd to maintain a test suite for verifying code that is not maintained

Re: Storing catalog directly on object store

2024-11-27 Thread Manu Zhang
I think one major issue with the current HadoopCatalog is that there's no way to manage tables by name. If we add a metadata layer on top of it, we need to handle more consistency challenges. Manu On Wed, Nov 27, 2024 at 8:03 PM Gabor Kaszab wrote: > Hi All, > > Xuanwo, I recall the reasoning aga