View Support for Hive Catalog

2024-05-24 Thread Naveen Kumar
eople looked into this. In the early discussion, Peter and Szehon have already shared their thoughts. Please take a look and let me know your comments. Issue: https://github.com/apache/iceberg/issues/8698 PR: https://github.com/apache/iceberg/pull/9852 Regards, Naveen Kumar

Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-24 Thread Naveen Kumar
e flush to manifest as soon as possible. At commit time we should only look out for the created manifests and save them to the new snapshot. Please share your thoughts. Thanks, Naveen Kumar On Wed, May 22, 2024 at 9:51 PM Amogh Jahagirdar wrote: > I'd think chunking the work as muc

Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-22 Thread Naveen Kumar
re *Set, Set *has grown to a very large number (say 1M). This might be very rare, but I have seen examples where a user never ran compaction and, after adding a new partition column, they are trying to compact the entire table. Is this a valid use case? WDYT? Regards, Naveen Kumar On Tue, May

[Discuss] Heap pressure with RewriteFiles APIs

2024-05-21 Thread Naveen Kumar
tions <https://github.com/apache/iceberg/blob/8d6bee736884575da7368e0963268d1cbe362d90/api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java#L53C7-L53C43> better to avoid any heap pressure? Also, has someone encountered similar issues and if so how did they fix it? Regards, Naveen Kumar

Potential Enhancement for ViewMetadata

2024-01-17 Thread Naveen Kumar
thoughts. Ref: https://github.com/apache/iceberg/pull/8907#discussion_r1441986754 Regards, Naveen Kumar

Re: [DISCUSS] Iceberg community summit

2024-01-17 Thread Naveen Kumar
Happy to volunteer wherever I can. I would love to start with promotional content stuff. Regards, Naveen Kumar On Wed, Jan 17, 2024 at 7:11 AM Steven Wu wrote: > Happy to volunteer for the selection committee too. Looking forward to a > great event! > > On Tue, Jan 16, 2024 at

Re: [DISCUSS] Run GC with Catalog or Tables

2023-12-06 Thread Naveen Kumar
Agreed. I started this discussion to take the opinions of individuals on the idea of GC per catalog. If this sounds like a good use case, I can start spending time on the complexity and challenges. Please advise. Regards, Naveen Kumar On Thu, Dec 7, 2023 at 9:24 AM Renjie Liu wrote: >

[DISCUSS] Run GC with Catalog or Tables

2023-12-06 Thread naveen
s() Multiple Tables CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1', 'db2.table2'), ) PS: There could be exceptions for individual catalogs. For example, Nessie doesn't support GC other than through the Nessie CLI, and Hadoop can't list all the Namespaces. Regards, Naveen Kumar