Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-18 Thread Péter Váry
Here are the cases where we call the `newWorkerPool` in our code: - Correctly: - S3FileIO.executorService - HadoopFileIO.executorService - Incorrectly: - CountersBenchmark.defaultCounterMultipleThreads (core module) - BaseDistributedDataScan.newMonitorPool (core modul

Re: Code structuring question

2024-09-18 Thread Péter Váry
One more idea: - Create a new gradle module for the "api" that would contain all the classes a client could access. This would fit nicely into the Iceberg codebase, but would need a serious refactor of the current code (maybe even the api). I'm still in favor of the api package solution. On Wed, Se

[DISCUSS] Defining a concept of "externally owned" tables in the REST spec

2024-09-18 Thread Dennis Huo
Hi all, I wanted to follow up on some discussions that came up in one of the Iceberg Catalog community syncs a while back relating to the concept of tables that can be registered in an Iceberg REST Catalog but which have their "source of truth" in some external Catalog. The original context was th

REST Catalog based Integration Test for Query Engines

2024-09-18 Thread Haizhou Zhao
Hello dev-list, *What* I'm looking for reviews from the community on the issue and PR below, which enable REST Catalog based Integration Tests for Query Engines. Issue: https://github.com/apache/iceberg/issues/11079 PR: https://github.com/apache/iceberg/pull/11093 *Background* Recently, thanks to @Daniel's effo

Re: [DISCUSS] Iceberg Materialized Views

2024-09-18 Thread Benny Chow
Steven and I met up yesterday at the Seattle Iceberg meetup and we got to talking about the "catalog alias" issue. He described it as an annoying problem =p I think there are some key requirements we need to support: 1. Different engines can produce and consume shared MVs with freshness validati

Re: [DISCUSS] Column to Column filtering

2024-09-18 Thread Russell Spitzer
I have similar concerns to Ryan, although I could see that if we were writing smaller and better-correlated files, this could be a big help. Specifically with variant use cases this may be very useful. I would love to hear more about the use cases and rationale for adding this. Do you have any s

Re: [DISCUSS] Column to Column filtering

2024-09-18 Thread rdb...@gmail.com
I'm curious to learn more about this feature. Is there a driving use case that you're implementing it for? Are there common situations in which these filters are helpful and selective? My initial impression is that this kind of expression would have limited utility at the table format level. Icebe
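To make the utility question concrete: if a file carries only per-column min/max bounds, a column-to-column predicate such as `a < b` can prune the file only when the two value ranges do not overlap. The following is a minimal sketch under that assumption. `ColumnRange` and the helper methods are hypothetical and are not Iceberg's expression or metrics-evaluator API.

```java
public class ColumnToColumnPruning {

  // Hypothetical per-file lower/upper bounds for one numeric column.
  static final class ColumnRange {
    final long min;
    final long max;

    ColumnRange(long min, long max) {
      if (min > max) {
        throw new IllegalArgumentException("min must be <= max");
      }
      this.min = min;
      this.max = max;
    }
  }

  // For "a < b": some row might match unless every a is >= every b,
  // i.e. the file can be skipped only when min(a) >= max(b).
  static boolean mightMatchLessThan(ColumnRange a, ColumnRange b) {
    return a.min < b.max;
  }

  // For "a < b": every row is guaranteed to match when max(a) < min(b).
  static boolean allRowsMatchLessThan(ColumnRange a, ColumnRange b) {
    return a.max < b.min;
  }

  public static void main(String[] args) {
    ColumnRange a = new ColumnRange(100, 200);
    ColumnRange disjoint = new ColumnRange(0, 50);
    ColumnRange overlapping = new ColumnRange(150, 300);
    System.out.println(mightMatchLessThan(a, disjoint));    // false: file can be skipped
    System.out.println(mightMatchLessThan(a, overlapping)); // true: file must be read
  }
}
```

When the two columns have wide, overlapping ranges within a file, `mightMatchLessThan` stays true and nothing is pruned. This fits the point above that such filters are mostly helpful when files are small and the columns are well correlated.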

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-18 Thread rdb...@gmail.com
I think this is the intended behavior. The code calls `MoreExecutors.getExitingExecutorService` internally to ensure the pool exits. I think the right fix is for callers to create their own `ExecutorService` rather than using `newWorkerPool`. That allows for customization without making Iceberg mor
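The suggestion above is that callers who manage a pool's lifecycle themselves should create their own `ExecutorService` instead of calling `newWorkerPool`. According to this thread, `newWorkerPool` wraps its pool with `MoreExecutors.getExitingExecutorService`. The following is a minimal sketch of the caller-owned alternative using only `java.util.concurrent`. The class name `CallerOwnedPool`, the thread-name prefix, and the pool size are hypothetical, and the sketch does not use Iceberg's `ThreadPools` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CallerOwnedPool {

  // Daemon threads so a forgotten pool cannot keep the JVM alive;
  // nothing is registered via Runtime.addShutdownHook.
  static ExecutorService newLocalPool(String namePrefix, int size) {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = runnable -> {
      Thread thread = new Thread(runnable, namePrefix + "-" + counter.getAndIncrement());
      thread.setDaemon(true);
      return thread;
    };
    return Executors.newFixedThreadPool(size, factory);
  }

  // Runs the tasks on a pool that lives only for this call and is always shut down.
  static int sumSquares(int n) throws InterruptedException, ExecutionException {
    ExecutorService pool = newLocalPool("local-worker", 4);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int i = 1; i <= n; i++) {
        final int value = i;
        futures.add(pool.submit(() -> value * value));
      }
      int total = 0;
      for (Future<Integer> future : futures) {
        total += future.get();
      }
      return total;
    } finally {
      // The caller, not a shutdown hook, is responsible for cleanup.
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(sumSquares(4)); // prints 30
  }
}
```

Because the threads are daemons, any work still running at JVM exit is abandoned. That is why the sketch shuts the pool down explicitly in `finally` and does not rely on exit-time cleanup.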

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-18 Thread Péter Váry
This is not just a Flink issue; the calls are spread out across multiple packages. We checked the code, and in many of the current use cases in the Iceberg repo the pool is not used in a static environment and is closed manually. In these cases we should switch to a thread pool without a shutdown hook. So

Code structuring question

2024-09-18 Thread Péter Váry
Hi Team, Currently I'm working on Flink Table Maintenance (see: https://github.com/apache/iceberg/pull/11144#discussion_r1764015878), and Steven and I are trying to find a good way to organize the incoming 50 classes. There will be: - ~10 classes which will be used by the users - ~10 classes

Re: [DISCUSS] Modify ThreadPools.newWorkerPool to avoid unnecessary Shutdown Hook registration

2024-09-18 Thread rdb...@gmail.com
Since we're using standard interfaces, maybe we should just document this behavior and you can control it by creating your own worker pool instead? On Tue, Sep 17, 2024 at 2:20 AM Péter Váry wrote: > Bumping this thread a bit. > > Cleaning up the pool in non-static cases should be a responsibili

Re: [DISCUSS] REST: OAuth2 Authentication Guide

2024-09-18 Thread Dmitri Bourlatchkov
Hi Christian, Very nice proposal. Thanks for putting it together! I added some comments to the doc. I think it is related to PR #10753 [4], which proposes some foundational refactoring to the Java REST client to enable further enhancements in OAuth2 flows. Cheers, Dmitri. [4] https://github.com

[DISCUSS] REST: OAuth2 Authentication Guide

2024-09-18 Thread Christian Thiel
Dear everyone, the Iceberg REST specification allows for different ways of authentication; OAuth2 is one of them. Until recently the OAuth2 /token endpoint was part of the REST spec, together with the data types required for the client-credential flow. Both have since been removed from the spec for s
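For reference, RFC 6749, section 4.4 defines the client-credential flow mentioned above. The client POSTs an `application/x-www-form-urlencoded` body containing `grant_type=client_credentials` to the token endpoint and authenticates with its client credentials. The following is a minimal sketch of building that body. It assumes the credentials go in the request body, which RFC 6749, section 2.3.1 permits but does not recommend over HTTP Basic authentication. The class name, the example credential values, and the `catalog` scope are hypothetical and are not taken from the Iceberg REST spec or client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ClientCredentialsRequest {

  // Builds a form-encoded client-credentials token request body (RFC 6749, section 4.4.2).
  // Credentials are placed in the body here; HTTP Basic auth is the recommended alternative.
  static String formBody(String clientId, String clientSecret, String scope) {
    Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
    params.put("grant_type", "client_credentials");
    params.put("client_id", clientId);
    params.put("client_secret", clientSecret);
    if (scope != null) {
      params.put("scope", scope); // scope is optional in this grant
    }
    return params.entrySet().stream()
        .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
        .collect(Collectors.joining("&"));
  }

  // application/x-www-form-urlencoded encoding of a single key or value.
  static String encode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // prints grant_type=client_credentials&client_id=my-client&client_secret=s3cr3t%2F%2B&scope=catalog
    System.out.println(formBody("my-client", "s3cr3t/+", "catalog"));
  }
}
```

A client would send this body in a POST to the token endpoint with `Content-Type: application/x-www-form-urlencoded` and read `access_token` from the JSON response.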