Re: Should DDL operations always create new snapshots?

2025-05-13 Thread Vladimir Ozerov
> Also, it isn’t clear to me why the time travel query would resolve * as a and b when time traveling. That error shows that there is an inconsistent schema between time(v3) and the current schema (where b comes from). What happens when you run `SELECT * FROM t FOR TIMEST

Should DDL operations always create new snapshots?

2025-05-09 Thread Vladimir Ozerov
andled on the engine side, but this will lead to inconsistent behaviour between different engines depending on their internal implementation details. Regards, -- *Vladimir Ozerov*

registerView command for REST catalog

2025-01-29 Thread Vladimir Ozerov
Hi, It is possible to register an already existing table within a REST catalog. It seems that a similar feature is missing for views. WDYT if we add it to the protocol with mechanics similar to "registerTable": RegisterViewRequest[name, metadata-location]? Regards, -- *Vladimir Ozerov*
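
To make the proposed shape concrete, here is a minimal sketch of what such a request body could look like. RegisterViewRequest is only the name suggested in this message and is not part of the current REST spec; the two fields mirror the existing registerTable request, and the view name and metadata path are made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterViewRequest:
    """Hypothetical request body; not part of the current REST spec."""

    name: str
    metadata_location: str

    def to_json(self) -> str:
        # Field names mirror the existing registerTable request body.
        return json.dumps(
            {"name": self.name, "metadata-location": self.metadata_location}
        )


request = RegisterViewRequest(
    name="daily_sales",  # illustrative view name
    metadata_location="s3://bucket/db/daily_sales/metadata/00001.metadata.json",  # illustrative path
)
payload = request.to_json()
print(payload)
```

Which endpoint would accept this body is left open here, since the message does not propose one.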

Re: [DISCUSS] REST Catalog bulk object lookup

2025-01-03 Thread Vladimir Ozerov
request bursts, with a considerable number of requests returning error responses because we cannot get the object type and its metadata in one shot. On Tue, Dec 24, 2024 at 10:29 PM Vladimir Ozerov wrote: > Hi, > > Following the discussion [1] I'd like to formally propose an exten

There is no easy way to secure Iceberg data. How can we improve?

2025-01-01 Thread Vladimir Ozerov
reciate your feedback on the matter. Regards, -- *Vladimir Ozerov*

[DISCUSS] REST Catalog bulk object lookup

2024-12-24 Thread Vladimir Ozerov
SQL query planning latency. Proposal: https://docs.google.com/document/d/1KfzdQT8Q2xiV_yPNvICROCepz-Qqpm0npob7hmb40Fc/edit?usp=sharing [1] https://lists.apache.org/thread/g44czzpjqqhdvronqfyckw4mnxvlpn3s Regards, -- *Vladimir Ozerov*

Re: Optimize object lookup in REST catalog

2024-12-17 Thread Vladimir Ozerov
additional filters (like in JDBC or Arrow Flight SQL), or sorting might be useful here? It would be nice to have several examples of real metadata queries generated by BI tools for better understanding. Trying to collect more pain points to wrap my head around the potential proposal. *Vladimir Ozerov* Wed

Re: REST catalog high availability

2024-12-17 Thread Vladimir Ozerov
create and demonstrate a prototype. Regards, *Vladimir Ozerov* On Tue, Dec 17, 2024 at 16:16, Jean-Baptiste Onofré wrote: > Hi Vladimir > > As I said in my previous email, I can already "inject" the > PoolingHttpClientConnectionManager in the client. So, technically > speaking, I thi

Re: REST catalog high availability

2024-12-17 Thread Vladimir Ozerov
-private-cloud-upgrade/latest/upgrade-cdh/topics/hive-hms-ha-configuration.html *Vladimir Ozerov* On Tue, Dec 10, 2024 at 00:57, Yufei Gu wrote: > Load balancing operates at a different layer than APIs, with various > implementations available, such as etcd and Zookeeper. I’d prefer to avoid > in

REST catalog high availability

2024-12-09 Thread Vladimir Ozerov
Hi, The catalog is a critical part of Iceberg infrastructure and may require a highly available setup. In similar services (e.g., HMS) this is often done as follows: 1. Start several service instances 2. Decide which one is the coordinator via etcd, ZooKeeper, Ratis, etc. 3. Expose HA endpoint

Re: [Discuss] Iceberg View Interoperability

2024-12-05 Thread Vladimir Ozerov
ckdb, > postgres, datafusion, for example, all disagree on at least one case). > > P.P.S. The plan "SELECT sum(a)" is even more diabolical as it pulls > numerical precision and processing order into the mix (e.g. some engines > can give you two different answers on two d

Re: [Discuss] Iceberg View Interoperability

2024-12-05 Thread Vladimir Ozerov
ds,
> Fokko
>
> On Fri, Oct 25, 2024 at 22:16, Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>
>> I think this may need some more discussion.
>>
>> To me, a "serialized IR" is another form of a "dialect". In this case, this dialect will be mostly specific to Iceberg, and compute engines will still support reading views in their native SQL. There are some data points on this from the Trino community in a previous discussion [1]. In addition to being not directly consumable by engines, a serialized IR will be hard to consume by humans too.
>>
>> From that perspective, even if Iceberg adopts some form of a serialized IR, we will end up again doing translation, from that IR to the engine's dialect on view read time, and from the engine's dialect to that IR on the view write time. So serialized IR cannot eliminate translation.
>>
>> I think it is better to not quickly adopt the serialized IR path until it is proven to work and there is sufficient tooling and support around it, else it will end up being a constraint.
>>
>> For Coral vs SQLGlot (Disclaimer: I maintain Coral): There are some fundamental differences between their approaches, mainly around the intermediate representation abstraction. Coral models both the AST and the logical plan of a query, making it able to capture the query semantics more accurately and hence perform precise transformations. On the flip side, SQLGlot abstraction is at the AST level only. Data type inference would be a major gap in any solution that does not capture the logical plan for example, yet very important to perform successful translation. This is backed up by some experiments we performed on actual queries and their translation results (from Spark to Trino, comparing results of Coral and SQLGlot).
>>
>> For the IR: Any translation solution (including Coral) must rely on an IR, and it has to be decoupled from any of the input and output dialects. This is true in the Coral case today. Such IR is the way to represent both the intermediate AST and logical plans. Therefore, I do not think we can necessarily split projects as "IR projects" vs not, since all solutions must use an IR. With that said, IR serialization is a matter of staging/milestones of the project. Serialized IR is next on Coral's roadmap. If Iceberg ends up adopting an IR, it might be a good idea to make Iceberg interoperable with a Coral-based serialized IR. This will make the compatibility with engines that adopt Coral (like Trino) much more robust and straightforward.
>>
>> [1] https://github.com/trinodb/trino/pull/19818#issuecomment-1925894002
>>
>> Thanks,
>> Walaa.

-- *Vladimir Ozerov* Founder querifylabs.com

Optimize object lookup in REST catalog

2024-12-04 Thread Vladimir Ozerov
Hi, Consider the query “SELECT * FROM t”. The query engine needs to resolve the object “t” during semantic analysis. In Iceberg, this could be a table, a view, or (soon) a materialized view. Currently, the engine has to guess the object type via multiple REST calls, e.g., loadTable -> loadView. This incr
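
The guessing pattern described above can be sketched as follows. This is a simplified model under stated assumptions, not actual client code: `resolve`, `load_table`, `load_view`, and `NotFound` are hypothetical stand-ins for the loadTable and loadView REST calls and their 404 responses, and the catalog contents are made up.

```python
class NotFound(Exception):
    """Stand-in for a 404 response from the catalog."""


def resolve(name, loaders):
    """Try each (kind, loader) pair in order; return (kind, metadata, calls)."""
    calls = 0
    for kind, loader in loaders:
        calls += 1  # every attempt is one REST round trip
        try:
            return kind, loader(name), calls
        except NotFound:
            continue  # wrong object type, fall through to the next guess
    raise NotFound(name)


# Fake in-memory catalog with one table and one view (illustrative data).
TABLES = {"t": {"format-version": 2}}
VIEWS = {"v": {"view-version": 1}}


def load_table(name):
    if name not in TABLES:
        raise NotFound(name)
    return TABLES[name]


def load_view(name):
    if name not in VIEWS:
        raise NotFound(name)
    return VIEWS[name]


LOADERS = [("table", load_table), ("view", load_view)]
table_result = resolve("t", LOADERS)  # found on the first call
view_result = resolve("v", LOADERS)  # needs a failed loadTable first
```

In this model a view costs two round trips, one of which is an error response, and each additional object type adds another failed call to the worst case.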

Re: Overwrite old properties on table replace with REST catalog

2024-12-04 Thread Vladimir Ozerov
ready have existing integration > tests on Iceberg connector for Trino for Hive/Hadoop catalog, then just > setting up the exact same tests against REST catalog for Trino connector > can help systematically detect behavior differences between catalog types. > > Regards, > Haizhou

Re: Storing catalog directly on object store

2024-12-03 Thread Vladimir Ozerov
REST catalog cannot handle some common cases now (namespace renames, object references in views, etc). With this in mind, it seems that while new S3 capabilities are formally sufficient to implement a basic catalog, they can address only a small fraction of real user requirements. *Vladimir Ozerov

Re: Optionally disable SSL verification for RESTCatalog

2024-11-23 Thread Vladimir Ozerov
> I would propose to add: > > private static final String REST_SSL_DISABLE_CERTIFICATE_CHECK = > "rest.ssl.disable.cert.check"; > > and use this for HTTP5 client setup. > > Regards > JB > > On Wed, Nov 13, 2024 at 1:53 PM Vladimir Ozerov > wrote: > > > > Hi, >

Re: REST catalog removes void transform

2024-11-13 Thread Vladimir Ozerov
d transforms? Is this in a v2 table? In a v2 table, the > catalog should be free to remove void transforms. They are required for v1. > > On Wed, Oct 30, 2024 at 5:00 AM Vladimir Ozerov > wrote: > >> Hi, >> >> When a user creates a table with void() transform on a s

Optionally disable SSL verification for RESTCatalog

2024-11-13 Thread Vladimir Ozerov
a property "rest.client.insecure-ssl" passed to the client. What do you think about this? Apologies if this was already discussed elsewhere; I couldn't find any relevant discussions. Regards, -- *Vladimir Ozerov* Founder querifylabs.com
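
As a rough illustration of the idea, the sketch below shows how a client could honor such a property when building its TLS settings. It uses Python's standard `ssl` module rather than the Java HTTP client that RESTCatalog uses, and `build_ssl_context` is a hypothetical helper; only the property name comes from the message above.

```python
import ssl

# Property name as proposed in the message; it is not an existing option.
INSECURE_SSL_PROPERTY = "rest.client.insecure-ssl"


def build_ssl_context(properties: dict) -> ssl.SSLContext:
    """Return a TLS context, skipping verification only when asked to."""
    context = ssl.create_default_context()  # verifies certificates by default
    if properties.get(INSECURE_SSL_PROPERTY, "false").lower() == "true":
        # Hostname checking must be turned off before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


secure = build_ssl_context({})
insecure = build_ssl_context({INSECURE_SSL_PROPERTY: "true"})
```

Keeping verification on by default and requiring an explicit opt-in limits the insecure mode to setups such as local testing with self-signed certificates.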

REST catalog removes void transform

2024-10-30 Thread Vladimir Ozerov
e is not partitioned anyway. However, some engines, such as Trino, currently retain void() partitioning info for non-REST catalogs. What would be the proper expectation from the Iceberg user in this case - should it observe void() in table schema or not? Regards, -- *Vladimir Ozerov* Founder querifylabs.com

Re: Overwrite old properties on table replace with REST catalog

2024-10-23 Thread Vladimir Ozerov
Hi, Sure, will do. *Vladimir Ozerov* Founder querifylabs.com On Wed, Oct 23, 2024 at 08:50, Jean-Baptiste Onofré wrote: > I second Ryan here, it would be great to clarify in the > "implementation notes" section. > > Thanks ! > Regards > JB > > On Wed, Oct 23, 2024

Re: Overwrite old properties on table replace with REST catalog

2024-10-20 Thread Vladimir Ozerov
g-open-api.yaml#L553 > ) > and update a table > ( > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L975 > ), > and it's up to the query engine to implement the "CREATE OR REPLACE" > with the correct semantic. > > Regards > JB

Re: Overwrite old properties on table replace with REST catalog

2024-10-20 Thread Vladimir Ozerov
table data changes regularly. And the person > changing the data may not be the person tuning the table settings. > > Hopefully that helps, > > Ryan > > On Sun, Oct 20, 2024 at 9:45 AM Vladimir Ozerov > wrote: > >> Hi, >> >> Consider a REST catalo

Overwrite old properties on table replace with REST catalog

2024-10-20 Thread Vladimir Ozerov
ies would be [a=1, b=3, c=4], while the user expects [b=3, c=4]. It looks like a bug because the user expects complete property replacement instead of a merge. Shall we explicitly clear all previous properties in RESTSessionCatalog.Builder.replaceTransaction? Regards, Vladimir. -- *Vladimir Ozerov* Founder querifylabs.com
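
The difference between the two behaviours can be shown with plain dictionaries. This is a minimal sketch, not the RESTSessionCatalog code: the beginning of the message is cut off, so the old property set used here (a=1, b=2) is an assumption chosen to reproduce the result quoted above.

```python
# Assumed starting state; the original message is truncated before this point.
old_properties = {"a": "1", "b": "2"}
# Properties supplied with the replace, consistent with the result quoted above.
new_properties = {"b": "3", "c": "4"}

# Merge semantics: old keys survive unless the new properties override them.
merged = {**old_properties, **new_properties}

# Replace semantics: only the newly supplied properties remain.
replaced = dict(new_properties)

print(merged)
print(replaced)
```

With merge semantics the stale key `a` survives the replace, which is the behaviour the message describes as unexpected.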