> Also, it isn’t clear to me why the time travel query would resolve * as
>> a and b when time traveling. That error shows that there is an inconsistent
>> schema between time(v3) and the current schema (where b comes from). What
>> happens when you run `SELECT * FROM t FOR TIMEST
andled on the engine side,
but this will lead to inconsistent behaviour between different engines
depending on their internal implementation details.
Regards,
--
*Vladimir Ozerov*
Hi,
It is possible to register an already existing table within a REST catalog.
It seems that a similar feature is missing for views. WDYT if we add it to
the protocol with mechanics similar to "registerTable":
RegisterViewRequest[name, metadata-location]?
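
To make the proposed shape concrete, here is a minimal sketch of what such a request body might look like. The class name, the example view name, and the metadata path are hypothetical, and the two JSON fields simply mirror the [name, metadata-location] pair suggested above; nothing here is part of the REST spec today.

```java
// Hypothetical sketch of a "register view" payload; not an existing Iceberg class.
class RegisterViewRequestSketch {
    final String name;
    final String metadataLocation;

    RegisterViewRequestSketch(String name, String metadataLocation) {
        this.name = name;
        this.metadataLocation = metadataLocation;
    }

    // Escape backslashes and double quotes so the values are safe inside JSON strings.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Serialize with the same field names as the proposed [name, metadata-location] pair.
    String toJson() {
        return "{\"name\":\"" + escape(name) + "\","
            + "\"metadata-location\":\"" + escape(metadataLocation) + "\"}";
    }

    public static void main(String[] args) {
        // Example values are made up for illustration only.
        RegisterViewRequestSketch request = new RegisterViewRequestSketch(
            "daily_sales", "s3://example-bucket/views/daily_sales/metadata/v1.metadata.json");
        System.out.println(request.toJson());
    }
}
```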
Regards,
--
*Vladimir Ozerov*
request bursts, with a considerable
number of requests returning error responses because we cannot get object
type and its metadata in one shot.
On Tue, Dec 24, 2024 at 10:29 PM Vladimir Ozerov
wrote:
> Hi,
>
> Following the discussion [1] I'd like to formally propose an exten
reciate your feedback on the matter.
Regards,
--
*Vladimir Ozerov*
SQL query planning latency.
Proposal:
https://docs.google.com/document/d/1KfzdQT8Q2xiV_yPNvICROCepz-Qqpm0npob7hmb40Fc/edit?usp=sharing
[1] https://lists.apache.org/thread/g44czzpjqqhdvronqfyckw4mnxvlpn3s
Regards,
--
*Vladimir Ozerov*
additional filters (like in JDBC or
Arrow Flight SQL), or sorting might be useful here? It would be nice to
have several examples of real metadata queries generated by BI tools for
better understanding.
Trying to collect more pain points to wrap my head around the potential
proposal.
*Vladimir Ozerov*
Wed
create and demonstrate a prototype.
Regards,
*Vladimir Ozerov*
On Tue, Dec 17, 2024 at 4:16 PM Jean-Baptiste Onofré wrote:
> Hi Vladimir
>
> As I said in my previous email, I can already "inject" the
> PoolingHttpClientConnectionManager in the client. So, technically
> speaking, I thi
-private-cloud-upgrade/latest/upgrade-cdh/topics/hive-hms-ha-configuration.html
*Vladimir Ozerov*
On Tue, Dec 10, 2024 at 12:57 AM Yufei Gu wrote:
> Load balancing operates at a different layer than APIs, with various
> implementations available, such as etcd and Zookeeper. I’d prefer to avoid
> in
Hi,
The catalog is a critical part of Iceberg infrastructure and may require a
highly available setup. In similar services (e.g., HMS) this is often done as
follows:
1. Start several service instances
2. Decide which one is the coordinator via etcd, ZooKeeper, Ratis, etc.
3. Expose HA endpoint
ckdb,
> postgres, datafusion, for example, all disagree on at least one case).
>
> P.P.S. The plan "SELECT sum(a)" is even more diabolical as it pulls
> numerical precision and processing order into the mix (e.g. some engines
> can give you two different answers on two d
ds,
>>>> >>>>> Fokko
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Fri, Oct 25, 2024 at 10:16 PM Walaa Eldin Moustafa <
>>>> wa.moust...@gmail.com> wrote:
>>>> >>>>>>
>>>> >>>>>> I think this may need some more discussion.
>>>> >>>>>>
>>>> >>>>>> To me, a "serialized IR" is another form of a "dialect". In this
>>>> case, this dialect will be mostly specific to Iceberg, and compute engines
>>>> will still support reading views in their native SQL. There are some data
>>>> points on this from the Trino community in a previous discussion [1]. In
>>>> addition to being not directly consumable by engines, a serialized IR will
>>>> be hard to consume by humans too.
>>>> >>>>>>
>>>> >>>>>> From that perspective, even if Iceberg adopts some form of a
>>>> serialized IR, we will end up again doing translation, from that IR to the
>>>> engine's dialect on view read time, and from the engine's dialect to that
>>>> IR on the view write time. So serialized IR cannot eliminate translation.
>>>> >>>>>>
>>>> >>>>>> I think it is better to not quickly adopt the serialized IR path
>>>> until it is proven to work and there is sufficient tooling and support
>>>> around it, else it will end up being a constraint.
>>>> >>>>>>
>>>> >>>>>> For Coral vs SQLGlot (Disclaimer: I maintain Coral): There are
>>>> some fundamental differences between their approaches, mainly around the
>>>> intermediate representation abstraction. Coral models both the AST and the
>>>> logical plan of a query, making it able to capture the query semantics more
>>>> accurately and hence perform precise transformations. On the flip side,
>>>> SQLGlot abstraction is at the AST level only. Data type inference would be
>>>> a major gap in any solution that does not capture the logical plan for
>>>> example, yet very important to perform successful translation. This is
>>>> backed up by some experiments we performed on actual queries and their
>>>> translation results (from Spark to Trino, comparing results of Coral and
>>>> SQLGlot).
>>>> >>>>>>
>>>> >>>>>> For the IR: Any translation solution (including Coral) must rely
>>>> on an IR, and it has to be decoupled from any of the input and output
>>>> dialects. This is true in the Coral case today. Such IR is the way to
>>>> represent both the intermediate AST and logical plans. Therefore, I do not
>>>> think we can necessarily split projects as "IR projects" vs not, since all
>>>> solutions must use an IR. With that said, IR serialization is a matter of
>>>> staging/milestones of the project. Serialized IR is next on Coral's
>>>> roadmap. If Iceberg ends up adopting an IR, it might be a good idea to make
>>>> Iceberg interoperable with a Coral-based serialized IR. This will make the
>>>> compatibility with engines that adopt Coral (like Trino) much more robust
>>>> and straightforward.
>>>> >>>>>>
>>>> >>>>>> [1]
>>>> https://github.com/trinodb/trino/pull/19818#issuecomment-1925894002
>>>> >>>>>>
>>>> >>>>>> Thanks,
>>>> >>>>>> Walaa.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>>
>>>
--
*Vladimir Ozerov*
Founder
querifylabs.com
Hi,
Consider the query “SELECT * FROM t”.
The query engine needs to resolve the object “t” during semantic analysis.
In Iceberg, this could be a table, a view, or (soon) a materialized view.
Currently, the engine has to guess the object type via multiple REST calls, e.g.,
loadTable -> loadView. This incr
ready have existing integration
> tests on Iceberg connector for Trino for Hive/Hadoop catalog, then just
> setting up the exact same tests against REST catalog for Trino connector
> can help systematically detect behavior differences between catalog types.
>
> Regards,
> Haizhou
REST catalog cannot handle some common cases now (namespace renames,
object references in views, etc).
With this in mind, it seems that while new S3 capabilities are formally
sufficient to implement a basic catalog, they can address only a small
fraction of real user requirements.
*Vladimir Ozerov*
> I would propose to add:
>
> private static final String REST_SSL_DISABLE_CERTIFICATE_CHECK =
> "rest.ssl.disable.cert.check";
>
> and use this for the HTTP5 client setup.
>
> Regards
> JB
>
> On Wed, Nov 13, 2024 at 1:53 PM Vladimir Ozerov
> wrote:
> >
> > Hi,
d transforms? Is this in a v2 table? In a v2 table, the
> catalog should be free to remove void transforms. They are required for v1.
>
> On Wed, Oct 30, 2024 at 5:00 AM Vladimir Ozerov
> wrote:
>
>> Hi,
>>
>> When a user creates a table with void() transform on a s
a
property "rest.client.insecure-ssl" passed to the client.
What do you think about this? Apologies if this was already discussed
elsewhere, I couldn't find any relevant discussions.
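
For illustration, a rough sketch of what such a switch could toggle, using only JDK classes. The property name is the one mentioned above; the class and method names are hypothetical, and this is not how the Iceberg REST client is actually wired. Accepting every certificate is only suitable for test environments.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import java.util.Map;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper; not part of the Iceberg code base.
class InsecureSslSketch {
    static final String INSECURE_SSL = "rest.client.insecure-ssl";

    // The flag defaults to false, so certificate validation stays on unless requested.
    static boolean insecureSslEnabled(Map<String, String> properties) {
        return Boolean.parseBoolean(properties.getOrDefault(INSECURE_SSL, "false"));
    }

    // Returns the JVM default context, or a trust-all context when the flag is set.
    static SSLContext sslContext(Map<String, String> properties) throws GeneralSecurityException {
        if (!insecureSslEnabled(properties)) {
            return SSLContext.getDefault();
        }
        TrustManager[] trustAll = {
            new X509TrustManager() {
                @Override
                public void checkClientTrusted(X509Certificate[] chain, String authType) {
                    // Intentionally empty: accept any client certificate.
                }

                @Override
                public void checkServerTrusted(X509Certificate[] chain, String authType) {
                    // Intentionally empty: accept any server certificate.
                }

                @Override
                public X509Certificate[] getAcceptedIssuers() {
                    return new X509Certificate[0];
                }
            }
        };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context;
    }
}
```

The resulting SSLContext would then be handed to whichever HTTP client the catalog uses.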
Regards,
--
*Vladimir Ozerov*
Founder
querifylabs.com
e is not partitioned anyway.
However, some engines, such as Trino, currently retain void() partitioning
info for non-REST catalogs. What should an Iceberg user expect in this
case: should they see void() in the table schema or not?
Regards,
--
*Vladimir Ozerov*
Founder
querifylabs.com
Hi,
Sure, will do.
*Vladimir Ozerov*
Founder
querifylabs.com
On Wed, Oct 23, 2024 at 8:50 AM Jean-Baptiste Onofré wrote:
> I second Ryan here, it would be great to clarify in the
> "implementation notes" section.
>
> Thanks !
> Regards
> JB
>
> On Wed, Oct 23, 2024
g-open-api.yaml#L553
> )
> and update a table
> (
> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L975
> ),
> and it's up to the query engine to implement the "CREATE OR REPLACE"
> with the correct semantic.
>
> Regards
> JB
table data changes regularly. And the person
> changing the data may not be the person tuning the table settings.
>
> Hopefully that helps,
>
> Ryan
>
> On Sun, Oct 20, 2024 at 9:45 AM Vladimir Ozerov
> wrote:
>
>> Hi,
>>
>> Consider a REST catalo
ies
would be [a=1, b=3, c=4], while the user expects [b=3, c=4].
It looks like a bug because the user expects complete property replacement
instead of a merge. Shall we explicitly clear all previous properties
in RESTSessionCatalog.Builder.replaceTransaction?
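
The difference between the two behaviours can be illustrated with plain maps. This is only a sketch: the class and method names are hypothetical, the previous value b=2 is an assumption made for the example, and none of this is the actual RESTSessionCatalog code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of merge vs. replace semantics for table properties.
class PropertyReplaceSketch {
    // Merge: requested properties are layered over the existing ones,
    // so keys the user dropped (here "a") survive the replace.
    static Map<String, String> merge(Map<String, String> existing, Map<String, String> requested) {
        Map<String, String> result = new TreeMap<>(existing);
        result.putAll(requested);
        return result;
    }

    // Replace: previous properties are cleared, only the requested ones remain.
    static Map<String, String> replace(Map<String, String> existing, Map<String, String> requested) {
        return new TreeMap<>(requested);
    }

    public static void main(String[] args) {
        Map<String, String> existing = new TreeMap<>();
        existing.put("a", "1");
        existing.put("b", "2"); // assumed previous value, for illustration only

        Map<String, String> requested = new TreeMap<>();
        requested.put("b", "3");
        requested.put("c", "4");

        System.out.println(merge(existing, requested));   // {a=1, b=3, c=4}
        System.out.println(replace(existing, requested)); // {b=3, c=4}
    }
}
```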
Regards,
Vladimir.
--
*Vladimir Ozerov*
Founder
querifylabs.com