Re: Current Status of View Specification

2023-03-14 Thread Walaa Eldin Moustafa
+1 to get a basic implementation in. Some of the discussions/feedback on the API PR slightly changed the API from the initial proposed API that probably more closely resembled Netflix's implementation. Getting an implementation going on the finalized APIs could give some good feedback to the spec o

Re: [DISCUSS] Spark 3.1 support?

2023-04-20 Thread Walaa Eldin Moustafa
LinkedIn is still on Spark 3.1. I am guessing a number of other companies could be in the same boat. I feel the argument for Spark 2.4 is different from that of Spark 3.1 and it would be great if we can continue to support 3.1 for some time. On Wed, Apr 19, 2023 at 11:06 AM Ryan Blue wrote: > +1

Re: [DISCUSS] Spark 3.1 support?

2023-04-25 Thread Walaa Eldin Moustafa
To elaborate on LinkedIn's use case: * LinkedIn maintains its own fork, but we would like to keep it as close to upstream as possible. * +1 to Manu on migrations in large companies could take well beyond 18 months, and it is unlikely to migrate/upgrade more frequently. * One important use case for

Re: [DISCUSS] Spark 3.1 support?

2023-04-26 Thread Walaa Eldin Moustafa
if that seems like a good compromise. > > - Anton > > > On Apr 25, 2023, at 8:01 PM, Walaa Eldin Moustafa > wrote: > > To elaborate on LinkedIn's use case: > > * LinkedIn maintains its own fork, but we would like to keep it as close > to upstream as possible

Re: Reading Glue catalog Views created on top of Iceberg tables.

2023-09-26 Thread Walaa Eldin Moustafa
View support in Spark requires the implementation of the View DataSource V2 SPIP (see associated Spark Jira SPARK-31357 ). The interface P

Re: Reading Glue catalog Views created on top of Iceberg tables.

2023-09-26 Thread Walaa Eldin Moustafa
Sanket A. > > > > *From:* Walaa Eldin Moustafa > *Sent:* Tuesday, September 26, 2023 8:00 PM > *To:* dev@iceberg.apache.org > *Subject:* [EXT] Re: Reading Glue catalog Views created on top of Iceberg > tables. > > > > View support in Spark requires

Re: Feedback on Iceberg Materialized View Spec

2023-11-07 Thread Walaa Eldin Moustafa
Are there parallel discussions? So far I have been following/commenting on the issue [1], not the Google doc. Can we converge on the issue going forward? Jan, if there are parallel discussions on the doc, could you summarize the open topics from the doc in the issue? Else, we can close the open top

Re: Branching and Tagging for Iceberg Views

2023-11-14 Thread Walaa Eldin Moustafa
Also, view metadata versions and (underlying) table snapshots/versions are orthogonal concepts. For example, theoretically, one could time-travel in views along two dimensions: view metadata version and underlying data version. Hence, I do not think that data versioning in tables corresponds exactl

Re: MOR CDC view support

2023-11-21 Thread Walaa Eldin Moustafa
We met on Wednesday and created the channel #cdc-read on Iceberg Slack. A summary of the meeting discussion points is there. Thanks, Walaa. On Tue, Nov 21, 2023 at 8:06 AM Renjie Liu wrote: > Hi: > > Is there any update on this topic? > > On Tue, Nov 14, 2023 at 07:25 Yufei Gu wrote: > >> Hi f

Re: Invitation to contribute to OneTable

2023-12-04 Thread Walaa Eldin Moustafa
Thanks Jesus for sharing OneTable. Looks like it touches upon some of the topics we discussed in the Rise of Table Formats panel at VLDB back in September. I was browsing through the source code, and I ran into the OneField

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Walaa Eldin Moustafa
Can we assume it is the responsibility of the server to ensure determinism (e.g., by caching the results along with query ID)? I think start and offset has the advantage of being parallelizable (as compared to continuation tokens). On the other hand, using "asOf" can be complex to implement and ma

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Walaa Eldin Moustafa
"asOf" can be complex to implement and may be > too powerful for the pagination use case > > I don't think that we want to add `asOf`. If the service chooses to do > this, it would send a continuation token that has the information embedded. > > On Tue, Dec 19, 2023 at 9:4

Re: Pagination for List APIs in the REST spec

2023-12-19 Thread Walaa Eldin Moustafa
now. Would you need to parallelize the client for > listing namespaces or tables? That seems odd to me. > > On Tue, Dec 19, 2023 at 9:48 AM Walaa Eldin Moustafa < > wa.moust...@gmail.com> wrote: > >> > You can parallelize with opaque tokens by sending a starting point fo

Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread Walaa Eldin Moustafa
Hi Iceberg Developers, I would like to start a discussion on a potential enhancement to Iceberg around the implementation of key-value style properties (tags) for individual columns or fields. I believe this feature could have significant applications, especially in the domain of data governance.

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-03 Thread Walaa Eldin Moustafa
a "policy" field that > contains sub-fields like the table's basic access permission > (READ/WRITE/ADMIN), authorized columns, data filters, etc. I am not sure if > Iceberg needs its own policy spec though, that might go a bit too far. > > Any thoughts? > > Best,

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-04 Thread Walaa Eldin Moustafa
Walaa, >>> >>> Netflix internal Spark and Iceberg have supported column metadata in >>> Iceberg tables since Spark 2.4. The Spark data type is >>> `org.apache.spark.sql.types.Metadata` in StructType. The feature is used by >>> ML teams. >>> >>

Re: Column-Level Key-Value Properties (Tags) in Iceberg

2024-01-08 Thread Walaa Eldin Moustafa
dge. >> > >> > I do think it would be nice for engines that have similar concepts if >> it really can be natively integrated and I'm sure there are other use cases >> for column properties, but it still feels somewhat niche. >> > >> > That be

Re: Materialized view integration with REST spec

2024-02-19 Thread Walaa Eldin Moustafa
I think it would help if we answer the question of whether an MV is a view + storage table (and degree of exposing this underlying implementation) in the context of the user interfacing with those concepts: For the end user, interfacing with the engine APIs (e.g., through SQL), materialized view A

Re: Materialized view integration with REST spec

2024-02-20 Thread Walaa Eldin Moustafa
ign Question in the doc, and add the >> options. This will allow us to flesh out this alternative >> option(s). Maybe Micah's point about modifying existing proposal to >> 'embed' the required table metadata fields in the existing view metadata, >> is o

Re: Materialized view integration with REST spec

2024-02-21 Thread Walaa Eldin Moustafa
> >>>> Of course we also need threads that express our preferences (voting). I >>>> would suggest to keep these separate from discussions about single points >>>> so that they can be persisted in the document. >>>> >>>> After a pha

Re: Materialized view integration with REST spec

2024-02-28 Thread Walaa Eldin Moustafa
Thanks Ryan for the insights. I agree that reusing existing metadata definitions and minimizing spec changes are very important. This also minimizes spec drift (between materialized views and views spec, and between materialized views and tables spec), and simplifies the implementation. In an effo

Re: Materialized view integration with REST spec

2024-02-29 Thread Walaa Eldin Moustafa
gular view or table. >> >> ViewCatalog viewCatalog = (ViewCatalog) catalog; >> >> // REST: GET /namespaces/db1/views/mv1 >> // non-REST: load JSON file at metadata_location >> View mv = viewCatalog.loadView(TableIdentifier.of("db1", "mv1")); >

Re: Materialized view integration with REST spec

2024-02-29 Thread Walaa Eldin Moustafa
property "materialized" is set to "true" for a >>> MV and "false" for a regular view. And the table property "storage_table" >>> is set to "true" for a storage table and "false" for a regular table. The >>> absen

Re: Materialized view integration with REST spec

2024-02-29 Thread Walaa Eldin Moustafa
e identifiers stored in a > view metadata field, one for each materialization. But there's a lot of > assumptions about how we come out on these questions before we get to how > to store metadata. > > On Thu, Feb 29, 2024 at 4:35 PM Walaa Eldin Moustafa < > wa.moust...@gmail.co

Re: Materialized view integration with REST spec

2024-03-01 Thread Walaa Eldin Moustafa
t thing to note is >>>>>> that the catalog methods exhibit two different behaviors: the *create >>>>>> and load methods deal with the entire entity*, while the *update(commit) >>>>>> method only deals with partial changes* to the entities. >

Re: Materialized view integration with REST spec

2024-03-01 Thread Walaa Eldin Moustafa
discussion about >> trade-offs. >> >> Does that sound reasonable? >> >> Ryan >> >> >> On Fri, Mar 1, 2024 at 11:09 AM Walaa Eldin Moustafa < >> wa.moust...@gmail.com> wrote: >> >>> I am finding it hard to interpret the o

Re: MV Query Planning Use Case

2024-03-07 Thread Walaa Eldin Moustafa
Hi Benny, For the first part of your question, yes, the intention is to switch between the virtual view and the materialized storage table transparently, and use the storage table as long as it is fresh. For the second part, this dimension of the MV problem has been discussed as part of the curre

Re: Materialized view integration with REST spec

2024-03-22 Thread Walaa Eldin Moustafa
>> Hi: >>>> >>>> Sorry I didn't make it to join the last community sync. Did we reach >>>> any conclusion about mv spec? >>>> >>>> On Tue, Mar 5, 2024 at 11:28 PM himadri pal wrote: >>>> >>>>> For

Re: Materialized view integration with REST spec

2024-03-24 Thread Walaa Eldin Moustafa
Thanks Himadri for the questions. At this point, our objective is to have a common understanding of both options and their pros and cons. The best way to achieve this is to iterate on the doc to discuss the details of each option or their pros and cons. We can always add more details or update the

Re: Materialized view integration with REST spec

2024-03-25 Thread Walaa Eldin Moustafa
et. > > Best > Benny > > On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang > wrote: > >> Thanks Walaa for the summary. It's unclear to me which are the reference >> implementation for option 1 and reference MV spec for option 2 from the >> context. I can find so

Re: Materialized view integration with REST spec

2024-03-26 Thread Walaa Eldin Moustafa
tion to look like. This would > simplify the discussion about pros and cons, because we can reference or > extend the description. I will try to find the time later today. > > Thanks, > > Jan > On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote: > > Thanks Jan! I am not sure

Re: Materialized view integration with REST spec

2024-03-27 Thread Walaa Eldin Moustafa
egards > JB > > On Tue, Mar 26, 2024 at 3:05 PM Walaa Eldin Moustafa > wrote: > > > > Thanks Jan! To avoid spreading discussions on multiple places, I will > continue the comments on the doc. Also it is easier to run into > communication gaps in email threads since eff

Re: Materialized view integration with REST spec

2024-04-01 Thread Walaa Eldin Moustafa
know if I can help on that. > >> > >> I'm working on a PR to list the proposals on the website and the > >> "stale reminder". > >> > >> Thanks ! > >> Regards > >> JB > >> > >> On Thu, Mar 28, 2024 at 6:52 AM

Re: Materialized view integration with REST spec

2024-04-02 Thread Walaa Eldin Moustafa
te: > >> Hi Walaa >> >> Yes, I think it makes sense to go with a vote, now that pros/cons are >> clearly stated in the doc. >> >> Thanks ! >> Regards >> JB >> >> On Tue, Apr 2, 2024 at 3:59 AM Walaa Eldin Moustafa >> wrote: >

Re: Materialized view integration with REST spec

2024-04-03 Thread Walaa Eldin Moustafa
e can schedule a time). >> > >> > -Dan >> > >> > >> > >> > On Mon, Apr 1, 2024 at 10:45 PM Jean-Baptiste Onofré >> wrote: >> >> >> >> Hi Walaa >> >> >> >> Yes, I think it makes sense

Re: Materialized view integration with REST spec

2024-04-17 Thread Walaa Eldin Moustafa
>> available). >> >> > >> >> > The vote is a confirmation of the direction, not a way to settle >> disagreements about approaches. >> >> > >> >> > I think we need to have a more focused discussion (this can either >> be at

[Proposal] Add support for Materialized Views in Iceberg

2024-04-18 Thread Walaa Eldin Moustafa
Hi everyone, I would like to make a proposal for issue [1] to support materialized views in Iceberg. The support leverages two separate objects, an Iceberg view and an Iceberg table to implement materialized views. Each object retains relevant metadata to support the MV operations. An initial desi

Re: [Proposal] Add support for Materialized Views in Iceberg

2024-04-25 Thread Walaa Eldin Moustafa
>> +1 for this proposal. >>>>> >>>>> On Fri, Apr 19, 2024 at 3:40 PM Ajantha Bhat >>>>> wrote: >>>>> >>>>>> +1 for the proposal. >>>>>> >>>>>> - Ajantha >>>>>>

Re: [Proposal] Add support for Materialized Views in Iceberg

2024-04-29 Thread Walaa Eldin Moustafa
> > On Apr 25, 2024, at 2:08 PM, Jean-Baptiste Onofré >> wrote: >> > >> > +1 to separate, it makes sense to me. >> > >> > Regards >> > JB >> > >> > On Thu, Apr 18, 2024 at 11:50 AM Walaa Eldin Moustafa >> > wrote: >

Materialized Views: Next Steps

2024-05-06 Thread Walaa Eldin Moustafa
Hi Everyone, Thanks again for participating in the modeling discussion [1]. Since the outcome of this discussion was to model materialized views as separate objects, an Iceberg view and a table, I think the next step should be discussing the metadata details for each object. I have created a PR ht

Re: Materialized Views: Next Steps

2024-05-07 Thread Walaa Eldin Moustafa
can be said about the storage table metadata. > > We may keep the separate materialized view page to document motivation, > freshness semantics, etc.. > > On Mon, May 6, 2024 at 10:58 PM Walaa Eldin Moustafa < > wa.moust...@gmail.com> wrote: > >> Hi Everyone, >> >

Re: Materialized Views: Next Steps

2024-05-08 Thread Walaa Eldin Moustafa
ce the draft is finalized we can adopt the PR to reflect the consensus > from the google doc. > > Best wishes, > > Jan > On 07.05.24 19:11, Walaa Eldin Moustafa wrote: > > Thanks Steven. I feel it is needed so the MV spec is not scattered across > the table and view spec pa

Re: Materialized Views: Next Steps

2024-05-08 Thread Walaa Eldin Moustafa
ached consensus which > took a considerable amount of time. > > Thanks, Jan > On 08.05.24 10:21, Walaa Eldin Moustafa wrote: > > Thanks Jan. I think we moved on to more alignment steps beyond that doc a > while ago. After that doc, we have discussed the topic further in 2 de

Re: Materialized Views: Next Steps

2024-05-09 Thread Walaa Eldin Moustafa
start with a completely >> different design because we are bound to have the same discussions all over >> again. >> >> Thanks, Jan >> On 08.05.24 12:11, Walaa Eldin Moustafa wrote: >> >> The only consensus the community had was on the object model through the

Re: Materialized Views: Next Steps

2024-05-09 Thread Walaa Eldin Moustafa
t Spec in the design doc. What do people think? > > Thanks > Szehon > > > > On Thu, May 9, 2024 at 5:35 PM Walaa Eldin Moustafa > wrote: > >> Thanks Szehon. >> >> The reason for the difference is that the proposal in the Google doc is >> based on a

Re: Materialized Views: Next Steps

2024-05-09 Thread Walaa Eldin Moustafa
complexity over new key/value properties? > To me, the vote seemed to just rule out using a combined catalog object > (MaterializedView) in favor of re-using the Table and View metadata models, > not to prevent change to the Table and View model. > > Thanks > Szehon > > > On

Re: Materialized Views: Next Steps

2024-05-10 Thread Walaa Eldin Moustafa
> field on the Storage Table's snapshot metadata. This allows you to 'time > travel', 'branch', and have this metadata life cycle integrated via normal > snapshots lifecycle operations. > > So that's my rationale. Not sure if we can come to an agr

Re: Materialized Views: Next Steps

2024-05-14 Thread Walaa Eldin Moustafa
are quicker to adopt API changes >(than engines), e.g., using Iceberg library to manipulate MVs, or parsing >metadata files directly >- Spark view catalog API can evolve separately from Iceberg API and >spec changes > > > Thanks all for the great discussion! >

Re: Materialized Views: Next Steps

2024-05-14 Thread Walaa Eldin Moustafa
wonder what are your > thoughts here Walaa? > > Thanks > > On May 14, 2024, at 4:20 PM, Walaa Eldin Moustafa > wrote: > > > Thanks John. The current metadata does not sound complex. We need to track > the underlying table snapshot IDs as well as the view version

Re: Materialized Views: Next Steps

2024-05-16 Thread Walaa Eldin Moustafa
>> and Dataframe libraries. This is also how Trino is doing it. That's why we >> chose the design in the google doc. >> >> Storing the storage table identifier as a property might work. >> >> Thanks, Jan >> On 15.05.24 02:38, Walaa Eldin Moustafa wrote: >

Re: Materialized Views: Next Steps

2024-05-17 Thread Walaa Eldin Moustafa
> > Also, Jan brought up interesting use case with BI tool using the MV > without SQL representation. The BI tool can get all table and view > dependencies if the lineage is complete. > > Thanks > > > On May 17, 2024, at 1:35 PM, Walaa Eldin Moustafa > wrote: > >

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Walaa Eldin Moustafa
I think there is a disconnect about what is perceived as a "UDF". There are 2 flavors: (1) Functions that are defined by the user whose definition is a composition of other built-in functions/SQL expressions. (2) Custom code written in imperative function according to a Java/Scala/Python API, etc.

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-05-28 Thread Walaa Eldin Moustafa
dy be a huge step > forward. > > -Jack > > > On Tue, May 28, 2024 at 1:40 PM Benny Chow wrote: > >> It's interesting to note that a tabular SQL UDF can be used to build a >> *parameterized >> *view. So, there's definitely a lot in common between UD

Re: Summary of Iceberg Materialized View Meeting

2024-06-06 Thread Walaa Eldin Moustafa
Hi Benny, Your understanding is correct. Another point that we discussed was the type of APIs engines can use to conveniently update the storage table with view query results as well as set the snapshot summary on the output snapshot (one that was produced by the update). We will follow up on tha

Re: Summary of Iceberg Materialized View Meeting

2024-06-06 Thread Walaa Eldin Moustafa
* lineage state JSON structure On Thu, Jun 6, 2024 at 11:31 PM Walaa Eldin Moustafa wrote: > Hi Benny, > > Your understanding is correct. > > Another point that we discussed was the type of APIs engines can use to > conveniently update the storage table with view query result

Re: Support Securable Objects in Iceberg REST Catalog

2024-06-08 Thread Walaa Eldin Moustafa
Thanks Jack and team for working on this proposal. I went over it and it is very well written. I particularly like: (1) The fact that it is adopting the SQL standard and adjusting some of its semantics to fit the Iceberg model. (2) It includes views from v1. Views are a very important tool for po

Re: Summary of Iceberg Materialized View Meeting

2024-06-20 Thread Walaa Eldin Moustafa
me open comments that are not relevant anymore due to the > changes, please close them so that we can clean up the comments section a > bit. > > Regards, > > Jan > On 07.06.24 08:33, Walaa Eldin Moustafa wrote: > > * lineage state JSON structure > > On Thu, Jun 6, 2024

Re: Iceberg MV Refresh

2024-06-20 Thread Walaa Eldin Moustafa
Benny, is the suggestion to couple the "refresh-start-timestamp-ms" property with a grace period as well? Also, could you clarify which timestamp "refresh-start-timestamp-ms" refers to: (1) Timestamp when refresh is triggered (2) Timestamp when refresh is concluded and the snapshot is written. Als

Re: Iceberg MV Refresh

2024-06-20 Thread Walaa Eldin Moustafa
n on top of 100 tables (possibly a mix of Iceberg and > non-Iceberg) and you know that the refresh job ran on say 6/20/2024 > 12:02:10 UTC, then whatever data is in the materialization has to be "fresh > as of" 6/20/2024 12:02:10 UTC. > > Thanks > Benny > > > &

Re: Feedback Collection: Bylaws in Iceberg

2024-06-25 Thread Walaa Eldin Moustafa
Thanks, Jack, for taking the time to put this initiative together. I will borrow Julian Hyde’s example of the blind men and the elephant [1]: “Do you know the story of the blind men and the elephant? Each man touches a different part of the elephant, so they assume they are touching a different a

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-06 Thread Walaa Eldin Moustafa
Hi Szehon, Thanks for sharing this proposal. We have thought along the same lines and implemented an external system (LakeChime [1]) that retains snapshot + partition metadata for longer (actual internal implementation keeps data for 13 months, but that can be tuned). For efficient analysis, we ha

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-06 Thread Walaa Eldin Moustafa
Hi Yufei, The original proposal did not seem to indicate that the metadata tables will be "materialized" (outside regular Iceberg metadata since most of those metadata tables are actually "views" on Iceberg metadata). However, in the last response, it seems metadata could potentially be written to

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-07-16 Thread Walaa Eldin Moustafa
> >>> Hi All, >>> Please find the proposal link >>> https://github.com/apache/iceberg/issues/10432 >>> >>> Google doc link is attached in the proposal. >>> And Thanks Stephen Lin <https://github.com/sxlin> for working on it. >>>

Re: [DISCUSS] DROP PARTITION in Spark

2024-07-17 Thread Walaa Eldin Moustafa
Hi Jean, One use case is Hive to Iceberg migration, where DROP PARTITION does not need to change to DELETE queries prior to the migration. That said, I am not in favor of adding this to Iceberg directly (or Iceberg-Spark) due to the reasons Jean mentioned. It might be possible to do it in a custom

Re: [ANNOUNCE] Welcoming new committers and PMC members

2024-07-23 Thread Walaa Eldin Moustafa
Congratulations everyone! Great to see the community growing. Thanks, Walaa. On Tue, Jul 23, 2024 at 8:51 AM Alex Dutra wrote: > Congratulations to you all! > > Thanks, > Alex > > On Tue, Jul 23, 2024 at 5:30 PM Jack Ye wrote: > >> Congratulations!! >> >> Best, >> Jack Ye >> >> On Tue, Jul 23,

Re: [DISCUSS] adoption of format version 3

2024-07-31 Thread Walaa Eldin Moustafa
Another feature that was planned for V3 is support for default values. Spec doc update was already merged a while ago [1]. Implementation is ongoing in this PR [2]. [1] https://iceberg.apache.org/spec/#default-values [2] https://github.com/apache/iceberg/pull/9502 Thanks, Walaa. On Wed, Jul 31,

Re: [DISCUSS] Guidelines for committing PRs

2024-08-02 Thread Walaa Eldin Moustafa
My concern with this change (in its current form) is that it combines (mixes?) three things: [1] A few paragraphs/statements that delegate to the committer's judgement call. (e.g., "Committer is trusted", "If committer feels" ..). So the value of adding them is not very clear to me. [2] A few thing

Re: [DISCUSS] Describing REST Server capabilities

2024-08-05 Thread Walaa Eldin Moustafa
Catching up here. From Eduard's doc [1], it seems that at the end of the day, the capability boils down to whether an end point is implemented by the server or not. Therefore, I feel we could simplify things by skipping the categorization/grouping (e.g., tables, views, udfs, etc) and just allow s

Re: [Discussion] Versioned SQL UDFs (Catalog routines) in Iceberg

2024-08-07 Thread Walaa Eldin Moustafa
eview on this. >>> >>> We didn't find any blocker for the spec. >>> I will wait for a week and If no more review comments, I will raise a PR >>> for spec addition next week. >>> >>> If anyone else is interested, please have a look at the proposa

[DISCUSS] Materialized Views: Lineage and State information

2024-08-08 Thread Walaa Eldin Moustafa
Hi Everyone, In the last community sync on Materialized Views [1], we agreed to split the information that is used to determine the materialized view staleness to two parts: Lineage Information and State Information. We have made a lot of progress on representing both but one issue remains open:

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-08 Thread Walaa Eldin Moustafa
also capture the view lineage. So, what I am suggesting is that we > do both but not couple them together through any sort of sequence ID or > UUID. > > Thanks > Benny > > On Thu, Aug 8, 2024 at 2:04 PM Walaa Eldin Moustafa > wrote: > >> Hi Everyone, >> >> In th

Re: [DISCUSS] Guidelines for committing PRs

2024-08-09 Thread Walaa Eldin Moustafa
a process that guides both the contributor and committer for >>>>> what qualifies as proposal vs a direct PR). >>>> >>>> >>>> If I am understanding correctly, this is saying more guidelines for >>>> what should go through th

Re: [DISCUSS] Guidelines for committing PRs

2024-08-12 Thread Walaa Eldin Moustafa
I think the issue with the first paragraph is about: 1- The perceived contradiction between a) trusting committers to act in the best interest of the project and b) simultaneously providing specific guidelines on how to act (e.g., by avoiding conflicts of interest). 2- The specific examples given

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-13 Thread Walaa Eldin Moustafa
, Aug 8, 2024 at 4:43 PM Walaa Eldin Moustafa wrote: > Thanks Benny! We discussed this option during the meeting but we did not > prefer it because we did not want to leak the SQL identifiers to the > storage table since SQL identifiers are view concepts and fit better with > the view

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-14 Thread Walaa Eldin Moustafa
Thanks Benny. For refs, I am +1 to represent them as UUID + optional ref, although we can iterate on the exact JSON structure (e.g., another option is splitting for (UUID) state from (UUID + ref) state into two separate higher-level fields). Generally agree on REFRESH VIEW strategy could be up to the

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-15 Thread Walaa Eldin Moustafa
are making is that every id that is used in the >>>> refresh-state has to be defined in the lineage. >>>> So the question about using uuids is rather, can the query engine trust >>>> that the id defined in the lineage is the uuid of the table. >>>> >&

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-15 Thread Walaa Eldin Moustafa
the MV and >> possibly additional views to reconstruct the lineage map. It's just a lot >> slower and more work for the engine when there is a MV that references a >> lot of views (and those views reference additional views). >> >> Thanks >> Benny >

Re: [DISCUSS] REST Endpoint discovery

2024-08-15 Thread Walaa Eldin Moustafa
Thank you Eduard for sharing this version of the proposal. Looks simple, functional, and extensible. On Thu, Aug 15, 2024 at 1:10 PM Ryan Blue wrote: > I think I'm fine either way. I lean toward the simplicity of the strings > in the proposal but would not complain if we went with Yufei's sugges

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-16 Thread Walaa Eldin Moustafa
> currently in the state map), without reparsing the view tree. > > For a refresh operation the query engine has to parse the SQL and fully > expand the lineage with its children anyway. So the lineage is not > strictly required. > > If I understand correctly, most of you are

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-16 Thread Walaa Eldin Moustafa
o fully expand the query tree. > > Best wishes, > > Jan > > Am 16.08.2024 18:13 schrieb Walaa Eldin Moustafa : > > Thanks Jan for the summary. > > For this point: > > > For a refresh operation the query engine has to parse the SQL and fully > expand t

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-16 Thread Walaa Eldin Moustafa
ineage in the view as a nice-to-have. > > > I think if lineage is introduced to the View metadata, it should only hold > direct dependencies for the reasons already discussed. IMO, I think the > potential overlap is OK as they serve two different purposes. > > Cheers, > Micah &

Re: [DISCUSS] Guidelines for committing PRs

2024-08-16 Thread Walaa Eldin Moustafa
re are several exceptions to this process:", but it was not clear what >> process/which part of it. Hence, being more direct could simplify parsing >> these two paragraphs. > > > I have tried to remove the ambiguity on the "this process". For anything > not add

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-19 Thread Walaa Eldin Moustafa
could be handled if it proves to be a problem but hopefully engines are > placing a reasonable cap on view depth + number of tables per view which > puts an upper bound on overall size. > > Thanks, > Micah > > On Fri, Aug 16, 2024 at 4:56 PM Walaa Eldin Moustafa < > wa.mou

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-20 Thread Walaa Eldin Moustafa
the table and view versions at the time of > materialization. Directly using table identifiers seems pretty natural to > me. So, I'm +1 for: > > no lineage > + refresh-state key = identifier > > Thanks > Benny > > > > On Mon, Aug 19, 2024 at 9:19 P

Re: [VOTE] REST Endpoint discovery

2024-08-20 Thread Walaa Eldin Moustafa
+1 non-binding Thanks for driving this Eduard. On Tue, Aug 20, 2024 at 12:17 PM Daniel Weeks wrote: > +1 > > On Tue, Aug 20, 2024 at 11:19 AM Yufei Gu wrote: > >> +1 >> >> Yufei >> >> >> On Tue, Aug 20, 2024 at 11:16 AM Eduard Tudenhöfner < >> etudenhoef...@apache.org> wrote: >> >>> Hey everyon

Re: [DISCUSS] Materialized Views: Lineage and State information

2024-08-21 Thread Walaa Eldin Moustafa
d. You > would still be able to determine the freshness of the precomputed data. > > Thanks, > > Jan > On 20.08.24 18:53, Walaa Eldin Moustafa wrote: > > Theoretically, we could have multiple catalogs each with different table > name entries but referring to the same Ic

Re: Supporting TINYINT and SMALLINT in Iceberg

2021-12-03 Thread Walaa Eldin Moustafa
argument that Trino should use INT in https://github.com/trinodb/trino/pull/3483 to minimize incompatibility issues. Thoughts? On Thu, Dec 2, 2021 at 5:28 PM Walaa Eldin Moustafa wrote: > Hi all, > > I wanted to bring up the topic of supporting TINYINT and SMALLINT data > types in Ic

Re: Supporting TINYINT and SMALLINT in Iceberg

2021-12-10 Thread Walaa Eldin Moustafa
not be used. This supports the side > of the argument that Trino should use INT in > https://github.com/trinodb/trino/pull/3483 to minimize incompatibility > issues. Thoughts? > > On Thu, Dec 2, 2021 at 5:28 PM Walaa Eldin Moustafa > wrote: > >> Hi all, >> >>

Time-sliced incremental scan

2022-01-06 Thread Walaa Eldin Moustafa
Hi Iceberg devs, We have been considering the problem of Time-sliced incremental scan (i.e., reading data that is committed between two timestamps), and I ran into this thread [1] in the Iceberg dev mailing list. The summary of the thread is that incremental scan should leverage snapshot IDs as op

Re: Time-sliced incremental scan

2022-01-20 Thread Walaa Eldin Moustafa
n commits in a transaction. >> >> So there are major issues with timestamps. We would want to make it clear >> that timestamps are for convenience, and not for incremental consumption >> (because of the issue from that thread) and may not reflect the actual >> table state (or

Hive table compatibility for Iceberg readers

2022-01-27 Thread Walaa Eldin Moustafa
Hi Iceberg community, We have been working on converting our tables from the Hive table format to Iceberg. In order to achieve that switch transparently, we have introduced a number of Hive table features and compatibility modes in Iceberg, and connected them to Spark DataSource API. At a high lev

Re: Hive table compatibility for Iceberg readers

2022-01-31 Thread Walaa Eldin Moustafa
another repo, but we could save some steps if folks here think it is worth moving to Iceberg. Thanks, Walaa. On Thu, Jan 27, 2022 at 2:26 PM Walaa Eldin Moustafa wrote: > Hi Iceberg community, > > We have been working on converting our tables from the Hive table format > to Iceberg.

Re: [DISCUSS] Support streaming read Iceberg V2 table

2022-02-09 Thread Walaa Eldin Moustafa
Hi Reo, I am not sure if I am reading the proposal correctly or not, but does the proposal suggest changing the data file format/schema to support the operation type? I think one of the Iceberg principles is not to change the data file open formats (Avro, ORC, Parquet, etc) or semantics in an Iceb

Re: Hive table compatibility for Iceberg readers

2022-02-09 Thread Walaa Eldin Moustafa
here is no guarantee that other implementations will read > them and Iceberg cannot write them in this form. I'm fairly confident that > not allowing unions to be written is a good choice, but I would support > being able to read them. > > Ryan > > On Mon, Jan 31,

Re: Hive table compatibility for Iceberg readers

2022-02-11 Thread Walaa Eldin Moustafa
d default values be stored in Iceberg metadata for each type? > Currently, the spec changes just mention defaults without going into detail > about how they are tracked and what rules there are about them. > > On Wed, Feb 9, 2022 at 6:32 PM Walaa Eldin Moustafa > wrote: > >&

Re: Hive table compatibility for Iceberg readers

2022-03-09 Thread Walaa Eldin Moustafa
The union type conversion PR is up: https://github.com/apache/iceberg/pull/4242. Thanks, Walaa. On Fri, Feb 11, 2022 at 8:53 AM Walaa Eldin Moustafa wrote: > Thanks Ryan! Yes there is an active discussion on the PR on the spec > aspect. > > On Fri, Feb 11, 2022 at 8:47 AM Ryan

Re: Capability to create table without reassigning IDs

2022-08-22 Thread Walaa Eldin Moustafa
ence we are >>> not using the SQL API, and our goal is to reuse the files for performance >>> reasons. >>> >>> Vikram >>> -- >>> *From:* Ryan Blue >>> *Sent:* Sunday, August 21, 2022 12:06 PM >>> *To:* W

Re: Geospatial/geometry support

2022-10-27 Thread Walaa Eldin Moustafa
Hi Thomas, It sounds like what you are trying to achieve is to provide a custom partition function? There is some discussion here https://github.com/apache/iceberg/issues/1482. I guess supporting geometry through this framework makes more sense since it does not require extending the Iceberg type syste

Re: Geospatial/geometry support

2022-10-27 Thread Walaa Eldin Moustafa
t's probably done by storing each object > using a standard envelope definition (bbox?) that we can use in > partition transforms, and then a WKB column for the actual object. > > What do you think? > > Ryan > > On Thu, Oct 27, 2022 at 4:03 AM Walaa Eldin Moustafa >

Re: Geospatial/geometry support

2022-10-27 Thread Walaa Eldin Moustafa
antics are invisible to Iceberg, and are just interpretable by the application. On Thu, Oct 27, 2022 at 10:08 AM Ryan Blue wrote: > Walaa, > > How are those types defined? Would we need to have definitions in the > Iceberg spec? > > Ryan > > > On Thu, Oct 27, 2022 at 9:47
