me it is the current
>>>>>>>> catalog of view definition..
>>>>>>>>
>>>>>>>> On Mon, Oct 7, 2024 at 9:31 AM Russell Spitzer <
>>>>>>>> russell.spit...@gmail.com> wrote:
>>>
Hi Russell
Yes, you listed out the requirements to make the two Spark engines case
work. Basically, it allows each engine to dynamically resolve the table
identifiers under the correct catalog name.
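
A minimal sketch of the dynamic resolution described above, for illustration only. It assumes that `default-catalog` is left unset in the view version, so that identifiers without a catalog part fall back to whatever name the reading engine uses for the catalog that stores the view. `ViewVersion`, `resolve_identifier`, `engine_catalog` and `known_catalogs` are hypothetical names, not part of any Iceberg API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ViewVersion:
    """Hypothetical holder for the two defaults carried by a view version."""
    default_namespace: Tuple[str, ...]      # required field
    default_catalog: Optional[str] = None   # optional field


def resolve_identifier(
    parts: Sequence[str],
    version: ViewVersion,
    engine_catalog: str,
    known_catalogs: Iterable[str],
) -> Tuple[str, Tuple[str, ...], str]:
    """Return (catalog, namespace, table) for an identifier from the view SQL.

    engine_catalog is the name THIS engine uses for the catalog holding the
    view; it is an assumption of the sketch that this is the right fallback.
    """
    parts = tuple(parts)
    if not parts:
        raise ValueError("empty identifier")
    if len(parts) > 1 and parts[0] in set(known_catalogs):
        # The SQL named a catalog explicitly; keep it as written.
        return parts[0], parts[1:-1], parts[-1]
    # No catalog in the SQL: use default-catalog if set, else the engine's name.
    catalog = version.default_catalog or engine_catalog
    namespace = parts[:-1] or version.default_namespace
    return catalog, namespace, parts[-1]


# Two Spark engines that registered the same catalog under different names.
version = ViewVersion(default_namespace=("bronze",))
from_engine_a = resolve_identifier(["lineitem"], version, "spark_a", ["spark_a"])
from_engine_b = resolve_identifier(["lineitem"], version, "spark_b", ["spark_b"])
```

Each engine ends up with the same namespace and table name but under its own catalog name, which is the behaviour the two-engine case relies on.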
Hello Walaa
IMO, we don't need to list out such restrictions because they really depend
on the s
Having spent some time testing Nessie views with multiple engines (Dremio +
Spark) using different catalog names and different namespaces, I tend to
agree with Dan and Amogh that the current view spec is fine. Unlike
tables, I think when it comes to views, engines have to "work together" if
they e
Assuming the table contained smaller and better correlated files, I think a
workaround where you materialized the timestamp difference between two
columns could be effective for data file pruning. So if a particular
planned departure date was associated with a lot of delays and the table
was parti
uire the lineage, I would propose to move ahead
> without the lineage. Especially as this seems to be a problem with the View
> Spec that we can't solve now. If there is a demand to add the lineage in
> the future, once the catalog-alias problem has been solved, we can still
> add it
mer must understand the dialect anyway. In fact,
>> simply parsing the SQL definition seems like a more robust and
>> straightforward solution than using a lineage for every representation. I
>> believe this is why Benny suggested reverting to SQL parsing, and I agree
>> with
rage table identifier was provided as part of the
> MV definition? Sounds like a not very ideal UX. Note that it also conflicts
> with the spirit of requirement #3.
>
> Thanks,
> Walaa.
>
> On Thu, Sep 19, 2024 at 10:02 AM Benny Chow wrote:
>
>> Hi Jan
>>
>>
with a namespace and a
> name field, like so:
>
> {
>
> namespace: ["bronze"],
>
> name: "lineitem"
>
> }
>
> And require the storage table to be in the same catalog as the MV itself?
>
> Thanks,
>
> Jan
> On 19.09.24 00
two Nessie catalogs? They can't both
> be called LocalNessie.
>
> Thanks,
>
> Jan
> On 14.09.24 01:23, Benny Chow wrote:
>
> The main reason for putting the lineage into the view is so that "another"
> engine can enumerate out the tables in the view withou
ame" of the
>> identifier for a "Spark" dialect can be different than for a "Dremio"
>> dialect.
>>
>> The important part is that we still have a list of identifiers for each
>> representation that we can use with the catalog to obtain the state
se you can't store the catalog names of multiple representations
>>>> in the lineage. You would need to fallback to parsing the SQL for a
>>>> particular representation and rebuilding the full query tree to obtain the
>>>> identifiers.
>>>>
>>
Benny, `default-catalog` is optional, while `default-namespace` is
>> required.
>>
>> I will retract my comment on the `summary`. It indicates the engine that
>> made the revision to the current view version. It doesn't really matter for
>> multi-engine/repres
Hi Steven
Yes, I definitely think #2 is easier and cleaner for both reader and writer
and that lineage is a separate feature altogether. There's no need to
couple materialization state with view lineage.
The other way to look at helping to decide between the two options is
what is the most per
>>> For a refresh operation the query engine has to parse the SQL and
>>>>>> fully expand the lineage with its children anyway. So the lineage is
>>>>>> not strictly required.
>>>>>>
>>>>
ineage record that is stored as part of the view metadata?
>>>
>>> No, I don't think so, I think #5 is a reasonable requirement and I think
>>> this violates it.
>>>
>>>
>>>> 2. If yes, should the lineage in the view be fully expanded
If we go with either UUID or Table Identifier + VersionID/SnapshotId in the
refresh state, then this list is fully expanded already. So, to validate
the freshness of a materialization, the engine doesn't even need to look at
the view lineage. IMO, the view lineage is nice to have but not a
necess
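
A minimal sketch of the freshness check described above, assuming the refresh state is a fully expanded mapping from table identifier to the snapshot ID recorded at refresh time. `RefreshState`, `is_fresh` and the `current_snapshot_id` lookup are hypothetical names used only for illustration; the actual field layout was still under discussion in the thread.

```python
from typing import Callable, Dict, Optional, Tuple

TableId = Tuple[str, ...]                        # e.g. ("bronze", "lineitem")
RefreshState = Dict[TableId, Optional[int]]      # table -> snapshot ID at refresh


def is_fresh(
    refresh_state: RefreshState,
    current_snapshot_id: Callable[[TableId], Optional[int]],
) -> bool:
    """True if every source table is still at the snapshot seen at refresh.

    Only the refresh state is consulted; the view lineage is never read.
    """
    return all(
        current_snapshot_id(table) == recorded
        for table, recorded in refresh_state.items()
    )


# Stand-in for a catalog lookup; a real engine would load each table instead.
catalog_now = {("bronze", "lineitem"): 101, ("bronze", "orders"): 7}
state = {("bronze", "lineitem"): 101, ("bronze", "orders"): 7}
fresh_before = is_fresh(state, catalog_now.get)

catalog_now[("bronze", "orders")] = 8            # a source table advanced
fresh_after = is_fresh(state, catalog_now.get)
```

Because the state already lists every source table, the check is a flat comparison with no need to walk nested views.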
refer it because we did not want to leak the SQL identifiers to the
>> storage table since SQL identifiers are view concepts and fit better with
>> the view.
>>
>> Thanks,
>> Walaa.
>>
>> On Thu, Aug 8, 2024 at 4:12 PM Benny Chow wrote:
>>
>>> Maybe a th
Maybe a third option is to decouple the view lineage and materialization
state.
The view lineage can just list out the SQL identifiers+ref... we can still
decide whether this is just direct children or fully expanded.
The materialization state doesn't have to depend on the view lineage
(through ei
ccessed, which may not be the
> case in some producer/consumer scenarios.
>
> Best
> PF
>
>
>
>
> On Fri, 21 Jun 2024 at 18:28, Benny Chow wrote:
>
>> Hi Dan, looks like it is pretty common across engines and sometimes part
>> of the engine specific DDL op
>> Thanks Benny for bringing these issues up. I would agree with both of
>> your propositions.
>>
>> Regarding the naming of the fields, we can go with the naming that you
>> suggested. I just wanted to wait and see if more people chime in with their
>> opinions.
>
domain knowledge), or would it be on the user's side? In the latter case
> the user would need to explicitly query the storage table directly,
> correct? With a grace period I think we could push it down to the engine.
>
> Thanks,
> Walaa.
>
>
> On Thu, Jun 20, 2024 a
> unlimited)
>> - staleness clock starts with the first table change after refresh
>> - for unmanaged (non-iceberg) tables where we don't know when the table
>> changed, the staleness clock starts right after refresh
>>
>> Best
>> Piotr
>>
>>
>>
Hey Guys,
Great progress on the MV spec and thanks a ton to Jan and Walaa for
driving this. One of our latest achievements was that we finalized the
view lineage and materialization table refresh JSON so that we can
definitively and concisely describe what data is in the materialization
table.
R
was produced by
>> the update). We will follow up on that separately.
>>
>> Jan, do you want to reflect the lineage + state discussion in the doc
>> so we can iterate on the lineage JSON structure?
>>
>> Thanks,
>> Walaa.
>>
>>
>> On
I really enjoyed listening to the replay and hearing everyone's feedback!
I'm in agreement with all 3 consensus items, especially around Dan's idea
to separate the view's query tree lineage vs materialization's lineage
state.
I'll summarize my understanding about the distinction and add a few
comm
Thanks for organizing Jan. I’ll be there!
Benny
> On Jun 3, 2024, at 11:15 PM, Jan Kaul wrote:
>
>
> Hi all,
>
> we will have a video call to get together and discuss Iceberg Materialized
> Views. The call is on Wednesday, 5 June 2024, 16:00:00 UTC (9:00 PDT) and you
> can join the meeti
It's interesting to note that a tabular SQL UDF can be used to build a
*parameterized* view. So, there's definitely a lot in common between UDFs
and views.
Thanks
On Tue, May 28, 2024 at 9:53 AM Walaa Eldin Moustafa
wrote:
> I think there is a disconnect about what is perceived as a "UDF". The
1:35 PM, Walaa Eldin Moustafa wrote:
Sounds good. I am assuming we agree it is not required for either snapshot or timestamp?
Thanks,
Walaa.
On Fri, May 17, 2024 at 1:17 PM Benny Chow <btc...@gmail.com> wrote:
I like Jack's suggestions to capture the ref type and value! When the ref typ
drift).
>
> If we have feedback on the actual properties used in the properties model
> as defined in the PR, we can have the discussion there.
>
> Thanks,
> Walaa.
>
>
> On Thu, May 16, 2024 at 3:22 PM Benny Chow wrote:
>
>> Hi Walaa
>>
>> I left co
ceberg metadata fields
> as engine properties) just for the lack of other cleaner options does not
> sound like a good idea in both short and long term.
>
> Let me know your thoughts.
>
> Thanks,
> Walaa.
>
>
>
> On Tue, May 14, 2024 at 5:12 PM Benny Chow wrote:
>
I agree with Szehon here. I think storing the materialization lineage as a bunch of properties is brittle. This lineage information is needed by engines to validate the staleness of a materialization and also to perform full or incremental refreshes. There’s a lot to capture here. Maybe we shoul
+1 for separate view and table objects. Walaa's Spark
implementation demonstrates how little change it takes on the Iceberg APIs
to start sharing MVs between engines.
Thanks
Benny
On Thu, Apr 18, 2024 at 9:52 AM Walaa Eldin Moustafa
wrote:
> Hi everyone,
>
> I would like to make a proposal for
Hi Manu
This is Walaa's Spark implementation for option 1:
https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
There's no code for option 2 yet.
Best
Benny
On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang wrote:
> Thanks Walaa for the summary. It's unclear to
for your message. I think the idea is to "smoothly"
> (implicitly) add regular table storage over a view.
>
> The MV approach is right now in discussion, without consensus so far.
> We plan to have a document/meeting to discuss further.
>
> Regards
> JB
>
> On
Hey Everyone
I've been following the MV spec and listened in on the last community
sync. I'd like to chime in from a query planner point of view on how the
MVs could be used.
Suppose a user has a dashboard query like:
*SELECT product, sum(sales) *
*FROM view1 *
*WHERE brand = 'X' and year = '20