I think it makes sense if I use the "Description" section of your
document to clarify how I imagine a combined MV solution would look.
This would simplify the discussion about pros and cons, because we can
reference or extend the description. I will try to find the time later
today.
Thanks,
Jan
On 3/25/24 4:39 PM, Walaa Eldin Moustafa wrote:
Thanks Jan! I am not sure if you would like to make suggestions to
revise the options themselves or the current options' pros and cons. In
either case, as mentioned earlier, we can do that on the doc and once
we agree on the options and their pros and cons we can move forward.
How does that sound?
Thanks,
Walaa.
On Mon, Mar 25, 2024 at 7:45 AM Jan Kaul <jank...@mailbox.org.invalid>
wrote:
I have the feeling that the current pros and cons from the summary
target a version of the MV spec that wasn't really part of the
discussion. The current arguments target a completely new
specification for materialized views, which we agreed is out of
scope. Instead of a completely new specification, the argument was
made for a MV metadata object that embeds the View and the Table
metadata, which was Option 6
<https://docs.google.com/spreadsheets/d/1a0tlyh8f2ft2SepE7H3bgoY2A0q5IILgzAsJMnwjTBs/edit#gid=0&range=G3>
in Jack's summary document. With that approach the "commitView"
and "commitTable" operations don't have to be changed and only the
"loadView" operation has to be adapted. Additionally, compaction
and snapshot expiration can be reused for the embedded solution.
With that in mind, the cons 2, 4, 5, 6 from the summary don't
really apply.
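
To make the embedded approach concrete, here is a minimal sketch, not
taken from any of the proposals. All class and method names are
hypothetical, and it assumes the storage table is registered under the
same identifier as the view. It only illustrates that the two commit
paths stay untouched while the load path assembles the combined object:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ViewMetadata:
    # Hypothetical stand-in for the existing view metadata.
    sql: str


@dataclass
class TableMetadata:
    # Hypothetical stand-in for the existing table metadata.
    snapshot_ids: List[int] = field(default_factory=list)


@dataclass
class MaterializedViewMetadata:
    # The combined object simply embeds both existing metadata objects.
    view: ViewMetadata
    storage: TableMetadata


class InMemoryCatalog:
    """Toy catalog; assumes view and storage table share one identifier."""

    def __init__(self) -> None:
        self._views: Dict[str, ViewMetadata] = {}
        self._tables: Dict[str, TableMetadata] = {}

    def commit_view(self, name: str, metadata: ViewMetadata) -> None:
        # Unchanged: commits view metadata only.
        self._views[name] = metadata

    def commit_table(self, name: str, metadata: TableMetadata) -> None:
        # Unchanged: commits table metadata only.
        self._tables[name] = metadata

    def load_view(
        self, name: str
    ) -> Union[ViewMetadata, MaterializedViewMetadata]:
        # Adapted: if a storage table exists, return both in one load.
        view = self._views[name]
        storage = self._tables.get(name)
        if storage is None:
            return view
        return MaterializedViewMetadata(view=view, storage=storage)
```

In this sketch a user gets the view definition and the storage table
state from a single load_view call, while table maintenance such as
compaction would keep operating on the embedded TableMetadata.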
Furthermore, I think we should distinguish between pros and cons
for the implementers and the users, because most of the pros (no
new operations) for separate objects (option 1) are for the
implementers and most of the pros (single logical object, doesn't
require 2 loads) for combined objects (option 3) are for the users.
In my opinion, in the long run the design decisions should be
focused more on the users' preferences than on the implementers'.
On 3/25/24 14:49, Benny Chow wrote:
Hi Manu
This is Walaa's Spark implementation for option 1:
https://github.com/apache/iceberg/pull/9830/files/a9e1bee3b5bf5914e5330d3b195042aea33868c9
There's no code for option 2 yet.
Best
Benny
On Mon, Mar 25, 2024 at 12:37 AM Manu Zhang
<owenzhang1...@gmail.com> wrote:
Thanks Walaa for the summary. It's unclear to me from the
context which is the reference implementation for option 1
and which is the reference MV spec for option 2. I can find
some links in the References section but am not sure which
link belongs to which option.
On Mon, Mar 25, 2024 at 3:38 AM Walaa Eldin Moustafa
<wa.moust...@gmail.com> wrote:
Thanks Himadri for the questions. At this point, our
objective is to have a common understanding of both
options and their pros and cons. The best way to achieve
this is to iterate on the doc to discuss the details of
each option or their pros and cons. We can always add
more details or update the pros and cons. The main thing
is to keep the options to two so that we keep the scope
manageable.
Once we have a common understanding, it will be easy to
make a choice and move forward. Therefore, I would
suggest reframing your questions as suggestions to add
more details to the options, questions on how either
works, or discussions of their pros and cons on the doc.
Thanks,
Walaa.