I agree with Anton that we should probably spend some time on Hangouts
further discussing things. There are definitely differing expectations here,
and we seem to be talking a bit past each other.
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Tue, May 21, 2019 at 3:44 PM Cristian Opris wrote:
I love a good flame war :P
> On 21 May 2019, at 22:57, Jacques Nadeau wrote:
>
> That's my point, truly independent writers (two Spark jobs, or a Spark job
> and Dremio job) means a distributed transaction. It would need yet another
> external transaction coordinator on top of both Spark and Dremio, Iceberg
> by itself cannot solve this.
I would propose to have a series of sessions over Hangouts to clarify all
pending points. We can start this week if there is a time slot that works for
everyone.
Potential topics (feel free to suggest yours):
- Use cases: I believe it is critical that everyone is on the same page when
  it comes to …
> That's my point, truly independent writers (two Spark jobs, or a Spark job
> and Dremio job) means a distributed transaction. It would need yet another
> external transaction coordinator on top of both Spark and Dremio, Iceberg
> by itself cannot solve this.
>
I'm not ready to accept this. Iceberg …
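The crux of the disagreement is whether independent writers require a
coordinator at all. Iceberg's general answer for concurrent commits is
optimistic concurrency: each writer prepares a new snapshot and atomically
swaps the table's current pointer, retrying from the winner's snapshot if it
loses the race. A minimal sketch of that pattern, with hypothetical names
rather than Iceberg's actual API:

    // Sketch of an optimistic snapshot commit: writers race on an atomic
    // pointer swap and retry on conflict. Class and method names here are
    // hypothetical stand-ins, not Iceberg's real API.
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Function;

    final class OptimisticCommit {
        // A snapshot is immutable; committing means swapping the pointer.
        record Snapshot(long id, String manifestList) {}

        private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(0, "s0.avro"));

        Snapshot commit(Function<Snapshot, Snapshot> applyChanges) {
            while (true) {
                Snapshot base = current.get();
                Snapshot next = applyChanges.apply(base);  // re-apply on latest base
                if (current.compareAndSet(base, next)) {
                    return next;                           // won the race, no coordinator
                }
                // Lost the race: loop and rebase onto the new current snapshot.
            }
        }

        public static void main(String[] args) throws InterruptedException {
            OptimisticCommit table = new OptimisticCommit();
            Runnable writer = () -> table.commit(
                base -> new Snapshot(base.id() + 1, "s" + (base.id() + 1) + ".avro"));
            Thread a = new Thread(writer);  // e.g. an appending job
            Thread b = new Thread(writer);  // e.g. a concurrent compaction
            a.start(); b.start(); a.join(); b.join();
            System.out.println(table.current.get().id());  // prints 2
        }
    }

Under this scheme two jobs (say, a Spark writer and a Dremio writer) can both
commit without any cross-engine transaction coordinator, provided their
changes can be re-applied on top of whichever snapshot wins.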
Another point to add to my list of clarifications, which I hope will help in
understanding what the document is proposing:
The delete diff files can simply be lists of keys to delete from previous
snapshots. Natural keys or synthetic keys, it really doesn't matter. This is
simple and elegant, …
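For the shape of what Cristian describes, a toy merge-on-read scan that drops
any row whose key appears in a delete diff; purely illustrative, with long
keys and in-memory collections standing in for real files:

    // Toy merge-on-read: filter a base snapshot's rows against a delete diff
    // that is nothing more than a set of keys. The key type (long) and the
    // in-memory layout are assumptions for illustration.
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    final class MergeOnReadScan {
        static List<Map.Entry<Long, String>> scan(
                List<Map.Entry<Long, String>> baseRows,  // rows from prior snapshots
                Set<Long> deleteDiff) {                  // keys deleted since then
            return baseRows.stream()
                .filter(row -> !deleteDiff.contains(row.getKey()))
                .toList();
        }

        public static void main(String[] args) {
            List<Map.Entry<Long, String>> base = List.of(
                Map.entry(1L, "a"), Map.entry(2L, "b"), Map.entry(3L, "c"));
            System.out.println(scan(base, Set.of(2L)));  // [1=a, 3=c]
        }
    }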
Hi Jacques,
> On 21 May 2019, at 22:11, Jacques Nadeau wrote:
>
>> It’s not at all clear why unique keys would be needed at all.
>
> If we turn your questions around, you answer yourself. If you have
> independent writers, you need unique keys.
>
> Also truly independent writers (like a job …
> It’s not at all clear why unique keys would be needed at all.
If we turn your questions around, you answer yourself. If you have
independent writers, you need unique keys.
> Also truly independent writers (like a job writing while a job compacts),
> means effectively a distributed transaction,
Hi Jacques,
It’s not at all clear why unique keys would be needed at all.
Also truly independent writers (like a job writing while a job compacts) means
effectively a distributed transaction, and I believe it’s clearly out of scope
for Iceberg to solve that?
> On 21 May 2019, at 21:31, Jacques Nadeau wrote: …
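One way to see why this exchange keeps circling back to keys: an address of
the form (file, position) is not stable under compaction, which is exactly
the "job writing while a job compacts" case. A toy sketch of that effect,
with hypothetical names not taken from the proposal:

    // Sketch of why (file, position) addressing is fragile under compaction:
    // every row gets a new address, while a stable unique key survives.
    // All names here are hypothetical, for illustration only.
    import java.util.ArrayList;
    import java.util.List;

    final class CompactionKeys {
        record Row(long stableKey, String file, long pos) {}

        // Compaction concatenates source files into one new file: each row
        // gets a new (file, pos) address but carries its stable key along.
        static List<Row> compact(List<Row> input, String newFile) {
            List<Row> out = new ArrayList<>();
            for (int i = 0; i < input.size(); i++) {
                out.add(new Row(input.get(i).stableKey(), newFile, i));
            }
            return out;
        }

        public static void main(String[] args) {
            List<Row> before = List.of(
                new Row(100, "f1.parquet", 0),
                new Row(101, "f2.parquet", 0));
            System.out.println(compact(before, "f3.parquet"));
        }
    }

A delete addressed as (f2.parquet, 0) matches nothing once compaction has
moved that row; a delete addressed as stableKey=101 still lands.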
Synthetic vs natural keys generated a lot of discussion internally when coming
up with the proposal :)
There is lots of detail and lots of subtlety, but a few things may help
explain our thinking:
Scale - Iceberg is for remote storage at scale, not local disk. This means
mainly operating on …
It would be useful to describe the types of concurrent operations that
would be supported (i.e., failed snapshotting could easily be recovered,
vs. the whole operation needing to be re-executed) vs. those that wouldn't.
Solving for unlimited concurrency cases may create way more complexity than …
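One way to frame the requested distinction: after losing a commit race, a job
could check whether the winning commits invalidated anything it read, and
then either retry just the commit or re-execute the whole operation. A hedged
sketch of such a classification, with hypothetical names rather than
Iceberg's actual validation API:

    // Sketch of classifying a commit conflict as retryable (just re-commit
    // against the new base) or fatal (inputs changed; re-execute the job).
    // Not Iceberg's actual API; all names are hypothetical.
    import java.util.Set;

    final class ConflictCheck {
        enum Outcome { RETRY_COMMIT, REEXECUTE_JOB }

        static Outcome classify(Set<String> filesRead,
                                Set<String> filesRemovedSinceBase) {
            for (String file : filesRead) {
                if (filesRemovedSinceBase.contains(file)) {
                    return Outcome.REEXECUTE_JOB;  // e.g. compaction rewrote an input
                }
            }
            return Outcome.RETRY_COMMIT;           // disjoint changes: recoverable
        }

        public static void main(String[] args) {
            System.out.println(classify(Set.of("a.parquet"), Set.of("b.parquet"))); // RETRY_COMMIT
            System.out.println(classify(Set.of("a.parquet"), Set.of("a.parquet"))); // REEXECUTE_JOB
        }
    }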
On Tue, May 21, 2019 at 2:01 PM Jacques Nadeau wrote:
> I think we just need to have further discussion about keys. Ryan said:
>
> 3. Synthetic keys should be based on filename and position
>
> But I'm not clear there is consensus around that. I'm also not sure
> whether he means lossless inclusion, simply derived-from or something else.
I think we just need to have further discussion about keys. Ryan said:
3. Synthetic keys should be based on filename and position
But I'm not clear there is consensus around that. I'm also not sure whether
he means lossless inclusion, simply derived-from or something else. My
thinking before is …
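The two readings Jacques distinguishes can be put side by side: a key that
losslessly includes filename and position (both parts recoverable), versus
one merely derived from them (compact, but one-way and collision-prone). A
toy sketch, not the proposal's actual definition:

    // Two flavors of synthetic key over (filename, position). The lossless
    // form keeps both parts recoverable; the derived form packs them into one
    // long and loses the mapping. Hypothetical, for illustration only.
    final class SyntheticKey {
        record Key(String file, long pos) {
            // Derived-from variant: compact, but not invertible, and
            // file.hashCode() can collide across different files.
            long derived() {
                return (long) file.hashCode() << 32 | (pos & 0xFFFFFFFFL);
            }
        }

        public static void main(String[] args) {
            Key k = new Key("part-00000.parquet", 1234L);
            System.out.println(k);            // lossless: full provenance retained
            System.out.println(k.derived());  // derived: provenance lost
        }
    }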
On Thu, May 16, 2019 at 4:13 PM Ryan Blue wrote:
> Replies inline.
>
> On Thu, May 16, 2019 at 10:07 AM Erik Wright wrote:
>
>> I would be happy to participate. Iceberg with merge-on-read capabilities
>> is a technology choice that my team is actively considering. It appears
>> that our scenar…