Re: Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Anton Okolnychyi
I feel like this shouldn't be a big problem going forward since all new tables will be using the V2 format where the snapshot ID inheritance is enabled. There is currently a bug in our rewrite manifests action that checks the flag but doesn't check the format version. I have a fix for that local
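
The check described in this message (consult the format version as well as the flag) can be sketched roughly as follows. This is a simplified model, not Iceberg's actual code: the function name `can_inherit_snapshot_id` and the plain-dict representation of table properties are assumptions made for illustration. Only the property name and the v1/v2 behavior come from the thread.

```python
# Property name as quoted in the thread subject.
SNAPSHOT_ID_INHERITANCE_ENABLED = "compatibility.snapshot-id-inheritance.enabled"


def can_inherit_snapshot_id(format_version: int, properties: dict) -> bool:
    """Hypothetical helper: decide whether manifests can skip the rewrite."""
    if format_version >= 2:
        # Per the thread, inheritance happens automatically in v2 tables,
        # so the flag should not be consulted at all.
        return True
    # For v1 tables the flag is opt-in; the default stays false because
    # older readers may not understand inherited snapshot IDs.
    value = properties.get(SNAPSHOT_ID_INHERITANCE_ENABLED, "false")
    return value.lower() == "true"
```

A check that looks only at the flag would return false for a v2 table with no property set, which appears to be the kind of bug mentioned above.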

Re: Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Pucheng Yang
Thanks Ryan, then I think I have a path forward. And I will file a feature request for a thread-safe appendManifest on GitHub. Thanks again. On Thu, Aug 31, 2023 at 10:04 AM Ryan Blue wrote: > I think making the operation thread-safe and parallelizing is a good idea. > It should be pretty easy. > >

Re: Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Ryan Blue
I think making the operation thread-safe and parallelizing is a good idea. It should be pretty easy. And yes, versions of Iceberg older than the one where that config property was added would be the ones where it is unsafe. It's probably safe for most people, but we still can't change the default.

Re: Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Pucheng Yang
Thanks Ryan, what might you consider an "older" version of Iceberg? Is it fair to say any version before https://github.com/apache/iceberg/commit/c3dc9824b381e5e479e356be5e0f4fcf61a9fc37 ? If that is the case, my organization controls the Iceberg reader, so it might be less of a concern for me. Another o

Re: two proposed spec changes

2023-08-31 Thread Ryan Blue
There are a couple of problems with default values. First, they are part of v3 and haven’t been implemented yet. But the second, larger issue is that null is a value. A default doesn’t replace a null that was written in the data. I don’t think default values would help out here. What I meant by derive
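
The distinction between a missing field and a written null can be illustrated with a small sketch. This is not Iceberg's reader logic; `read_field` and the dict-based record are hypothetical stand-ins used only to show why a default would not replace a null that is already in the data.

```python
# Sentinel used to tell "field absent" apart from "field present but null".
_MISSING = object()


def read_field(record: dict, name: str, default=None):
    """Hypothetical reader: apply the default only when the field is absent."""
    value = record.get(name, _MISSING)
    if value is _MISSING:
        # The field was never written, so the default applies.
        return default
    # The field was written, even if its value is None (null); keep it as-is.
    return value
```

Under this model, `read_field({"a": None}, "a", default=0)` yields `None`, while `read_field({}, "a", default=0)` yields `0`.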

Re: Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Ryan Blue
It isn't safe for this to be set for any table that may be read by an older version of Iceberg. On Thu, Aug 31, 2023 at 9:49 AM Pucheng Yang wrote: > Ryan, in what cases can we not set it to true for v1 tables? > > My major goal is to reduce the runtime of snapshot creation jobs. > > Is it OK if

Re: Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Pucheng Yang
Ryan, in what cases can we not set it to true for v1 tables? My major goal is to reduce the runtime of snapshot creation jobs. Is it OK if I set it to true during snapshot table creation and set it back to false when finished? On Thu, Aug 31, 2023 at 9:44 AM Ryan Blue wrote: > This isn't something that

Re: Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Ryan Blue
This isn't something that we can set to `true` because it is a forward-incompatible change. That's why we added a flag. However, we should make sure that this is the default behavior in v2 tables, since it is safe for v2 (where inheritance happens automatically). If I remember correctly, we still

Make "compatibility.snapshot-id-inheritance.enabled" table prop default to True?

2023-08-31 Thread Pucheng Yang
Hi community, The table property "compatibility.snapshot-id-inheritance.enabled" was introduced to avoid manifest rewrites where possible (PR: https://github.com/apache/iceberg/commit/c3dc9824b381e5e479e356be5e0f4fcf61a9fc37 ). During my recent investigation into a very long snapshot table creation on a huge t

Re: do spark structured streaming writer options works with foreachBatch?

2023-08-31 Thread Ryan Blue
We generally don't recommend fanout writers because they create lots of small data files. It also isn't clear why the table's partitioning isn't causing Spark to distribute the data properly -- maybe you're using an old Spark version? In any case, you can distribute the data yourself to align with