Hi everyone,

Anton has a PR almost ready to merge that implements snapshot ID
inheritance, similar to how we plan to inherit sequence numbers in
metadata. That allows people to create manifests that are missing data
that will be assigned at commit time (snapshot ID) or that may change if a
commit is retried (sequence number). The inherited information is stored
as a field of each ManifestFile, which is stored in the ManifestList.

This change makes the snapshot ID optional for each data file in a
manifest, so that a null snapshot ID indicates that it should be inherited
from the manifest metadata. This is a breaking change because older readers
consider this field required. A change that can break older readers is not
allowed because we guarantee forward compatibility within a format version.
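
To make the inheritance rule concrete, here is a rough sketch in Python.
The class and function names (DataFileEntry, effective_snapshot_id) are
made up for illustration and are not the actual Iceberg API; the only
thing taken from the proposal is that a null snapshot ID on an entry falls
back to the value stored in the manifest metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataFileEntry:
    """One data file entry in a manifest (hypothetical, simplified)."""
    path: str
    # None means "inherit the snapshot ID from the manifest metadata".
    snapshot_id: Optional[int] = None


@dataclass
class ManifestFile:
    """Manifest metadata as tracked in the manifest list (simplified)."""
    path: str
    # Assigned at commit time and stored in the manifest list.
    snapshot_id: int
    entries: List[DataFileEntry] = field(default_factory=list)


def effective_snapshot_id(manifest: ManifestFile, entry: DataFileEntry) -> int:
    """Return the entry's own snapshot ID, or the manifest's if it is null."""
    if entry.snapshot_id is not None:
        return entry.snapshot_id
    return manifest.snapshot_id


manifest = ManifestFile(
    path="m1.avro",
    snapshot_id=42,
    entries=[
        DataFileEntry("a.parquet"),                 # inherits 42
        DataFileEntry("b.parquet", snapshot_id=7),  # keeps its own ID
    ],
)
resolved = [effective_snapshot_id(manifest, e) for e in manifest.entries]
```

An older reader that treats the field as required has no equivalent of the
fallback branch, which is why a null value breaks it.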

There are some options for how we handle this. First, we could bump the
format version and break compatibility, but there are cases when old
readers can still read a table after this change. For example, tables
that don't use appended manifests, or tables that rewrite those manifests
quickly, will be compatible with old readers. That's why I think we should
consider a second option: adding a feature flag that ensures that manifests
will not be written with missing snapshot IDs unless the table has the
compatibility flag set. Then tables are opted into breaking changes within
a format version, and we have a way to release format features before the
version where they become standard. Format v2 will mark the snapshot ID
optional and specify the requirements for inheritance.
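
As a rough sketch of what the writer-side check could look like, assuming
a table property acts as the flag: the property name below and both
function names are hypothetical, chosen only for this example, and the
default of "false" is my assumption about how opt-in would work.

```python
from typing import Mapping, Optional

# Hypothetical property name, used only for this illustration.
INHERITANCE_FLAG = "snapshot-id-inheritance.enabled"


def may_omit_snapshot_id(properties: Mapping[str, str], format_version: int) -> bool:
    """True if a manifest entry may be written with a null snapshot ID."""
    # In format v2 the snapshot ID is optional, so no flag is needed.
    if format_version >= 2:
        return True
    # In earlier versions the table must opt in explicitly (default: off).
    return properties.get(INHERITANCE_FLAG, "false").lower() == "true"


def check_entry_snapshot_id(
    snapshot_id: Optional[int],
    properties: Mapping[str, str],
    format_version: int,
) -> None:
    """Reject a null snapshot ID unless the table allows inheritance."""
    if snapshot_id is None and not may_omit_snapshot_id(properties, format_version):
        raise ValueError(
            "null snapshot ID requires the inheritance flag or format v2"
        )
```

With a check like this, a v1 table without the flag keeps producing
manifests that old readers can read, and only tables that opt in take on
the compatibility risk.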

What do people think about this strategy for managing breaking changes? I
like the idea of getting the changes out early behind feature flags, where
possible, but it would be great to hear whether other people see problems
with this approach.


rb

-- 
Ryan Blue
Software Engineer
Netflix
