+1 to having a feature flag mechanism that defaults to ‘compatible’ mode.
I would suggest a robust test suite around this new feature, with a focus on not breaking existing behavior when in ‘compatible’ mode.

Xabriel J Collazo Mojica | Sr Software Engineer | Adobe

From: Ryan Blue <rb...@netflix.com.INVALID>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>, "rb...@netflix.com" <rb...@netflix.com>
Date: Monday, January 13, 2020 at 11:28 AM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: [DISCUSS] Forward compatibility and snapshot ID inheritance

Hi everyone,

Anton has a PR almost ready to merge that implements snapshot ID inheritance, similar to how we plan to inherit sequence IDs in metadata. That allows people to create manifests that are missing data that will be assigned at commit time (snapshot ID) or that may change if a commit is retried (sequence number). The inherited information is stored as a field of ManifestFile that is stored in the ManifestList.

This change makes the snapshot ID optional for each data file in a manifest, so that a null snapshot ID indicates that it should be inherited from the manifest metadata. This is a breaking change because older readers consider this field required. A change that can break older readers is not allowed because we guarantee forward compatibility within a format version.

There are some options for how we handle this. First, we could bump the format version and break compatibility, but there are cases when it is possible to read tables that use appended manifests. For example, tables that don't use appended manifests, or tables that rewrite those manifests quickly will be compatible with old readers.

That's why I think we should consider a second option: adding a feature flag that ensures that manifests will not be written with missing snapshot IDs unless the table has the compatibility flag set. Then tables are opted into breaking changes within a format version and we have a way to release format features before the version where they become standard; format v2 will mark the snapshot ID optional and have requirements for inheritance.

What do people think about this strategy for managing breaking changes? I like the idea of getting the changes out early behind feature flags, where possible, but it would be great to hear whether other people see problems with this approach.

rb

--
Ryan Blue
Software Engineer
Netflix
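
For illustration only, the feature-flag proposal and the inheritance rule discussed above could be sketched roughly as follows. This is written in Python for brevity and is not Iceberg's implementation: the property name `example.snapshot-id-inheritance.enabled`, the classes `DataFileEntry` and `ManifestFile` as modeled here, and the helper functions are all hypothetical names chosen for the example. The sketch assumes the flag is read from table properties and defaults to the compatible behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical property name, used only for this sketch (not a real Iceberg property).
INHERITANCE_FLAG = "example.snapshot-id-inheritance.enabled"


@dataclass
class DataFileEntry:
    path: str
    # None means "inherit the snapshot ID from the manifest metadata".
    snapshot_id: Optional[int] = None


@dataclass
class ManifestFile:
    path: str
    # Assigned at commit time and stored alongside the manifest in the manifest list.
    snapshot_id: Optional[int]
    entries: List[DataFileEntry]


def inheritance_enabled(table_properties: Dict[str, str]) -> bool:
    """Return True only if the table explicitly opted in; the default is compatible mode."""
    return table_properties.get(INHERITANCE_FLAG, "false").lower() == "true"


def validate_for_write(entries: List[DataFileEntry], table_properties: Dict[str, str]) -> None:
    """In compatible mode, refuse to write entries that lack a snapshot ID."""
    if inheritance_enabled(table_properties):
        return
    missing = [entry.path for entry in entries if entry.snapshot_id is None]
    if missing:
        raise ValueError(
            f"snapshot ID missing for {missing}; older readers require this field"
        )


def effective_snapshot_id(manifest: ManifestFile, entry: DataFileEntry) -> int:
    """Resolve an entry's snapshot ID, falling back to the manifest's inherited value."""
    if entry.snapshot_id is not None:
        return entry.snapshot_id
    if manifest.snapshot_id is None:
        raise ValueError(f"no snapshot ID to inherit for {entry.path}")
    return manifest.snapshot_id
```

With the flag unset, `validate_for_write` rejects any entry whose snapshot ID is null, which is the compatible-mode behavior a test suite would want to pin down. With the flag set, the write is allowed and readers resolve the value through `effective_snapshot_id`.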