Thanks for writing this FLIP Timo. I think this will be a very important improvement for Flink and our SQL user :-)
Similar to Francesco I would like to understand the statement > For simplification of the design, we assume that upgrades use a step size of a single minor version. We don't guarantee skipping minor versions (e.g. 1.11 to 1.14). a bit better. Is it because Flink does not guarantee that a savepoint created by version 1.x can be directly recovered by version 1.y with x + 1 < y but users might have to go through a cascade of upgrades? From how I understand your proposal, the compiled plan won't be changed after being written initially. Hence, I would assume that for the plan alone Flink will have to give backwards compatibility guarantees for all versions. Am I understanding this part correctly? Cheers, Till On Thu, Nov 25, 2021 at 4:55 PM Francesco Guardiani <france...@ververica.com> wrote: > Hi Timo, > > Thanks for putting this amazing work together, I have some > considerations/questions > about the FLIP: > *Proposed changes #6*: Other than defining this rule of thumb, we must > also make sure > that compiling plans with these objects that cannot be serialized in the > plan must fail hard, > so users don't bite themselves with such issues, or at least we need to > output warning > logs. In general, whenever the user is trying to use the CompiledPlan APIs > and at the same > time, they're trying to do something "illegal" for the plan, we should > immediately either > log or fail depending on the issue, in order to avoid any surprises once > the user upgrades. > I would also say the same for things like registering a function, > registering a DataStream, > and for every other thing which won't end up in the plan, we should log > such info to the > user by default. > > *General JSON Plan Assumptions #9:* When thinking to connectors and > formats, I think > it's reasonable to assume and keep out of the feature design that no > feature/ability can > deleted from a connector/format. I also don't think new features/abilities > can influence > this FLIP as well, given the plan is static, so if for example, > MyCoolTableSink in the next > flink version implements SupportsProjectionsPushDown, then it shouldn't be > a problem > for the upgrade story since the plan is still configured as computed from > the previous flink > version. What worries me is breaking changes, in particular behavioural > changes that > might happen in connectors/formats. Although this argument doesn't seem > relevant for > the connectors shipped by the flink project itself, because we try to keep > them as stable as > possible and avoid eventual breaking changes, it's compelling to external > connectors and > formats, which might be decoupled from the flink release cycle and might > have different > backward compatibility guarantees. It's totally reasonable if we don't > want to tackle it in > this first iteration of the feature, but it's something we need to keep in > mind for the future. > > > *Functions:* It's not clear to me what you mean for "identifier", because > then somewhere > else in the same context you talk about "name". Are we talking about the > function name > or the function complete signature? Let's assume for example we have these > function > definitions: > > > * TO_TIMESTAMP_LTZ(BIGINT) > * TO_TIMESTAMP_LTZ(STRING) > * TO_TIMESTAMP_LTZ(STRING, STRING) > > These for me are very different functions with different implementations, > where each of > them might evolve separately at a different pace. Hence when we store them > in the json > plan we should perhaps use a logically defined unique id like > /bigIntToTimestamp/, / > stringToTimestamp/ and /stringToTimestampWithFormat/. This also solves the > issue of > correctly referencing the functions when restoring the plan, without > running again the > inference logic (which might have been changed in the meantime) and it > might also solve > the versioning, that is the function identifier can contain the function > version like / > stringToTimestampWithFormat_1_1 /or /stringToTimestampWithFormat_1_2/. An > alternative could be to use the string signature representation, which > might not be trivial > to compute, given the complexity of our type inference logic. > > *The term "JSON plan"*: I think we should rather keep JSON out of the > concept and just > name it "Compiled Plan" (like the proposed API) or something similar, as I > see how in > future we might decide to support/modify our persistence format to > something more > efficient storage wise like BSON. For example, I would rename / > CompiledPlan.fromJsonFile/ to simply /CompiledPlan.fromFile/. > > *Who is the owner of the plan file?* I asked myself this question when > reading this: > > > > For simplification of the design, we assume that upgrades use a step > size of a single > minor version. We don't guarantee skipping minor versions (e.g. 1.11 to > 1.14). > > My understanding of this statement is that a user can upgrade between > minors but then > following all the minors, the same query can remain up and running. E.g. I > upgrade from > 1.15 to 1.16, and then from 1.16 to 1.17 and I still expect my original > query to work > without recomputing the plan. This necessarily means that at some point in > future > releases we'll need some basic "migration" tool to keep the queries up and > running, > ending up modifying the compiled plan. So I guess flink should write it > back in the original > plan file, perhaps doing a backup of the previous one? Can you please > clarify this aspect? > > Except these considerations, the proposal looks good to me and I'm eagerly > waiting to see > it in play. > > Thanks, > FG > > > -- > Francesco Guardiani | Software Engineer > france...@ververica.com[1] > > Follow us @VervericaData > -- > Join Flink Forward[2] - The Apache Flink Conference > Stream Processing | Event Driven | Real Time > -- > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany > -- > Ververica GmbH > Registered at Amtsgericht Charlottenburg: HRB 158244 B > Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason, > Jinwei (Kevin) > Zhang > > -------- > [1] mailto:france...@ververica.com > [2] https://flink-forward.org/ >