Hi Timo,

Thanks for putting this amazing work together. I have some considerations/questions about the FLIP:

*Proposed changes #6*: Besides defining this rule of thumb, we must also make sure that compiling a plan which contains objects that cannot be serialized into it fails hard, so users don't get bitten by such issues, or at the very least we need to output warning logs. In general, whenever the user is working with the CompiledPlan APIs and at the same time tries to do something "illegal" for the plan, we should immediately either log or fail, depending on the severity of the issue, in order to avoid any surprises once the user upgrades. I would say the same for things like registering a function or registering a DataStream: anything that won't end up in the plan should at least be logged to the user by default.
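To make the fail-hard part concrete, here is a minimal sketch of the behavior I'd expect, written against the compilePlanSql method proposed in the FLIP; the exception type and the table definitions are just my assumptions:

    import org.apache.flink.table.api.CompiledPlan;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.TableException;
    import org.apache.flink.table.functions.ScalarFunction;

    public class CompileShouldFailHard {

        // An inline function instance: the plan has no stable identifier it
        // could reference to restore this function later.
        public static class MyInlineFunc extends ScalarFunction {
            public String eval(String s) {
                return s.toUpperCase();
            }
        }

        public static void main(String[] args) {
            TableEnvironment env = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());
            env.executeSql(
                    "CREATE TABLE src (f0 STRING) WITH ('connector' = 'datagen')");
            env.executeSql(
                    "CREATE TABLE snk (f0 STRING) WITH ('connector' = 'blackhole')");
            env.createTemporaryFunction("my_func", new MyInlineFunc());

            try {
                // Compiling a plan that references the inline function above:
                // this should throw, not silently produce a broken plan.
                CompiledPlan plan = env.compilePlanSql(
                        "INSERT INTO snk SELECT my_func(f0) FROM src");
            } catch (TableException e) {
                // The eager "fail hard" I'm arguing for: an exception at
                // compile time instead of a surprise at upgrade/restore time.
            }
        }
    }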
*General JSON Plan Assumptions #9:* When thinking about connectors and formats, I think it's reasonable to assume, and keep out of the feature design, that no feature/ability can be deleted from a connector/format. I also don't think new features/abilities can influence this FLIP, given that the plan is static: if, for example, MyCoolTableSink implements SupportsProjectionPushDown in the next Flink version, that shouldn't be a problem for the upgrade story, since the plan is still configured as computed by the previous Flink version. What worries me is breaking changes, in particular behavioural changes that might happen in connectors/formats. Although this argument doesn't seem relevant for the connectors shipped by the Flink project itself, because we try to keep them as stable as possible and avoid breaking changes, it is very relevant for external connectors and formats, which might be decoupled from the Flink release cycle and might have different backward compatibility guarantees. It's totally reasonable if we don't want to tackle this in the first iteration of the feature, but it's something we need to keep in mind for the future.

*Functions:* It's not clear to me what you mean by "identifier", because elsewhere in the same context you talk about "name". Are we talking about the function name or the complete function signature? Let's assume for example we have these function definitions:

* TO_TIMESTAMP_LTZ(BIGINT)
* TO_TIMESTAMP_LTZ(STRING)
* TO_TIMESTAMP_LTZ(STRING, STRING)

To me, these are very different functions with different implementations, each of which might evolve separately, at its own pace. Hence, when we store them in the plan, we should perhaps use a logically defined unique id like /bigIntToTimestamp/, /stringToTimestamp/ and /stringToTimestampWithFormat/. This also solves the issue of correctly referencing the functions when restoring the plan, without running the inference logic again (which might have changed in the meantime), and it might also solve versioning: the function identifier can embed the function version, like /stringToTimestampWithFormat_1_1/ or /stringToTimestampWithFormat_1_2/ (I sketch a concrete example of what I mean at the end of this mail). An alternative could be to use the string representation of the signature, which might not be trivial to compute, given the complexity of our type inference logic.

*The term "JSON plan"*: I think we should keep JSON out of the concept and just name it "Compiled Plan" (like the proposed API) or something similar, as I can see how in the future we might decide to switch our persistence format to something more storage-efficient, like BSON. For example, I would rename /CompiledPlan.fromJsonFile/ to simply /CompiledPlan.fromFile/.

*Who is the owner of the plan file?* I asked myself this question when reading this:

> For simplification of the design, we assume that upgrades use a step size of
> a single minor version. We don't guarantee skipping minor versions (e.g. 1.11
> to 1.14).

My understanding of this statement is that a user can upgrade one minor version at a time and, by following all the minors, keep the same query up and running. E.g. I upgrade from 1.15 to 1.16, and then from 1.16 to 1.17, and I still expect my original query to work without recomputing the plan. This necessarily means that at some point in future releases we'll need some basic "migration" tool to keep the queries up and running, which will end up modifying the compiled plan. So I guess Flink should write it back to the original plan file, perhaps making a backup of the previous one? Can you please clarify this aspect?
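Regarding the plan file ownership, here is a minimal sketch of the flow I imagine such a migration tool could have. /PlanMigrator/ is a completely made-up name, and /fromFile/ and /writeToFile/ follow the FLIP's API plus the renaming I proposed above, so take all of it as purely illustrative:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    import org.apache.flink.table.api.CompiledPlan;

    public class PlanMigrationSketch {

        public static void main(String[] args) throws Exception {
            Path planFile = Paths.get("/plans/my-query.json");

            // Load the plan written by the previous minor version.
            CompiledPlan oldPlan = CompiledPlan.fromFile(planFile.toString());

            // Hypothetical migration step: upgrade the plan in memory to the
            // format of the current Flink version.
            CompiledPlan migratedPlan = PlanMigrator.migrate(oldPlan);

            // If Flink owns the file, I'd expect it to back up the old plan...
            Files.copy(planFile, Paths.get("/plans/my-query.json.bak"),
                    StandardCopyOption.REPLACE_EXISTING);

            // ...and then write the migrated plan back in place.
            migratedPlan.writeToFile(planFile.toString());
        }
    }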
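And coming back to the function identifiers, here is the concrete example I promised above: a completely made-up mapping, just to show the granularity and the versioning scheme I have in mind, i.e. one stable, versioned id per overload, which is what the plan would reference:

    import java.util.Map;

    public class FunctionIdSketch {

        // Hypothetical mapping from the SQL signature to the internal id
        // persisted in the plan: one id per overload, version embedded.
        static final Map<String, String> INTERNAL_IDS = Map.of(
                "TO_TIMESTAMP_LTZ(BIGINT)", "bigIntToTimestamp_1_1",
                "TO_TIMESTAMP_LTZ(STRING)", "stringToTimestamp_1_1",
                "TO_TIMESTAMP_LTZ(STRING, STRING)", "stringToTimestampWithFormat_1_2");

        public static void main(String[] args) {
            // On restore, the runtime would resolve the implementation directly
            // by this id, without re-running the type inference logic.
            System.out.println(INTERNAL_IDS.get("TO_TIMESTAMP_LTZ(STRING, STRING)"));
        }
    }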
Apart from these considerations, the proposal looks good to me, and I'm eagerly waiting to see it in action.

Thanks,
FG

--
Francesco Guardiani | Software Engineer
france...@ververica.com