Tried to give others an opportunity to comment, but I see it's mostly me <> Bolke.
> > > Yep. I rectified the implicitness by storing this as an explicit decision
> > > record in the ADR proposal. I think it's been - similarly as lack of
> > > <snip>

Happy to help here - since there is a good "vibe" on the ADR kind of approach, I am happy to "lead by example" and propose a PR with ADRs for serialization (learning from the whole discussion - and it's a good opportunity for me to learn some of the nitty-gritty details). Would that work for you, and can I count on some reviews?

> It was added to the DAG serializer at the time. I mentioned that that looked weird
> given the fact that it did not seem connected to DAG serialization (to me) and serde
> should be considered. So no, I wasn't insisting on making it part of serialization
> imho - it de facto was already.

Let's straighten it up after writing the ADRs and rounding up the serialization case. We are aiming to enable AIP-44 and Pydantic in 2.9, so that's a good time to take a look and make some measurements there.

> > > Going through either of the serializations of ours here is not needed
> > > overhead at all - we can just purely rely on Rust and get an ORM object
> > > on one side and a Pydantic object on the other without touching any of
> > > our serialization code.
> >
> > That's fine to me.

Yeah. We'll sort it out. Luckily it's a very localized place, and we can simply open a draft PR showing how it will look and see if it looks simpler / whether we lose any of the properties that <snip>

> I don't think you want to ;-).

Happy also to take a look at helping to straighten up DAG serialization, possibly once I get a bit more familiar with the context.

> We could. It's probably just a three-pass serializer then.

Interesting :).

> Exactly not. By adding it to Airflow 2.8.0 it would have made every previous
> version of this provider suddenly work without any adjustments against Airflow
> 2.8.0. Remember that the original bug report said "cannot serialize value of
> type PyODBC.Row".
I was more talking about compatibility of the ODBC provider. If we want to directly return ODBC.Row, it will only be serializable in Airflow 2.8.0 (currently the ODBC provider supports >= Airflow 2.6.0). What I want to achieve is to implement a fix where you can upgrade just the provider and get serializable output even if the user decides not to move to Airflow 2.8.0 yet.

> You can argue both ways here architecturally.
>
> 1) Airflow 2.8 with a new feature that allows PyODBC.Row to be serialized,
> affecting past and future providers at large and (TaskFlow) returned
> variables.

This requires users to upgrade to Airflow 2.8.0. Many of our users delay such upgrades for months or years, so we are talking about making it possible for them to upgrade "easily" - and selective upgrades of providers are the easy way. This also works better for some Airflow-as-a-service offerings where 2.8 might only become available months from now. Migration to a new Airflow version is far more involved, and uptake of new versions (especially .0 releases) is slow. We are basically talking about having something available now - for Airflow 2.6, 2.7 (and 2.8) - vs. many users having it months or years from now. And releasing it now fixes an actual issue people have raised with us.

Implementing it in the provider has an added benefit: IF we also implement a generic serializer as part of the ODBC provider and have a way for serde to discover serializers from providers, then we could implement it in the provider in a way that either uses internal "serialization" or lets serde discover it. I see really no benefit in having Airflow know anything about ODBC. Are we going to implement all 80 providers' specific code, put it into airflow, and make all their dependencies (even optional ones) airflow dependencies, now that we have made an enormous effort to separate these? Imagine ODBC code depends on odbc_library. What happens if Airflow has an optional dependency on odbc_library <3 and the provider has a dependency on odbc_library >=3?
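For a concrete picture of what such a provider-side fix could look like: `pyodbc.Row` is tuple-like and exposes a `cursor_description` attribute, so the hook can normalize results into plain built-ins before they ever reach XCom/serde. A rough sketch - the helper names are mine, and `FakeRow` only stands in for `pyodbc.Row` so the snippet is self-contained:

```python
from typing import Any, Iterable


def row_to_dict(row: Any) -> dict:
    """Convert a tuple-like DB-API row that carries a cursor_description
    attribute (as pyodbc.Row does) into a plain, serializable dict."""
    columns = [col[0] for col in row.cursor_description]
    return dict(zip(columns, tuple(row)))


def make_serializable(rows: Iterable[Any]) -> list:
    """Normalize a result set so serde/XCom only ever sees built-ins."""
    return [row_to_dict(row) for row in rows]


# Stand-in for pyodbc.Row: tuple-like, with cursor_description as a
# sequence of (name, type, ...) entries describing the columns.
class FakeRow(tuple):
    cursor_description = [("id", int), ("name", str)]


rows = [FakeRow((1, "a")), FakeRow((2, "b"))]
print(make_serializable(rows))  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

With something like this in the hook, any provider version carrying the helper produces JSON-friendly output on every Airflow >= 2.6, without core needing to know about pyodbc at all.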
Imagine that the provider, from version 3 on, returns an ODBCRow that the serializer won't be able to handle. How do you install the two together? How do you make them work together? It really makes very little sense to have airflow depend on the library and code that the provider already depends on, because this will create a whole new class of potential conflicts. I think the *only* feasible way to have ODBC Row handled by serde is for serde to discover serializers from installed providers and use them from there. This solves all the "dependency conflict" cases nicely and is long-term sustainable and maintainable.

> This is exactly the intention with the serializers in
> airflow.serialization.serializers, however it can be tough as not all libraries
> expose the information in public APIs (e.g. deltalake) that allow to reinstate
> the object.
>
> But yes: loose, and pragmatic public API / SemVer.

Yep.
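To make the discovery idea a bit more tangible: serde already dispatches on a qualified class name, so providers could advertise (serialize, deserialize) pairs - e.g. via an entry point or a provider-info field - and serde would register them at import time. A minimal registry sketch; all names here are hypothetical and this is not the actual serde code:

```python
from typing import Any, Callable

# Hypothetical registry: qualified class name -> (serialize, deserialize).
# In a real mechanism, entries would be filled by scanning installed
# providers (e.g. via entry points) rather than by direct calls.
_REGISTRY: dict = {}


def register(classname: str,
             serialize_fn: Callable[[Any], Any],
             deserialize_fn: Callable[[Any], Any]) -> None:
    """What a provider would contribute for each type it can handle."""
    _REGISTRY[classname] = (serialize_fn, deserialize_fn)


def qualname(o: Any) -> str:
    return f"{type(o).__module__}.{type(o).__qualname__}"


def serialize(o: Any) -> dict:
    if qualname(o) not in _REGISTRY:
        raise TypeError(f"cannot serialize value of type {qualname(o)}")
    ser, _ = _REGISTRY[qualname(o)]
    return {"__classname__": qualname(o), "__data__": ser(o)}


def deserialize(payload: dict) -> Any:
    _, deser = _REGISTRY[payload["__classname__"]]
    return deser(payload["__data__"])


# Example: a stand-in Row type that an (imaginary) provider registers.
class Row:
    def __init__(self, values: tuple):
        self.values = values


register(f"{Row.__module__}.{Row.__qualname__}",
         lambda r: list(r.values),
         lambda data: Row(tuple(data)))

roundtripped = deserialize(serialize(Row((1, "a"))))
print(roundtripped.values)  # (1, 'a')
```

The point of the sketch is only the dependency direction: the provider owns both the library import and the serializer, and core serde sees nothing but an opaque registration, so the `odbc_library <3` vs `>=3` conflict described above cannot arise in core.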