I have filed a PR with an update to the release notes for 2.8.0: https://github.com/apache/pulsar/pull/11392
Thank you all for your feedback.
Enrico

On Tue, Jul 20, 2021 at 00:54 Neng Lu <nl...@apache.org> wrote:

> Based on my local test, it's fine for String Schema.
>
> On 2021/07/19 18:47:49 Devin Bost wrote:
> > > This leads to an IncompatibleClassChangeError when you have a Function or a Connector that is using Schema.JSON(Pojo.class)
> >
> > I just noticed this detail. Do we have a sense of how often people are using Schema.JSON in Functions/Connectors?
> > Most of our functions are using a string schema, so it's not clear to me if they would be impacted.
> >
> > Devin G. Bost
> >
> > On Mon, Jul 19, 2021 at 12:41 PM Devin Bost <devin.b...@gmail.com> wrote:
> >
> > > > I think Sijie is referring to using KubernetesRuntime to deploy functions where each function/source/sink runs as an independent statefulset in K8s. In this scenario, it is possible to have fine grained control over which version of the function container the function is using.
> > >
> > > Not everybody is using the KubernetesRuntime yet (especially since the Helm charts aren't feature-complete), and it appears that those who aren't running KubernetesRuntime would be impacted the most by this issue.
> > >
> > > Devin G. Bost
> > >
> > > On Mon, Jul 19, 2021 at 12:36 PM Devin Bost <devin.b...@gmail.com> wrote:
> > >
> > >> > For example, if you are upgrading Flink from one version to another, you have to take a savepoint in the previous version for all the Flink jobs.
> > >> > Upgrade the Flink cluster and resume the jobs in the new version.
> > >> >
> > >> > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/upgrading/
> > >> >
> > >> > So it is not unreasonable to ask people to do that when upgrading a centralized computing engine.
> > >>
> > >> One difference with Flink is that organizations running Flink in job mode or application mode can upgrade jobs independently of one another, so teams can upgrade jobs when they are ready without impacting other teams. In the Pulsar case, Pulsar is multi-tenant, so upgrading the entire cluster would break every tenant simultaneously and would block the flow of all messages until all functions are upgraded. If one team takes a year to upgrade their one function, the cluster could not be upgraded until that happened. Also, after all the functions have been upgraded, there would be production downtime while deploying all the upgraded functions, which would be a major outage. It might be possible to write a script to speed up the deployment and shrink the outage window, but there is currently a bug that wipes out existing userConfigs when a function is upgraded. That adds to the complexity of upgrading all the functions, since someone would need to know all the userConfigs for all the functions.
> > >>
> > >> So, I don't think we're really comparing the same things here.
> > >>
> > >> Devin G. Bost
> > >>
> > >> On Mon, Jul 19, 2021 at 12:17 PM Sijie Guo <guosi...@gmail.com> wrote:
> > >>
> > >>> On Mon, Jul 19, 2021 at 10:32 AM Jerry Peng <jerry.boyang.p...@gmail.com> wrote:
> > >>> >
> > >>> > I agree that the best we can do right now is to just clearly document this as a potential problem when upgrading from 2.7 to 2.8.
> > >>> >
> > >>> > We should definitely make every attempt to avoid BC-breaking changes. However, there are times when we have to make these tough decisions for one reason or another.
> > >>> > The bigger problem I see here is not necessarily that a BC-breaking change occurred, but rather that we didn't know about it beforehand, so that we could clearly document this caveat when 2.8 was released. Perhaps this is where we can improve our backwards compatibility testing. We already have some, but probably not enough, as highlighted by this case.
> > >>> >
> > >>> > Regarding
> > >>> >
> > >>> > > This is partially correct, because you can wait to upgrade the workers pod, but there is no fine grained control over which version of each pod will be running your function, especially in a big cluster with many tenants and functions with this problem
> > >>> >
> > >>> > I think Sijie is referring to using KubernetesRuntime to deploy functions where each function/source/sink runs as an independent statefulset in K8s. In this scenario, it is possible to have fine grained control over which version of the function container the function is using. There might not currently be tools that make this easy for users, but with kubectl one can definitely determine which container version is running and potentially update the container version on a per-function basis.
> > >>>
> > >>> Jerry - Thank you! That was what I meant.
> > >>>
> > >>> >
> > >>> > Best,
> > >>> >
> > >>> > Jerry
> > >>> >
> > >>> > On Mon, Jul 19, 2021 at 12:50 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >>> >
> > >>> > > Sijie,
> > >>> > > Thank you for your feedback.
> > >>> > > Some additional considerations inline.
> > >>> > >
> > >>> > > On Mon, Jul 19, 2021, 06:47 Sijie Guo <guosi...@gmail.com> wrote:
> > >>> > >
> > >>> > > > I don't think this is a big problem.
> > >>> > > > People can recompile the function and submit it again. Most computing/streaming engines ask users to recompile and resubmit their jobs when upgrading to a new version.
> > >>> > >
> > >>> > > Unfortunately, this is not easily feasible if the organization managing the Pulsar service is different from the organization developing the Functions. In particular, it is nearly impossible to avoid a service interruption.
> > >>> > >
> > >>> > > BTW, I believe that there is no way to fix this at this point.
> > >>> > >
> > >>> > > > The best approach here is to document this behavior.
> > >>> > >
> > >>> > > I agree that the best thing we can do is to document this requirement.
> > >>> > >
> > >>> > > Therefore, we must ensure that we don't run into this kind of issue again in the future.
> > >>> > >
> > >>> > > Pulsar is increasingly used by large enterprises, and backward compatibility is of great value to them.
> > >>> > >
> > >>> > > Fortunately, not all Functions need to be rebuilt.
> > >>> > >
> > >>> > > > Also, if you are using the Kubernetes runtime to schedule functions, you are not really impacted.
> > >>> > >
> > >>> > > This is partially correct, because you can wait to upgrade the workers pod, but there is no fine grained control over which version of each pod will be running your function, especially in a big cluster with many tenants and functions with this problem.
> > >>> > >
> > >>> > > Enrico
> > >>> > >
> > >>> > > > - Sijie
> > >>> > > >
> > >>> > > > On Fri, Jul 16, 2021 at 2:44 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >>> > > > >
> > >>> > > > > Hello,
> > >>> > > > > I have reported this issue [1] about upgrading from Pulsar 2.7 to 2.8.
> > >>> > > > > More information is on the ticket, but the short version of the story is that in Pulsar 2.8 we introduced a breaking change in the Schema API by switching SchemaInfo from a class to an interface.
> > >>> > > > >
> > >>> > > > > This leads to an IncompatibleClassChangeError when you have a Function or a Connector that is using Schema.JSON(Pojo.class) and you upgrade your Pulsar cluster (the functions worker pod, for instance) from Pulsar 2.7.x to Pulsar 2.8.0.
> > >>> > > > >
> > >>> > > > > The real problem is that you cannot upgrade Pulsar without interrupting the service and coordinating with the upgrade of the Functions.
> > >>> > > > > Your functions need to be recompiled against the Pulsar 2.8 API and deployed again in production.
> > >>> > > > >
> > >>> > > > > I have tried to turn SchemaInfo back into an "abstract class", but without success, because then you run into errors.
> > >>> > > > >
> > >>> > > > > I am not sure there is a way to provide a good "upgrade path" for Functions/IO users.
> > >>> > > > >
> > >>> > > > > If we do not find one, we have to document the upgrade in the official Pulsar documentation.
> > >>> > > > >
> > >>> > > > > We must do our best to prevent users from ending up in this situation again.
> > >>> > > > >
> > >>> > > > > Any suggestions or thoughts?
> > >>> > > > >
> > >>> > > > > Regards
> > >>> > > > > Enrico
> > >>> > > > >
> > >>> > > > > [1] https://github.com/apache/pulsar/issues/11338
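For readers wondering why switching SchemaInfo from a class to an interface breaks Functions that were already compiled: the JVM links calls to class methods and interface methods differently (`invokevirtual` vs `invokeinterface`), so bytecode built against one shape fails at link time against the other. The following Java sketch is not Pulsar code. `demo.SchemaInfo` and `demo.Caller` are hypothetical, simplified stand-ins for the real `SchemaInfo` and for a user Function. The sketch compiles `Caller` against one shape of `SchemaInfo`, then runs that same `Caller.class` against the other shape without recompiling it. It assumes a JDK 11+ (it uses `javax.tools.ToolProvider` and `Files.writeString`).

```java
import java.lang.reflect.InvocationTargetException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class BinaryCompatDemo {

    // Hypothetical stand-in for the old API shape: SchemaInfo is a class.
    static final String API_AS_CLASS =
        "package demo;\n"
        + "public class SchemaInfo {\n"
        + "  public String getName() { return \"class-based\"; }\n"
        + "}\n";

    // Hypothetical stand-in for the new API shape: SchemaInfo is an interface.
    static final String API_AS_INTERFACE =
        "package demo;\n"
        + "public interface SchemaInfo {\n"
        + "  String getName();\n"
        + "  static SchemaInfo create() { return () -> \"interface-based\"; }\n"
        + "}\n";

    // Plays the role of user code (for example a Function) that calls the API.
    static final String CALLER =
        "package demo;\n"
        + "public class Caller {\n"
        + "  public static String describe(SchemaInfo info) { return info.getName(); }\n"
        + "}\n";

    // Compiles SchemaInfo (and optionally Caller) from srcDir into outDir.
    static void compile(Path srcDir, Path outDir, String api, boolean withCaller) throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK (not just a JRE) is required");
        }
        Files.createDirectories(srcDir);
        Files.createDirectories(outDir);
        String apiFile = Files.writeString(srcDir.resolve("SchemaInfo.java"), api).toString();
        int rc;
        if (withCaller) {
            String callerFile = Files.writeString(srcDir.resolve("Caller.java"), CALLER).toString();
            rc = javac.run(null, null, null, "-d", outDir.toString(), apiFile, callerFile);
        } else {
            rc = javac.run(null, null, null, "-d", outDir.toString(), apiFile);
        }
        if (rc != 0) {
            throw new IllegalStateException("compilation failed");
        }
    }

    // Compiles Caller against compileApi, then runs that same Caller.class against runtimeApi.
    // Returns the call result, or the simple name of the error thrown by the call.
    static String runMixed(String compileApi, String runtimeApi) throws Exception {
        Path root = Files.createTempDirectory("bincompat");
        Path compiled = root.resolve("compiled");
        Path runtime = root.resolve("runtime");
        compile(root.resolve("src-compile"), compiled, compileApi, true);
        compile(root.resolve("src-runtime"), runtime, runtimeApi, false);
        // Reuse the previously built Caller.class without recompiling it,
        // like an old Function JAR deployed on an upgraded worker.
        Files.copy(compiled.resolve("demo").resolve("Caller.class"),
                   runtime.resolve("demo").resolve("Caller.class"),
                   StandardCopyOption.REPLACE_EXISTING);
        try (URLClassLoader loader = new URLClassLoader(new URL[] {runtime.toUri().toURL()})) {
            Class<?> api = loader.loadClass("demo.SchemaInfo");
            Object info = api.isInterface()
                ? api.getMethod("create").invoke(null)
                : api.getDeclaredConstructor().newInstance();
            try {
                return (String) loader.loadClass("demo.Caller")
                    .getMethod("describe", api)
                    .invoke(null, info);
            } catch (InvocationTargetException e) {
                // Linkage errors raised inside describe() arrive wrapped by reflection.
                return e.getCause().getClass().getSimpleName();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("class -> interface: " + runMixed(API_AS_CLASS, API_AS_INTERFACE));
        System.out.println("interface -> class: " + runMixed(API_AS_INTERFACE, API_AS_CLASS));
        System.out.println("recompiled: " + runMixed(API_AS_INTERFACE, API_AS_INTERFACE));
    }
}
```

On a current JDK the first two lines should report `IncompatibleClassChangeError`, while the recompiled case returns normally, which mirrors the workaround discussed in the thread (rebuilding the Function against the 2.8 API). The second case shows that code already built against the interface would break if SchemaInfo became a class again; the thread does not say which errors appeared when trying an abstract class, so treat that only as one plausible explanation.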