I imagine this is no longer helpful to you, Evan, but I ran into this issue this week and tracked down the underlying problem. Basically, a snappy-java version upgrade [1] seems to have changed how the SnappyCoder [2] is serialized. Since that coder is used by the PubSub read step, pipeline upgrades started failing.
I don't think there's much to be done about this at this point since this dates back to 2.52.0, but I added an update to CHANGES.md to call it out [3] and figured I'd also respond here in case others hit this in the future. I think the only real workaround is probably draining/restarting any impacted pipelines.

[1] https://github.com/apache/beam/pull/28655
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/SnappyCoder.java
[3] https://github.com/apache/beam/pull/32753

Thanks,
Danny

On Wed, Jan 3, 2024 at 3:14 PM Wiśniowski Piotr <contact.wisniowskipi...@gmail.com> wrote:

> Hi Evan,
>
> I actually hit a similar problem a few months ago and finally managed to solve it. My situation is a bit different, as I am using the Python SDK and the Dataflow Runner v2 (plus Streaming Engine and Prime) with quite a lot of stateful processing, and in my case the update failed with a very similar message, but related to a stateful step.
>
> The thing is, I discovered that updating the pipeline in place fails even when submitting the exact same code. It seems the pipeline graph must be constructed in the same order as the original graph. In my case I added the steps to the pipeline from an unordered set, which produced the same pipeline graph, but the order of construction apparently matters, and updating the running job fails if the order differs.
>
> In my case I simply sorted the steps by name before adding them to the pipeline, and updating the job on the fly started working. So it seems that, since some recent version (as I recall, this still worked correctly around ~2.50?), the pipeline state on Dataflow somehow depends on the order in which steps are added to the pipeline. Does anyone know if this is intended? If so, I would appreciate an explanation.
> Best regards,
> Wiśniowski Piotr
>
> On 15.12.2023 00:14, Evan Galpin wrote:
>
> The pipeline in question is using the Dataflow v1 runner (Runner v2: Disabled), in case that's an important detail.
>
> On Tue, Dec 12, 2023 at 4:22 PM Evan Galpin <egal...@apache.org> wrote:
>
>> I've checked the source code and deployment command for cases of setting experiments. I don't see "enable_custom_pubsub_source" being used at all, no. I also confirmed that it is not active on the existing/running job.
>>
>> On Tue, Dec 12, 2023 at 4:11 PM Reuven Lax via user <u...@beam.apache.org> wrote:
>>
>>> Are you setting the enable_custom_pubsub_source experiment by any chance?
>>>
>>> On Tue, Dec 12, 2023 at 3:24 PM Evan Galpin <egal...@apache.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> When attempting to upgrade a running Dataflow pipeline from SDK 2.51.0 to 2.52.0, an incompatibility warning is surfaced that prevents the pipeline upgrade:
>>>>
>>>>> The Coder or type for step .../PubsubUnboundedSource has changed
>>>>
>>>> Was there an intentional coder change introduced for PubsubMessage in 2.52.0? I didn't note anything in the release notes (https://beam.apache.org/blog/beam-2.52.0/) nor any recent changes in PubsubMessageWithAttributesCoder [1]. Specifically, the step uses `PubsubMessageWithAttributesCoder` via `PubsubIO.readMessagesWithAttributes()`.
>>>>
>>>> Thanks!
>>>>
>>>> [1] https://github.com/apache/beam/blob/90e79ae373ab38cf4e48e9854c28aaffb0938458/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubMessageWithAttributesCoder.java#L36
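For anyone finding this thread later: the step-ordering workaround Piotr describes can be sketched roughly as below. This is only an illustration of the idea, not Beam API code; the helper name and step names are hypothetical, and the actual Beam apply calls (`pcoll | name >> transform` in the Python SDK) are left as comments.

```python
# Hypothetical sketch: build the pipeline graph in a stable order so
# that resubmitting the "same" pipeline applies its steps identically,
# which (per this thread) appears to matter for Dataflow's in-place
# update compatibility check.

def deterministic_step_order(steps):
    """Return step names from a {name: transform} mapping in sorted
    order, so repeated submissions construct the graph identically
    regardless of how the mapping/set was populated."""
    return sorted(steps)

# Two submissions that collected the same steps in different orders
# now apply them in the same order:
first = deterministic_step_order({"WriteSink": None, "ParseEvents": None})
second = deterministic_step_order({"ParseEvents": None, "WriteSink": None})
print(first == second == ["ParseEvents", "WriteSink"])  # True

# In an actual pipeline one would then do something like:
# for name in deterministic_step_order(steps):
#     pcoll | name >> steps[name]
```

The key point is simply that the iteration order over the steps is made deterministic before any transform is applied.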