user@beam.apache.org

Hi,
I recently started encountering a strange problem where a Dataflow job launched from a template never completes, but the same job runs fine when launched directly. The template has been in use since Dec 14 without issue, but recreating the template today (or at any point in the past week) and executing it results in one stage of the job sitting at 100% complete for hours and never finishing.

When I run the job directly (i.e. not via the template) today, it does complete, but Logs Explorer shows a confusing message:

    Error requesting progress from SDK: OUT_OF_RANGE: SDK claims to be
    processing element 535 yet only 535 elements have been sent

When I run via the template, the following three errors show up:

    Element processed sanity check disabled due to SDK not reporting
    number of elements processed.

    Error requesting progress from SDK: UNKNOWN: Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 667, in process_bundle_progress
        processor = self.bundle_processor_cache.lookup(request.instruction_id)
      File "/usr/local/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 468, in lookup
        raise RuntimeError(
    RuntimeError: Bundle processing associated with process_bundle-7395200449888031466-19 has failed. Check prior failing response for details.
    [type.googleapis.com/util.MessageSetPayload='[dist_proc.dax.internal.TrailProto] { trail_point { source_file_loc { filepath: "dist_proc/dax/workflow/worker/fnapi_service_impl.cc" line: 800 } } }']
    === Source Location Trace: ===
    dist_proc/dax/workflow/worker/fnapi_sdk_harness.cc:183
    dist_proc/dax/workflow/worker/fnapi_service_impl.cc:800

    SDK failed progress reporting 6 times (limit: 5), no longer holding
    back progress to last SDK reported progress.

None of these errors appear with the template created on Dec 14, so I'm unsure whether some setting or default behavior has changed, or what else is going on.
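In case it helps, here is roughly how the template is built and launched; the project, bucket, region, and module names below are placeholders, not the real ones:

```shell
# Build a classic template from the Python pipeline
# (module and GCS paths are placeholders).
python -m my_pipeline \
  --runner DataflowRunner \
  --project my-project \
  --region us-central1 \
  --temp_location gs://my-bucket/temp \
  --template_location gs://my-bucket/templates/my_template

# Launch a job from that staged template.
gcloud dataflow jobs run my-job \
  --gcs-location gs://my-bucket/templates/my_template \
  --region us-central1
```

Running the pipeline "directly" means invoking the same `python -m my_pipeline` command with `--runner DataflowRunner` but without `--template_location`, so the job is submitted immediately instead of being staged.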
Any help or pointers on how to debug this would be much appreciated.

Thanks,
Patrick