Thanks.
I opened
https://issues.apache.org/jira/browse/BEAM-12739
And submitted a patch
https://github.com/apache/beam/pull/15313
On Fri, Aug 6, 2021 at 7:57 AM Chamikara Jayalath
wrote:
> Hi Jeremy,
>
> On Thu, Aug 5, 2021 at 7:36 PM Jeremy Lewi wrote:
>
>> Hi Folks,
>>
>> I'm running Beam P
>
> It is likely that the incorrect transform was edited...
>
It appears you're right; when I tried to reproduce, this time I was able to
clear the issue by making "the same" code change and updating the
pipeline. I believe my earlier change was simply in the wrong place in the code.
Good to know this works! T
I created an issue for this: https://issues.apache.org/jira/browse/BEAM-12736
I also took a stab at a fix. Would you accept a pull request? Or, I'd be happy
to discuss.
Cheers,
Chris.
On 9 Aug 2021, at 21:02, Chris Hinds <chris.hi...@bdi.ox.ac.uk> wrote:
Haha, it probably shouldn’t!
Thanks for your responses, Luke. One point I'm still confused about:
* Modify the sink implementation to do what you want with the bad data and
> update the pipeline.
>
I modified the sink implementation to ignore the specific error that was
causing the problem and updated the pipeline. The update succeeded.
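In case a sketch helps anyone following along: the change described above amounts to a dead-letter pattern, where a known non-transient error is caught instead of failing the bundle forever. This is a minimal illustration in plain Python, not the actual pipeline code; `write_to_sink` and the `ValueError` it raises are hypothetical stand-ins for the real sink call and its specific error type.

```python
# Dead-letter sketch: instead of letting a non-transient error cause
# endless retries of the bundle, catch the known error and route the
# bad element to a separate output for later inspection.
# `write_to_sink` and ValueError are hypothetical stand-ins.

def write_to_sink(element):
    # Stand-in for the real sink write; rejects non-dict elements.
    if not isinstance(element, dict):
        raise ValueError("invalid element: %r" % (element,))
    return element

def route_element(element):
    """Return ('ok', element) on success, or ('dead_letter', element)
    on the known non-transient error, so bad data no longer clogs the
    pipeline."""
    try:
        write_to_sink(element)
        return ("ok", element)
    except ValueError:
        return ("dead_letter", element)

if __name__ == "__main__":
    for result in map(route_element, [{"id": 1}, "garbage", {"id": 2}]):
        print(result)
```

In a real Beam pipeline the same idea is usually expressed with a multi-output DoFn (tagged outputs), with the dead-letter collection written somewhere inspectable.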
Hi all,
I recently had an experience where a streaming pipeline became "clogged"
due to invalid data reaching the final step in my pipeline such that the
data was causing a non-transient error when writing to my Sink. Since the
job is a streaming job, the element (bundle) was continuously retrying
Hi
the following code works for me - mind you, I have slightly amended the
code.
A few questions:
1 - where are you running it from? Your local PC or the GCP console?
2 - has it ever run before?
3 - can you show the command line you are using to kick off the process?
I have built your code using gcloud build, an
Hello.
Would this page help? I hope it does.
https://beam.apache.org/documentation/runners/spark/
> Running on a pre-deployed Spark cluster
1- What's spark-master-url in case of a remote cluster on Dataproc? Is 7077
the master url port?
* Yes.
2- Should we ssh tunnel to sparkMasterUrl port
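For a concrete starting point, the setup I've seen described for running against a pre-deployed Spark cluster looks roughly like the commands below. This is a hedged sketch: the master hostname is a placeholder, and you should confirm the exact image name and flags against the Spark runner page linked above.

```shell
# Start the Beam Spark job server, pointing it at the remote Spark
# master (7077 is Spark's default standalone master port; replace
# "your-dataproc-master" with the real host).
docker run --net=host apache/beam_spark_job_server:latest \
    --spark-master-url=spark://your-dataproc-master:7077

# Then submit the Python pipeline against the job server endpoint
# (8099 is the job server's default port):
python my_pipeline.py \
    --runner=PortableRunner \
    --job_endpoint=localhost:8099 \
    --environment_type=DOCKER
```

Whether you need an SSH tunnel depends on whether the job server can reach port 7077 on the master directly; if not, tunneling that port is one workaround.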
Hi,
I have a Python Beam job that works on Dataflow but we would like to submit
it on a Spark Dataproc cluster with no Flink involvement.
I have already spent days trying but failed to figure out how to use the
PortableRunner with the beam_spark_job_server to submit my Python Beam job
to Spark on Dataproc. All the B
In case it is helpful, as noted in the linked issue this happens on 2.31.0 but
I've since been able to build against a local clone of master and submit my job
successfully. I've just now rebuilt against 2fd98755f2 and again successfully
submitted a job that uses fixed windows.
I've not been abl