Re: Beam Job Server Errors Out: No filesystem found for scheme s3

2021-08-10 Thread Jeremy Lewi
Thanks. I opened https://issues.apache.org/jira/browse/BEAM-12739 and submitted a patch: https://github.com/apache/beam/pull/15313
On Fri, Aug 6, 2021 at 7:57 AM Chamikara Jayalath wrote:
> Hi Jeremy,
>
> On Thu, Aug 5, 2021 at 7:36 PM Jeremy Lewi wrote:
>> Hi Folks,
>>
>> I'm running Beam P

Re: [Dataflow][Java][2.30.0] Best practice for clearing stuck data in streaming pipeline

2021-08-10 Thread Evan Galpin
> It is likely that the incorrect transform was edited...
It appears you're right; I tried to reproduce, but this time I was able to clear the issue by making "the same" code change and updating the pipeline. I believe it was just a change in the wrong place in the code. Good to know this works! T

Re: Protobuf schema provider row functions break on CamelCased field names

2021-08-10 Thread Chris Hinds
I created an issue for this: https://issues.apache.org/jira/browse/BEAM-12736 I also took a stab at a fix. Would you accept a pull request? Or, I'd be happy to discuss. Cheers, Chris. On 9 Aug 2021, at 21:02, Chris Hinds <chris.hi...@bdi.ox.ac.uk> wrote: Haha, it probably shouldn’t!

Re: [Dataflow][Java][2.30.0] Best practice for clearing stuck data in streaming pipeline

2021-08-10 Thread Evan Galpin
Thanks for your responses, Luke. One point I'm confused about:
> * Modify the sink implementation to do what you want with the bad data and update the pipeline.
I modified the sink implementation to ignore the specific error that was the problem and updated the pipeline. The update succeeded

[Dataflow][Java][2.30.0] Best practice for clearing stuck data in streaming pipeline

2021-08-10 Thread Evan Galpin
Hi all, I recently had an experience where a streaming pipeline became "clogged" due to invalid data reaching the final step in my pipeline such that the data was causing a non-transient error when writing to my Sink. Since the job is a streaming job, the element (bundle) was continuously retryin

Re: Dataflow job gets stuck

2021-08-10 Thread Sofia’s World
Hi, the following code works for me; mind you, I have amended the code slightly. A few questions: 1 - Where are you running it from? Local PC or GCP console? 2 - Has it ever run before? 3 - Can you show the command line you are using to kick off the process? I have built your code using gcloud build, an

Re: Submit Python Beam on Spark Dataproc

2021-08-10 Thread Yu Watanabe
Hello. Would this page help? I hope it helps. https://beam.apache.org/documentation/runners/spark/ > Running on a pre-deployed Spark cluster 1- What's spark-master-url in case of a remote cluster on Dataproc? Is 7077 the master url port? * Yes. 2- Should we ssh tunnel to sparkMasterUrl port

Submit Python Beam on Spark Dataproc

2021-08-10 Thread Mahan Hosseinzadeh
Hi, I have a Python Beam job that works on Dataflow, but we would like to submit it to a Spark Dataproc cluster with no Flink involvement. I have already spent days on this but failed to figure out how to use PortableRunner with the beam_spark_job_server to submit my Python Beam job to Spark Dataproc. All the B

Re: Unable to use windowing transform on Dataflow with Go SDK (BEAM-12636)

2021-08-10 Thread Hannes Gustafsson
In case it is helpful, as noted in the linked issue this happens on 2.31.0 but I've since been able to build against a local clone of master and submit my job successfully. I've just now rebuilt against 2fd98755f2 and again successfully submitted a job that uses fixed windows. I've not been abl