Implementing a custom I/O Connector

2022-07-14 Thread Damian Akpan
Hi Everyone, I've been working on implementing a Google Sheets IO source for my pipeline. I've tried this example along with this blog. I have an example here on Colab.
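[For readers following along: a minimal Splittable DoFn skeleton for this kind of source might look as follows. This is a sketch only; fetch_row, the 'row_count' field, and the element layout are hypothetical placeholders, not taken from the original post.]

    import apache_beam as beam
    from apache_beam.transforms.core import RestrictionProvider
    from apache_beam.io.restriction_trackers import OffsetRange, OffsetRestrictionTracker

    def fetch_row(spreadsheet_id, row_index):
        # Hypothetical stand-in for a Google Sheets API call.
        return {'sheet': spreadsheet_id, 'row': row_index}

    class RowRangeProvider(RestrictionProvider):
        def initial_restriction(self, element):
            # The element is assumed to describe the sheet, including its row count.
            return OffsetRange(0, element['row_count'])

        def create_tracker(self, restriction):
            return OffsetRestrictionTracker(restriction)

        def restriction_size(self, element, restriction):
            return restriction.size()

    class ReadSheetRowsFn(beam.DoFn):
        def process(self, element,
                    tracker=beam.DoFn.RestrictionParam(RowRangeProvider())):
            restriction = tracker.current_restriction()
            for row in range(restriction.start, restriction.stop):
                # Claim each row before emitting it so the runner can split the work.
                if not tracker.try_claim(row):
                    return
                yield fetch_row(element['spreadsheet_id'], row)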

Metrics in Beam+Spark

2022-07-14 Thread Yushu Yao
Hi Team, Does anyone have a working example of a Beam job running on top of Spark, so that I can use the Beam metric syntax and have the metrics shipped out via Spark's infra? The only thing I've achieved is being able to queryMetrics() every half second and copy all the metrics into the Spark met…

Re: Metrics in Beam+Spark

2022-07-14 Thread Moritz Mack
Hi Yushu, Wondering, how did you configure your Spark metrics sink? And what version of Spark are you using? The key is to configure Spark to use one of the sinks provided by Beam, e.g.: "spark.metrics.conf.*.sink.csv.class"="org.apache.beam.runners.spark.metrics.sink.CsvSink" Currently there’s sup…
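[For reference, a Spark metrics configuration along these lines could look like the snippet below, placed in conf/metrics.properties or passed as spark-submit --conf entries. The directory and period values are illustrative, not from the thread; the Beam CsvSink extends Spark's own CsvSink, so it takes the same properties.]

    spark.metrics.conf.*.sink.csv.class=org.apache.beam.runners.spark.metrics.sink.CsvSink
    spark.metrics.conf.*.sink.csv.directory=/tmp/beam-metrics
    spark.metrics.conf.*.sink.csv.period=10
    spark.metrics.conf.*.sink.csv.unit=seconds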

Re: Implementing a custom I/O Connector

2022-07-14 Thread Chamikara Jayalath via user
Do you have the full stacktrace? Also, what does the Read() transform in the example entail? Thanks, Cham

Re: Implementing a custom I/O Connector

2022-07-14 Thread Damian Akpan
It has this error. After some looking around, I think the problem was that I treated the Splittable DoFn as a regular DoFn, and there wasn't any PCollection in the pipeline. --- > AttributeError…
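[Concretely, the failing pattern described here would be something like the following, where CountFn is a hypothetical DoFn sketched fully after Cham's next reply. Applying a ParDo directly to the Pipeline gives it no input PCollection to process, which can surface as an AttributeError.]

    import apache_beam as beam

    with beam.Pipeline() as p:
        # Wrong: ParDo is applied to the Pipeline itself rather than to a
        # PCollection, so there is no input element for the DoFn.
        p | beam.ParDo(CountFn(10))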

Re: Implementing a custom I/O Connector

2022-07-14 Thread Chamikara Jayalath via user
You cannot directly apply 'beam.ParDo' to the pipeline object. Instead you feed the source description element to the ParDo, for example: p | beam.Create([source_description]) | beam.ParDo(CountFn(10)) If the 'source_description' element is trivial (or gets ignored in the source), you can replace…
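[Spelled out as a runnable sketch — CountFn and source_description are illustrative stand-ins; the real code would carry the Sheets details:]

    import apache_beam as beam

    class CountFn(beam.DoFn):
        def __init__(self, limit):
            self.limit = limit

        def process(self, element):
            # The single source-description element seeds the read.
            for i in range(self.limit):
                yield i

    source_description = {'spreadsheet_id': 'example-sheet'}  # placeholder

    with beam.Pipeline() as p:
        (p
         | beam.Create([source_description])  # one element to feed the DoFn
         | beam.ParDo(CountFn(10))
         | beam.Map(print))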

Re: Implementing a custom I/O Connector

2022-07-14 Thread Damian Akpan
Okay, beam.Impulse() does solve it now. Thanks so much for your help.
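[The beam.Impulse() variant then looks roughly like this, reusing the hypothetical CountFn from the sketch above:]

    with beam.Pipeline() as p:
        (p
         | beam.Impulse()          # emits a single empty-bytes element
         | beam.ParDo(CountFn(10))
         | beam.Map(print))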

Re: Metrics in Beam+Spark

2022-07-14 Thread Yushu Yao
Thanks Moritz! We are using JMX to ship the metrics out of Spark. Most of the Spark built-in driver and executor metrics are going out fine. Does this require us to make another sink? -Yushu
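[For context, Spark's built-in JMX sink is enabled with standard Spark metrics configuration like the line below. Whether Beam's pipeline metrics show up there, as opposed to only via the Beam-provided sinks Moritz mentions, is exactly the open question in this thread.]

    spark.metrics.conf.*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink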