Re: BigTable reader for Python?

2022-12-28 Thread Lina Mårtensson via dev
I kept working with an ExternalTransformRegistrar solution (although if there's an easier way, I'm all ears), and I have Java code that builds, and a Python connector that tries to use it. My current issue is that the expansion service that's started up doesn't find my transform using the URN prov

Re: Testing Multilanguage Pipelines?

2022-12-28 Thread Byron Ellis via dev
Thanks for the tips, folks! Took a bit of doing, but I got Java -> Python -> Java working without Docker being involved in the process (getting it working with Docker being involved wasn't so bad... though it didn't do what I wanted with respect to collecting results). Removing Docker appears to le

Re: SparkRunner - ensure SDF output does not need to fit in memory

2022-12-28 Thread Daniel Collins via dev
I believe that for the Dataflow runner, the result of processElement must also fit in memory, so this is not just a constraint for the Spark runner. The best approach at present might be to convert the source from a flatMap to an SDF that reads out chunks of the file at a time, and supports runner che

Re: Testing Multilanguage Pipelines?

2022-12-28 Thread Robert Bradshaw via dev
On Wed, Dec 28, 2022 at 10:09 AM Byron Ellis wrote: > > On Wed, Dec 28, 2022 at 9:49 AM Robert Bradshaw wrote: >> >> On Wed, Dec 28, 2022 at 4:56 AM Danny McCormick via dev >> wrote: >> > >> > > Given the increasing importance of multi language pipelines, it does >> > > seem that we should expa

SparkRunner - ensure SDF output does not need to fit in memory

2022-12-28 Thread Jozef Vilcek
Hello, I am working on an issue which currently limits the Spark runner by requiring the result of processElement to fit in memory [1]. This is problematic, e.g., for flatMap where the input element is a file split and generates possibly large output. The intended fix is to add an option to have dofn pro

Re: Testing Multilanguage Pipelines?

2022-12-28 Thread Byron Ellis via dev
On Wed, Dec 28, 2022 at 9:49 AM Robert Bradshaw wrote: > On Wed, Dec 28, 2022 at 4:56 AM Danny McCormick via dev > wrote: > > > > > Given the increasing importance of multi language pipelines, it does > seem that we should expand the capabilities of the DirectRunner or just go > all in on FlinkR

Re: Testing Multilanguage Pipelines?

2022-12-28 Thread Robert Bradshaw via dev
On Wed, Dec 28, 2022 at 4:56 AM Danny McCormick via dev wrote: > > > Given the increasing importance of multi language pipelines, it does seem > > that we should expand the capabilities of the DirectRunner or just go all > > in on FlinkRunner for testing and local / small scale development > > +

Re: Testing Multilanguage Pipelines?

2022-12-28 Thread Danny McCormick via dev
> Given the increasing importance of multi language pipelines, it does seem that we should expand the capabilities of the DirectRunner or just go all in on FlinkRunner for testing and local / small scale development +1 - anecdotally I've found local testing of multi-language pipelines to be trick

Re: Testing Multilanguage Pipelines?

2022-12-28 Thread Sachin Agarwal via dev
Given the increasing importance of multi language pipelines, it does seem that we should expand the capabilities of the DirectRunner or just go all in on FlinkRunner for testing and local / small scale development On Wed, Dec 28, 2022 at 12:47 AM Robert Burke wrote: > Probably either on Flink, o

Beam High Priority Issue Report (42)

2022-12-28 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/24776 [Bug]: Race conditi