Difference between Flink runner and portable runner on Python

2021-01-14 Thread Nir Gazit
Hey, the documentation lists two options for running Python Beam on Flink: "Portable (Python)" and "Portable (Python / Go / Java)". What are the differences between them? Which one is recommended? Thanks! Nir

Deploying with Flink and Beam Python Worker

2021-01-14 Thread Nir Gazit
Hi, I'm trying to deploy the word count job on a Flink cluster in Kubernetes. However, when trying to run the job (with Python workers as a sidecar to the Flink task managers), I get the following error: 2021/01/14 14:11:36 Initializing python harness: /opt/apache/beam/boot --id=1-1 --logging_endp

Re: Difference between Flink runner and portable runner on Python

2021-01-14 Thread Kyle Weaver
It's mostly just a different entry point. With (Python / Go / Java), the user has to start a job server themselves before submitting a job. With (Python), the job server is managed automatically in the background. Aside from that, though, there's very little difference. We recommend (Python) to new
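The two entry points described above differ mainly in the flags passed at submission time. A minimal sketch, assuming a Flink REST endpoint at localhost:8081 and a manually started job server at localhost:8099; both addresses are placeholders, and `submission_flags` is a hypothetical helper, not part of Beam:

```python
def submission_flags(entry_point,
                     flink_master="localhost:8081",
                     job_endpoint="localhost:8099"):
    """Return pipeline flags for one of the two documented entry points.

    Hypothetical helper for illustration; the default addresses are assumptions.
    """
    if entry_point == "python":
        # "Portable (Python)": the job server is managed automatically
        # in the background, so only the Flink master is given.
        return ["--runner=FlinkRunner", f"--flink_master={flink_master}"]
    if entry_point == "python/go/java":
        # "Portable (Python / Go / Java)": the user starts a job server
        # beforehand and points the pipeline at its endpoint.
        return ["--runner=PortableRunner", f"--job_endpoint={job_endpoint}"]
    raise ValueError(f"unknown entry point: {entry_point!r}")


python_flags = submission_flags("python")
portable_flags = submission_flags("python/go/java")
```

Either list could then be passed to `apache_beam.options.pipeline_options.PipelineOptions(flags)` when constructing the pipeline.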

Re: Deploying with Flink and Beam Python Worker

2021-01-14 Thread Sam Bourne
Hi Nir, I have a simple repo where I have a proof of concept deployment setup for doing this. https://github.com/sambvfx/beam-flink-k8s Depending on the type of runner you're using there are a few explanations. That repo should hopefully point you in the right direction. On Thu, Jan 14, 2021 at

Re: Is there an array explode function/transform?

2021-01-14 Thread Reuven Lax
Should Unnest be allowed to specify multiple array fields, or just one? On Wed, Jan 13, 2021 at 11:59 PM Manninger, Matyas < matyas.mannin...@veolia.com> wrote: > I would also not unnest arrays nested in arrays just the top-level array > of the specified fields. > > On Wed, 13 Jan 2021 at 20:58,

Re: Is there an array explode function/transform?

2021-01-14 Thread Robert Bradshaw
I think it makes sense to allow specifying more than one, if desired. This is equivalent to just stacking multiple Unnests. (Possibly one could even have a special syntax like "*" for all array fields.) On Thu, Jan 14, 2021 at 10:05 AM Reuven Lax wrote: > Should Unnest be allowed to specify mult

Re: Is there an array explode function/transform?

2021-01-14 Thread Reuven Lax
And the result is essentially a cross product of all the different array elements? On Thu, Jan 14, 2021 at 11:25 AM Robert Bradshaw wrote: > I think it makes sense to allow specifying more than one, if desired. This > is equivalent to just stacking multiple Unnests. (Possibly one could even > ha

Re: Is there an array explode function/transform?

2021-01-14 Thread Robert Bradshaw
I don't see any other reasonable interpretation. (One could use this as an argument to only support one field at a time, to make the potential explosion in data size all the more obvious.) On Thu, Jan 14, 2021 at 11:30 AM Reuven Lax wrote: > And the result is essentially a cross product of all t
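The cross-product semantics discussed in this thread can be illustrated in plain Python. This is a sketch of the idea only, not the Beam transform under discussion; `unnest` and the sample row are hypothetical. Only the top-level arrays of the named fields are exploded, and unnesting two fields at once gives the same rows as stacking two single-field unnests:

```python
from itertools import product


def unnest(row, fields):
    """Explode the top-level list fields named in `fields`.

    Emits one output row per combination of elements (a cross product).
    Lists nested inside those lists are left untouched.
    """
    combos = product(*(row[f] for f in fields))
    return [{**row, **dict(zip(fields, combo))} for combo in combos]


row = {"id": 1, "tags": ["a", "b"], "scores": [10, 20]}

# Unnest both fields at once: 2 x 2 = 4 output rows.
both = unnest(row, ["tags", "scores"])

# Stack two single-field unnests: same rows in the same order.
stacked = [out for r in unnest(row, ["tags"]) for out in unnest(r, ["scores"])]
```

The output size is the product of the array lengths, which is the potential explosion in data size mentioned above.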

Re: Deploying with Flink and Beam Python Worker

2021-01-14 Thread Nir Gazit
Thanks! I actually managed to set up my own deployment for running it locally, and it works well for local files, but I get this weird error while trying to run the word count example, and I'm trying to understand what could be causing it. On Thu, Jan 14, 2021 at 7:29 PM Sam Bourne wrote: > Hi Nir

Re: Difference between Flink runner and portable runner on Python

2021-01-14 Thread Nir Gazit
Thanks! And is there a recommended approach for deploying the SDK harness? On Thu, Jan 14, 2021 at 7:19 PM Kyle Weaver wrote: > It's mostly just a different entry point. With (Python / Go / Java), the > user has to start a job server themselves before submitting a job. With > (Python), the job s

Re: Deploying with Flink and Beam Python Worker

2021-01-14 Thread Nir Gazit
BTW, is this something we may want to add to the official repo? (I can do it, of course.) It took me a while to find out how to set up a local deployment for Flink, and the docs weren't very detailed. On Thu, Jan 14, 2021 at 9:39 PM Nir Gazit wrote: > Thanks! I actually managed to have my own deployment f

Re: Difference between Flink runner and portable runner on Python

2021-01-14 Thread Ankur Goenka
DOCKER is the recommended way, but for testing and development you can also use LOOPBACK. More details here: https://github.com/apache/beam/blob/17fbe8be70b71bc1236e553a5fe7902a2df5dda4/sdks/python/apache_beam/options/pipeline_options.py#L1115 On Thu, Jan 14, 2021 at 11:40 AM Nir Gazit wrote: > Th
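As a rough sketch of how that choice shows up at submission time: the SDK harness mode is selected with the `--environment_type` pipeline option. The helper name `environment_flags` and its `local_dev` switch are hypothetical, introduced here only for illustration:

```python
def environment_flags(local_dev=False):
    """Return the flag selecting how the SDK harness is run.

    Hypothetical helper: DOCKER is the recommended mode, LOOPBACK is
    meant for testing and development.
    """
    env_type = "LOOPBACK" if local_dev else "DOCKER"
    return [f"--environment_type={env_type}"]


recommended = environment_flags()
development = environment_flags(local_dev=True)
```

These flags would be appended to the runner flags used when submitting the job.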

Re: Deploying with Flink and Beam Python Worker

2021-01-14 Thread Sam Bourne
Thanks! I actually managed to have my own deployment for running it locally and it works well for local file, but I get this weird error while trying to run the word count example and I’m trying to understand what can be the cause of it? If you’re referring to this error: 2021/01/14 14:11:45 Fail