Beam on Flink version compatibility

2021-01-10 Thread Nir Gazit
Hey, I'm trying to use Beam on Flink with Python, and for some reason it insists on using Flink 1.10 even though Beam's website states that v1.12 is already supported. Digging through the code, it seems there's a hardcoded limit

Difference between Flink runner and portable runner on Python

2021-01-14 Thread Nir Gazit
Hey, The documentation lists 2 options for running Python Beam on Flink: "Portable (Python)" and "Portable (Python / Go / Java)". What are the differences between them? which one is recommended? Thanks! Nir

Deploying with Flink and Beam Python Worker

2021-01-14 Thread Nir Gazit
Hi, I'm trying to deploy the word count job on a Flink cluster in Kubernetes. However, when trying to run the job (with python workers as a side car to the Flink task masters), I get the following error: 2021/01/14 14:11:36 Initializing python harness: /opt/apache/beam/boot --id=1-1 --logging_endp

Re: Deploying with Flink and Beam Python Worker

2021-01-14 Thread Nir Gazit
the right direction. > > On Thu, Jan 14, 2021 at 6:16 AM Nir Gazit wrote: > >> Hi, >> I'm trying to deploy the word count job on a Flink cluster in Kubernetes. >> However, when trying to run the job (with python workers as a side car to >> the Flink task masters

Re: Difference between Flink runner and portable runner on Python

2021-01-14 Thread Nir Gazit
thon), the job server is managed automatically in the background. Aside > from that, though, there's very little difference. We recommend (Python) to > new users only because it's a smidge easier to set up. > > On Thu, Jan 14, 2021 at 4:12 AM Nir Gazit wrote: > >>

Re: Deploying with Flink and Beam Python Worker

2021-01-14 Thread Nir Gazit
BTW - is this something we may want to add to the official repo? (I can do it ofc). It took me a while to find how to set up a local deployment for Flink and the docs weren't super detailed. On Thu, Jan 14, 2021 at 9:39 PM Nir Gazit wrote: > Thanks! I actually managed to have my own de

Beam stuck when trying to run on remote Flink cluster

2021-02-08 Thread Nir Gazit
Hey, I'm trying to run a super simple word-count example on a remote Flink cluster. Running it on a local cluster works great, but when I try to submit it to a remote cluster it's just stuck forever, no error or anything. How can I debug this? Thanks! Nir

Error when trying to read from S3

2021-02-10 Thread Nir Gazit
Hey, I'm getting this error: apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'s3://fiverr-data-science-de v/beam_poc/beam/wc/input.txt': BeamIOError("exists() operation failed with exceptions {'s3://fiverr-data-sc ience-dev/beam_poc/beam/wc/input.txt': ValueError('Mus

Re: Error when trying to read from S3

2021-02-10 Thread Nir Gazit
wonder what is the reason this assertion was added as it seems that it hasn't been there in the past. Any assistance would be appreciated! Nir On Wed, Feb 10, 2021 at 6:49 PM Nir Gazit wrote: > Hey, > I'm getting this error: > apache_beam.io.filesystem.BeamIOError: Match op

Python SDK with S3IO on Flink

2021-02-24 Thread Nir Gazit
Hey, When trying to read a file from S3 with a combine action, the pipeline seems to be stuck. When replacing it with a GCP source it works fine. Furthermore - if I comment out the Count.PerElement part it also works. Anyone has an idea why that is? lines = p | beam.io.ReadFromText('s3://...')

Re: Python SDK with S3IO on Flink

2021-02-25 Thread Nir Gazit
PR you submitted? > > On Wed, Feb 24, 2021 at 8:55 AM Nir Gazit wrote: > >> Hey, >> When trying to read a file from S3 with a combine action, the pipeline >> seems to be stuck. When replacing it with a GCP source it works fine. >> Furthermore - if I comment out

Go SDK with State & Timers

2021-03-01 Thread Nir Gazit
Hey, Does the Go SDK support states? I haven't seen any examples nor documentation about it. Thanks! Nir

Re: Beam stuck when trying to run on remote Flink cluster

2021-03-02 Thread Nir Gazit
you use the portable > runner or the classic one? > > On Mon, Feb 8, 2021 at 5:41 PM Joseph Zack > wrote: > >> unsubscribe >> >> On Mon, Feb 8, 2021 at 5:06 AM Nir Gazit wrote: >> >>> Hey, >>> I'm trying to run a super simple word-cou

Re: Go SDK with State & Timers

2021-03-02 Thread Nir Gazit
; > > On 1 Mar 2021, at 18:13, Nir Gazit wrote: > > > > Hey, > > Does the Go SDK support states? I haven't seen any examples nor > documentation about it. > > > > Thanks! > > Nir > >

Reducing job upload time to Flink

2021-03-02 Thread Nir Gazit
Hey, When running Beam on Flink, the upload time of a job seems to be extremely long. Is there any way to preload some of the JARs to Flink in order to avoid re-loading it every time, especially for development. Thanks Nir

Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Nir Gazit
Hey, I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL environment. However, when the pipeline is run, the error below is thrown, which implies that for some reason the external environment pipeline options didn't get in. When replacing the Kafka Source with an S3 source (for exam

Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Nir Gazit
Jayalath wrote: > Is it possible that you don't have the "docker" command available in your > system ? > > On Tue, May 4, 2021 at 10:28 AM Nir Gazit wrote: > >> Hey, >> I'm trying to run a pipeline with a Kafka Source, using an EXTERNAL >>

Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-04 Thread Nir Gazit
nsforms not the EXTERNAL environment (agree that the terminology is > confusing). > > On Tue, May 4, 2021 at 11:04 AM Nir Gazit wrote: > >> Yes that’s on purpose. I’m running in Kubernetes which makes it hard to >> install docker on the pods so I don’t want to use the do

Re: Potential bug with Kafka (or other external) IO in Python Portable Runner

2021-05-06 Thread Nir Gazit
am/runners/core/construction/Environments.java#L134 > > I don't think we support running Java cross-language transforms in other > environments yet. > > Thanks, > Cham > > On Tue, May 4, 2021 at 12:30 PM Nir Gazit wrote: > >> But looking at the code of the except

IllegalStateException with simple Kafka Pipeline

2021-05-06 Thread Nir Gazit
Hey, I'm trying to run a pipeline with the Python SDK that reads from Kafka. I've started with a simple one that just reads messages and prints them to the console. When running on Flink, I get the following error: File "kafka_print.py", line 36, in run_kafka_pipeline | 'Print to console' >> be

Error with Beam/Flink Python pipeline with Kafka

2021-05-24 Thread Nir Gazit
Hey, I'm having issues with running a simple pipeline on a remote Flink cluster. I use a separate Beam job server to which I submit the job, which then goes to the Flink cluster with docker-in-docker enabled. For some reason, I'm getting errors in Flink which I can't decipher (see gist