Re: # of Readers < MaxNumWorkers for UnboundedSource. Causing deadlocks

2019-09-25 Thread Jan Lukavský
I have a feeling there has to be something going wrong with your source. If I were you, I would probably do the following:  - verify that I never produce an UnboundedSource split with no queue associated (Preconditions.checkState())  - make sure that I *really never* block in call to advan

A couple questions from someone new to Beam

2019-09-25 Thread Steve973
Hi, all. I am still ramping up on my learning of how to use Beam, and I have a couple of questions for the experts. And, while I have read the documentation, I have either looked at the wrong parts, or my particular questions were not specifically answered. If I have missed something, then pleas

Re: A couple questions from someone new to Beam

2019-09-25 Thread Jeff Klukas
(1) I don't have direct experience with contacting MongoDB in Beam, but my general expectation is that yes, Beam IOs will do reasonable things for connection setup and teardown. For the case of MongoDbIO in the Java SDK, it looks like this connection setup happens in BoundedMongoDbReader [0]. In ge

Re: # of Readers < MaxNumWorkers for UnboundedSource. Causing deadlocks

2019-09-25 Thread Ken Barr
I already poll a local queue in advance() and return false if the queue is empty. Is there an example of restricting UnboundedSource splits? On 2019/09/25 08:13:43, Jan Lukavský wrote: > I have a feeling there has to be something going on wrong with your > source. If I were you, I would probably do
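
The non-blocking behaviour discussed in this thread (poll a local queue in advance() and return false when it is empty) can be sketched as follows. This is only a minimal Python illustration of the contract, not the Java UnboundedSource API: the class QueueBackedReader and its method names are hypothetical and exist only for this sketch.

```python
import queue


class QueueBackedReader:
    """Hypothetical reader; it only mirrors the non-blocking advance() contract."""

    def __init__(self, local_queue):
        self._queue = local_queue
        self._current = None

    def advance(self):
        """Return True if a record was read, False if none is available right now."""
        try:
            # get_nowait() never blocks; it raises queue.Empty instead.
            self._current = self._queue.get_nowait()
        except queue.Empty:
            return False
        return True

    def get_current(self):
        """Return the record read by the last successful advance()."""
        return self._current


local_queue = queue.Queue()
local_queue.put("event-1")
reader = QueueBackedReader(local_queue)
print(reader.advance(), reader.get_current())
print(reader.advance())
```

The point of the sketch is that advance() returns immediately in both cases, so a caller is never parked waiting on an empty queue.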

Re: No filesystem found for scheme s3 using FileIO

2019-09-25 Thread Maximilian Michels
Hi Magnus, Your observation seems to be correct. There is an issue with the file system registration. The two types of errors you are seeing, as well as the successful run, are just due to the different structure of the generated transforms. The Flink scheduler will distribute them different

Re: No filesystem found for scheme s3 using FileIO

2019-09-25 Thread Maximilian Michels
Hi Preston, Sorry about the name mixup, of course I meant to write Preston not Magnus :) See my reply below. cheers, Max On 25.09.19 08:31, Maximilian Michels wrote: Hi Magnus, Your observation seems to be correct. There is an issue with the file system registration. The two types of err

Re: No filesystem found for scheme s3 using FileIO

2019-09-25 Thread Koprivica,Preston Blake
Not a problem! Thanks for looking into this. In reading through the source associated with the stacktrace, I also noticed that there's neither user-code, nor beam-to-flink lifecycle code available for initialization. As far as I could tell, it was pure flink down to the coders. Nothing new h

Re: A couple questions from someone new to Beam

2019-09-25 Thread Alexey Romanenko
Hi Steve, 1) In general, managing the client connections is the responsibility of the IO transform. Usually, one client instance is used per input split (bounded or unbounded source); it opens a connection at the beginning of reading this split and closes it at the end. This is in theory. Practically,

Re: Any possibility to run larger data sets with DirectRunner?

2019-09-25 Thread Lukasz Cwik
+1 for local execution using Flink. On Tue, Sep 17, 2019 at 4:24 AM Paweł Kordek wrote: > Hi Steve > > Maybe local execution on a Flink cluster will work for you: > https://beam.apache.org/documentation/runners/flink/ ? > > Cheers > Pawel > > On Tue, 17 Sep 2019 at 10:51, Steve973 wrote: > >> H

Re: Beam meetup Seattle!! September 26th, 6pm

2019-09-25 Thread Aizhamal Nurmamat kyzy
Gentle reminder that Seattle Apache Beam meetup is happening tomorrow! Here is a quick agenda: - 18:00 - Registrations, speed networking, pizza and drinks. - 18:30 - kick-off - 18:40 - Making Beam Schemas Portable by Brian Hulette - 19:10 - Apache Beam @ Brightcove - A case study - 19:40 - ZetaSQL

How do you call user defined ParDo class from the pipeline in Portable Runner (Apache Flink) ?

2019-09-25 Thread Yu Watanabe
Hello. I would like to ask a question about ParDo. I am getting the error below inside the TaskManager when running code on Apache Flink using the Portable Runner. = File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/bundle_processo

Re: How do you call user defined ParDo class from the pipeline in Portable Runner (Apache Flink) ?

2019-09-25 Thread Kyle Weaver
You will need to set the save_main_session pipeline option to True. Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com On Wed, Sep 25, 2019 at 3:44 PM Yu Watanabe wrote: > Hello. > > I would like to ask question for ParDo . > > I am getting below error inside TaskManager

Re: Python Portable Runner Issues

2019-09-25 Thread Lukasz Cwik
Google Dataflow currently uses a JSON representation of the pipeline graph and also the pipeline proto. We represent the graph in two different ways which leads to some wonderful *features*. Google Dataflow also side steps the Beam job service since Dataflow has its own Job API. Supporting the Beam

Re: Word-count example

2019-09-25 Thread Kyle Weaver
> The only requirement is that TaskManager nodes must have access to Docker. That's not strictly true anymore; there is also the PROCESS option, but I'll warn you it's still under-documented. See https://beam.apache.org/roadmap/portability/#sdk-harness-config > Sorry if/that this is stupid questi

Re: How do you call user defined ParDo class from the pipeline in Portable Runner (Apache Flink) ?

2019-09-25 Thread Yu Watanabe
Thank you for the reply. "save_main_session" did not work; however, the situation has changed. 1. get_all_options() output. "save_main_session" set to True. = 2019-09-26 09:04:11,586 DEBUG Pipeline Options: {'wait_until_

Re: How do you call user defined ParDo class from the pipeline in Portable Runner (Apache Flink) ?

2019-09-25 Thread Ankur Goenka
super has some issues while pickling in Python 3. Please refer to https://github.com/uqfoundation/dill/issues/300 for more details. Removing the reference to super in your DoFn should help. On Wed, Sep 25, 2019 at 5:13 PM Yu Watanabe wrote: > Thank you for the reply. > > " save_main_session" did not wor

Re: How do you call user defined ParDo class from the pipeline in Portable Runner (Apache Flink) ?

2019-09-25 Thread Yu Watanabe
Thank you for the help. I have chosen to remove the super().__init__() call. Thanks, Yu On Thu, Sep 26, 2019 at 9:18 AM Ankur Goenka wrote: > super has some issues while pickling in python3. Please refer > https://github.com/uqfoundation/dill/issues/300 for more details. > > Removing reference to s
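
One way to drop the reference to super while still running the base-class initializer is to call that initializer explicitly. The sketch below is an assumption for illustration, not the code from this thread: the DoFn class here is a stand-in defined locally (not apache_beam.DoFn), and SplitWordsFn is a hypothetical name.

```python
class DoFn:
    """Stand-in base class for illustration only (not apache_beam.DoFn)."""

    def __init__(self):
        self.base_initialized = True

    def process(self, element):
        raise NotImplementedError


class SplitWordsFn(DoFn):
    """Hypothetical DoFn-style class that avoids super() in its initializer."""

    def __init__(self, delimiter=" "):
        # Explicit base-class call instead of super().__init__(), so the
        # method body holds no reference to super.
        DoFn.__init__(self)
        self.delimiter = delimiter

    def process(self, element):
        # Emit each non-empty token of the input string.
        for word in element.split(self.delimiter):
            if word:
                yield word


fn = SplitWordsFn()
print(list(fn.process("to be or not")))
```

Whether this form is needed depends on the base class: if its initializer does nothing that matters for the subclass, simply removing the call, as chosen above, is enough.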

Re: Beam meetup Seattle!! September 26th, 6pm

2019-09-25 Thread Kenneth Knowles
Thanks for organizing. I'll be there! Kenn On Wed, Sep 25, 2019 at 2:50 PM Aizhamal Nurmamat kyzy wrote: > Gentle reminder that Seattle Apache Beam meetup is happening tomorrow! > > Here is a quick agenda: > - 18:00 - Registrations, speed networking, pizza and drinks. > - 18:30 - kick-off > - 1

Re: How do you call user defined ParDo class from the pipeline in Portable Runner (Apache Flink) ?

2019-09-25 Thread Yu Watanabe
Actually there was a good example in the latest wordcount.py in master repo. https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py On Thu, Sep 26, 2019 at 12:00 PM Yu Watanabe wrote: > Thank you for the help. > > I have chosen to remove the super().__init__()

How to import external module inside ParDo using Apache Flink ?

2019-09-25 Thread Yu Watanabe
Hello. I would like to ask for help with resolving a dependency issue for an imported module. I have the directory structure below and I am trying to import the Frames class from frames.py into main.py. = quality-validation/ bin/setup.py main.py