Beam AmqpIO

2019-09-18 Thread kasper
Hello, AmqpIO seems to depend on the org.apache.qpid.proton.messenger package, but I cannot find that package anywhere in a public Nexus repository. The net effect is a NoClassDefFoundError for org.apache.qpid.proton.messenger.Messenger$Factory. Any hint

Re: Beam AmqpIO

2019-09-18 Thread Jean-Baptiste Onofré
Hi, It's provided by Apache Qpid proton-j: org.apache.qpid:proton-j:0.13.1 Regards JB On 18/09/2019 10:34, kasper wrote: > Hello, > > AmqpIO seems to have a dependency on the > org.apache.qpid.proton.messenger package in usage but I do not seem to > find that package anywhere in a public nexus
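One way to confirm which jar actually ships the missing class is to look inside the jar itself. The following is a minimal sketch, not part of Beam or Qpid: the helper name jar_contains_class is hypothetical, and it only assumes that a jar is a zip archive whose class entries follow the usual package/Class.class layout.

```python
import zipfile


def jar_contains_class(jar, class_name):
    """Return True if the jar has a .class entry for the fully qualified class name.

    `jar` may be a filesystem path or a file-like object (hypothetical helper).
    """
    # org.example.Outer$Inner -> org/example/Outer$Inner.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()
```

For example, calling it with a locally downloaded proton-j jar and "org.apache.qpid.proton.messenger.Messenger$Factory" would show whether that particular version still contains the Messenger API.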

Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Yu Watanabe
Hello. I am trying to run FlinkRunner (2.15.0) on an AWS EC2 instance and submit a job to AWS EMR (5.26.0). However, the pipeline fails with the error below when I run it. - Caused by: java.lang.Exception: The user defined 'open()' method caused an

Re: Beam AmqpIO

2019-09-18 Thread Alexey Romanenko
It seems that we use a pretty outdated version of proton-j; the current version is 0.33.2. At the same time, the Messenger API was deprecated and removed a while ago. So, updating to the new version won’t be so easy. What could be an alternative? > On 18 Sep 2019, at 10:47, Jean-Baptiste Onofré wro

Re: Beam AmqpIO

2019-09-18 Thread Jean-Baptiste Onofré
Hi, As the original author of AmqpIO, I can update it, but it requires some internal changes to the IO (especially the way we deal with checkpoints). I will create a Jira and work on an update. Regards JB On 18/09/2019 12:50, Alexey Romanenko wrote: > It seems that we use pretty outdated ver

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Benjamin Tan
Seems like Docker is not installed. Maybe run with PROCESS as the environment type? Or install Docker. > On 18 Sep 2019, at 6:40 PM, Yu Watanabe wrote: > > Hello. > > I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job to > AWS EMR (5.26.0). > >
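An "error=2, No such file or directory" is what you would expect when a docker executable cannot be found. The sketch below shows one way to check for it; the function names are hypothetical, and the check is only meaningful when run on the hosts that launch the SDK harness (the Flink TaskManagers), not only on the machine submitting the job.

```python
import shutil


def docker_available(which=shutil.which):
    """Return True if a `docker` executable is found on PATH."""
    return which("docker") is not None


def suggested_environment_type(which=shutil.which):
    """Suggest DOCKER when docker is present, otherwise fall back to PROCESS."""
    return "DOCKER" if docker_available(which) else "PROCESS"


if __name__ == "__main__":
    print("suggested environment type:", suggested_environment_type())
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default is enough.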

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Yu Watanabe
Benjamin. Thank you for the reply. Per your suggestion, I read the design doc, and it states that a harness container is a mandatory setting for all TaskManagers. https://s.apache.org/portable-flink-runner-overview > The Flink cluster itself is deployed as normal. For example, it might be deployed

Re: Word-count example

2019-09-18 Thread Matthew Patterson
Tried " mvn archetype:generate \ -DarchetypeGroupId=org.apache.beam \ -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ -DarchetypeVersion=2.16.0 \ -DgroupId=org.example \ -DartifactId=word-count-beam \ -Dversion="0.1" \ -Dpackage=org.apache

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Benjamin Tan
It’s command: /opt/apache/beam/boot in JSON. You might be able to find some examples online. I’ll reply when I get home and paste the actual command. > On 18 Sep 2019, at 9:35 PM, Yu Watanabe wrote: > > Benjamin. > > Thank you for the reply. > Per your suggestion, I read th

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Benjamin Tan
Try this as part of PipelineOptions: --environment_config={\"command\":\"/opt/apache/beam/boot\"} On 2019/09/18 10:40:42, Yu Watanabe wrote: > Hello. > > I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit job > to AWS EMR (5.26.0). > > However, I get below error when I run
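The escaped quotes in that flag are easy to get wrong by hand, so it can help to generate the JSON value in code. This is a minimal sketch under stated assumptions: the helper name process_environment_args is hypothetical, and the default path /opt/apache/beam/boot is only the location mentioned in this thread; the boot executable must actually exist at that path on every worker.

```python
import json


def process_environment_args(boot_path="/opt/apache/beam/boot"):
    """Build the flags for running the SDK harness as a local process.

    `boot_path` is an assumption taken from this thread; adjust it to wherever
    the boot executable lives on the worker hosts.
    """
    config = json.dumps({"command": boot_path})
    return [
        "--environment_type=PROCESS",
        "--environment_config=" + config,
    ]


if __name__ == "__main__":
    for arg in process_environment_args():
        print(arg)
```

The returned list can be appended to the other pipeline arguments before constructing the pipeline options, which avoids shell-escaping the JSON.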

Where is /opt/apache/beam/boot?

2019-09-18 Thread Benjamin Tan
I'm trying to use the process environment_config and I have no idea where /opt/apache/beam/boot is. Is there a way to generate it?

Re: Python Portable Runner Issues

2019-09-18 Thread Holden Karau
Probably the most stable is running on Dataflow still. But I’m excited to see the progress towards a Spark runner, can’t wait to try TFT on it :) On Tue, Sep 17, 2019 at 4:37 PM Kyle Weaver wrote: > The Flink runner is definitely more stable, as it's been around for longer > and has more develop

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Yu Watanabe
Thank you for the reply. I see files named "boot" under the directories below, but these seem to be used for containers. (python) admin@ip-172-31-9-89:~/beam-release-2.15.0$ find ./ -name "boot" -exec ls -l {} \; lrwxrwxrwx 1 admin admin 23 Sep 16 23:43 ./sdks/python/container/.gogradle/project_gopath/s

Re: Where is /opt/apache/beam/boot?

2019-09-18 Thread Lukasz Cwik
It is embedded inside the Docker container that corresponds to the SDK you're using. Python container boot src: https://github.com/apache/beam/blob/master/sdks/python/container/boot.go Java container boot src: https://github.com/apache/beam/blob/master/sdks/java/container/boot.go Go container boot

Re: Python Portable Runner Issues

2019-09-18 Thread Chad Dombrova
Just note that while Dataflow does have robust python support it does not fully support the portability framework. It’s a bit of a blurry distinction, and honestly I’m not crystal clear on this as I get the impression that Dataflow may be a bit of a Portability hybrid. It does not use the job ser

Re: Where is /opt/apache/beam/boot?

2019-09-18 Thread Benjamin Tan
You are awesome. Thanks! On 2019/09/18 15:08:08, Lukasz Cwik wrote: > It is embedded inside the docker container that corresponds to the SDK > you're using. > > Python container boot src: > https://github.com/apache/beam/blob/master/sdks/python/container/boot.go > Java container boot src: > ht

Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
We have been using Beam for a bit now. However we just turned on the dataflow shuffle service and were very surprised that the shuffled data amounts were quadruple the amounts we expected. Turns out that the file writing TextIO is doing shuffles within itself. Is there a way to prevent shuffling

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Kyle Weaver
> Per your suggestion, I read the design doc, and it states that a harness container is a mandatory setting for all TaskManagers. That doc is out of date. As Benjamin said, using Docker is no longer strictly required. However, it is still recommended, as Docker makes managing dependencies a lot

Re: Submitting job to AWS EMR fails with error=2, No such file or directory

2019-09-18 Thread Ankur Goenka
Adding to the previous suggestions. You can also add "--retain_docker_container" to your pipeline options and later log in to the machine to check the Docker container log. Also, in my experience running on YARN, the YARN user sometimes does not have access to Docker. I would suggest checking if t

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-18 Thread Maximilian Michels
I disagree that this flag is obsolete. It is still serving a purpose for batch users on the Dataflow runner, and that is a decent chunk of Beam Python users. It is obsolete for the PortableRunner. If the Dataflow Runner needs this flag, couldn't we simply add it there? As far as I know Dataflow us

KinesisIO.write() returning NoSuchMethodError: com.google.common.util.concurrent.Futures.addCallback

2019-09-18 Thread Ankit Jhalaria
Hey Beam devs, I am trying to use KinesisIO.write() with Beam 2.13, running on Flink, and it's failing while trying to do Futures.addCallback(f, new UserRecordResultFutureCallback()); It's currently pulling in beam-vendor-guava-20_0-0.1.jar I have tried bringing in a current version but th

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
So I followed up on why TextIO shuffles and dug into the code some. It is using the shards and getting all the values into a keyed group to write to a single file. However... I wonder if there is a way to just take the records that are on a worker and write them out. Thus not needing a shard number

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Jeff Klukas
What you propose with a writer per bundle is definitely possible, but I expect the blocker is that in most cases the runner has control of bundle sizes and there's nothing exposed to the user to control that. I've wanted to do similar, but found average bundle sizes in my case on Dataflow to be so

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Chamikara Jayalath
Are you specifying the number of shards to write to: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L859 If so, this will incur an additional shuffle to re-distribute data written by all workers into the given number of shards before writ
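To make the trade-off in this thread concrete, here is a toy, pure-Python illustration. It is not Beam code and not how TextIO is implemented: the function names and the shard-key formula are hypothetical, and "bundles" are simply lists of records standing in for what each worker holds. It contrasts regrouping records into a fixed number of shards, which requires moving records between workers, with writing one output per bundle, which does not but leaves the file count to the runner's bundle sizes.

```python
from collections import defaultdict


def write_with_fixed_shards(bundles, num_shards):
    """Regroup records from all bundles into exactly num_shards outputs.

    Collecting records by shard key across bundles is the shuffle step.
    """
    shards = defaultdict(list)
    for bundle in bundles:
        for record in bundle:
            # Deterministic toy shard key; real runners choose keys differently.
            shard = sum(record.encode("utf-8")) % num_shards
            shards[shard].append(record)
    return [shards[i] for i in range(num_shards)]


def write_per_bundle(bundles):
    """Emit one output per non-empty bundle, with no cross-worker regrouping."""
    return [list(bundle) for bundle in bundles if bundle]


if __name__ == "__main__":
    bundles = [["a", "b"], ["c"], ["d", "e", "f"]]
    print(len(write_with_fixed_shards(bundles, 2)), "files with fixed shards")
    print(len(write_per_bundle(bundles)), "files with one writer per bundle")
```

With many small bundles, the per-bundle variant produces many small files, which is the blocker Jeff Klukas describes above.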

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
batch on dataflowRunner. On Wed, Sep 18, 2019 at 4:05 PM Reuven Lax wrote: > Are you using streaming or batch? Also which runner are you using? > > On Wed, Sep 18, 2019 at 1:57 PM Shannon Duncan > wrote: > >> So I followed up on why TextIO shuffles and dug into the code some. It is >> using the

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Shannon Duncan
I will attempt to do without sharding (though I believe we did do a run without shards and it incurred the extra shuffle costs). Pipeline is simple. The only shuffle that is explicitly defined is the shuffle after merging files together into a single PCollection (Flatten Transform). So it's a Re

Re: Flink Runner logging FAILED_TO_UNCOMPRESS

2019-09-18 Thread Ahmet Altay
I believe the flag was never relevant for PortableRunner. I might be wrong as well. The flag affects a few bits in the core code and that is why the solution cannot be by just setting the flag in Dataflow runner. It requires some amount of clean up. I agree that it would be good to clean this up, a

Re: Word-count example

2019-09-18 Thread Ankur Goenka
Hi Matthew, Beam 2.16.0 is not yet released, hence the error. Can you try using version 2.15.0? Thanks, Ankur On Wed, Sep 18, 2019 at 6:59 AM Matthew Patterson wrote: > Tried > > " > mvn archetype:generate \ > -DarchetypeGroupId=org.apache.beam \ > -DarchetypeArtifac

apache beam 2.16.0 ?

2019-09-18 Thread Yu Watanabe
Hello. I would like to use 2.16.0 to diagnose a container problem; however, it looks like the job server is not available on Maven yet. RuntimeError: Unable to fetch remote job server jar at https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.16.0/beam-runners-flin

Re: apache beam 2.16.0 ?

2019-09-18 Thread Rui Wang
Hi, You can search (or ask) in dev@ for the progress of 2.16.0. The release of 2.16.0 is ongoing, and it will be released once the blockers are resolved. -Rui On Wed, Sep 18, 2019 at 9:34 PM Yu Watanabe wrote: > Hello. > > I would like to use 2.16.0 to diagnose container problem, howev

Re: apache beam 2.16.0 ?

2019-09-18 Thread Yu Watanabe
Okay. Thank you. I also confirmed that 2 builds are failing. https://github.com/apache/beam/tree/release-2.16.0 Thanks, Yu On Thu, Sep 19, 2019 at 1:41 PM Rui Wang wrote: > Hi, > > You can search(or ask) in dev@ for the progress of 2.16.0. > > The answer is the release of 2.16.0 is ongoing an