Hi all,

Thank you very much for the responses!
I feel a bit better about using Dataproc since it's not in beta like Flink on GKE. I rebuilt the Flink runner as you specified, but I still get an error. I've attached the stdout from trying to run with the patched Flink runner.

Here are the instructions to get a cluster started and to the state right before I run your patched Flink runner instructions:

gcloud dataproc clusters create my-flink-cluster --optional-components=FLINK,DOCKER --region=us-central1 --image-version=2.0 --enable-component-gateway
gcloud compute ssh my-flink-cluster-m
curl https://raw.githubusercontent.com/cs109/2015/master/Lectures/Lecture15b/sparklect/shakes/kinglear.txt > kinglear.txt
curl https://raw.githubusercontent.com/apache/beam/master/sdks/python/apache_beam/examples/wordcount.py > wordcount.py
pip install apache_beam apache_beam[gcp]
. /usr/bin/flink-yarn-daemon
python wordcount.py --input kinglear.txt --output my_counts --runner FlinkRunner --flink_master $FLINK_MASTER_URL --environment_type DOCKER --flink_job_server_jar beam/runners/flink/1.13/build/libs/beam-runners-flink-1.13-2.32.0-SNAPSHOT.jar

Let me know if anything stands out to you. Thanks again for the support! Sorry if I'm missing something silly.

On Fri, Jul 9, 2021 at 3:20 PM Kyle Weaver <kcwea...@google.com> wrote:

> That's for Java only. Joey was asking about the portable (Python) example.
>
> On Fri, Jul 9, 2021 at 12:18 PM Tianzi Cai <tia...@google.com> wrote:
>
>> Thanks Kyle so much for forwarding.
>>
>> I was literally just trying this myself and got stuck too
>> (b/193180649 <http://b/193180649>). I finally got it all to work.
>> Please feel free to share with the customer. I can give them
>> repo.reader permission if needed.
>>
>> 1. Run this command to generate the canonical word count example.
>> mvn archetype:generate \
>> -DarchetypeGroupId=org.apache.beam \
>> -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>> -DarchetypeVersion=2.30.0 \
>> -DgroupId=org.example \
>> -DartifactId=word-count-beam \
>> -Dversion="0.1" \
>> -Dpackage=org.apache.beam.examples \
>> -DinteractiveMode=false
>> 2. Make a few code changes (here
>> <https://source.cloud.google.com/tz-playground-bigdata/word-count-example/+/gcp:>
>> are mine) then make sure that the code works with mvn compile exec:java
>> -Dexec.mainClass=org.apache.beam.examples.WordCount and you can see
>> the aggregated results printed out.
>> 3. Run mvn package -Pflink-runner to get the packaged JARs.
>> 4. Upload the uber jar word-count-beam-bundled-0.1.jar to a Cloud
>> Storage bucket. SSH into my Dataproc master node. Download the uber jar.
>> 5. flink run -c org.apache.beam.examples.WordCount
>> word-count-beam-bundled-0.1.jar --runner=FlinkRunner
>>
>>
>> On Fri, Jul 9, 2021 at 12:11 PM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> If you're not committed to Dataproc, you may also want to try running it
>>> on GKE, which AFAIK doesn't have these issues.
>>> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md
>>>
>>> On Fri, Jul 9, 2021 at 12:08 PM Kyle Weaver <kcwea...@google.com> wrote:
>>>
>>>> Hi Joey,
>>>>
>>>> Jackson dependency issues are likely
>>>> https://issues.apache.org/jira/browse/BEAM-10430. You will have to
>>>> manually patch it until a fix is available in an upcoming Beam release.
>>>>
>>>> 1. Download Beam source from GitHub
>>>> 2. Check out a patch for the issue, such as
>>>> https://github.com/apache/beam/pull/14953
>>>> 3. Build the Flink runner using command "./gradlew
>>>> :runners:flink:1.13:job-server:shadowJar"
>>>> 4. Use the outputted Flink runner jar in your Python pipeline options
>>>> "--flink_job_server_jar=runners/flink/1.13/build/libs/beam-runners-flink-1.13-2.31.0-SNAPSHOT.jar"
>>>>
>>>> For the "No container id" issue, can you share the full logs?
>>>>
>>>> +Tianzi Cai <tia...@google.com> +Anthony Mancuso <amanc...@google.com>
>>>>
>>>> Thanks,
>>>> Kyle
>>>>
>>>> On Fri, Jul 9, 2021 at 8:47 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> /cc @Kyle Weaver <kcwea...@google.com>
>>>>>
>>>>> On Fri, Jul 9, 2021 at 5:24 AM Joey Tran <joey.t...@schrodinger.com>
>>>>> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I'm trying to just demo Beam/Flink and I tried following the
>>>>>> instructions with Google's Dataproc but I get a bunch of errors ranging
>>>>>> from jackson dependency issues to some issue about "No container id".
>>>>>>
>>>>>> Does anyone know if these dataproc instructions[1] are complete? I
>>>>>> ran through it pretty much word for word and can't get a simple wordcount
>>>>>> going, I'm not sure if I'm somehow messing something up or there's more
>>>>>> necessary than just this doc instructs? FWIW I've been able to run the
>>>>>> java wordcount example fine, it seems like I only run into issues when
>>>>>> trying to follow the portable runner instructions
>>>>>>
>>>>>> Thanks so much in advance for your help! I'm not very experienced
>>>>>> with deploying these kinds of things but I wanted to do a demo to show
>>>>>> that Beam+Flink is a better solution than writing a framework myself
>>>>>>
>>>>>> [1]
>>>>>> https://cloud.google.com/dataproc/docs/concepts/components/flink#portable_beam_jobs
>>>>>>
>>>>>
patched_runner.log
Description: Binary data
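
For anyone reproducing this, the final wordcount.py invocation from the top of this message can be sketched in Python as below. This is only a rough sketch, not part of Beam or Dataproc: the helper name build_wordcount_args is hypothetical, and the two checks are assumptions about what is worth ruling out before launching (an empty FLINK_MASTER_URL because flink-yarn-daemon was not sourced, and a job server jar path that does not point at the file the Gradle build actually produced). The flags themselves are copied from the command in this thread.

```python
import os


def build_wordcount_args(flink_master, job_server_jar,
                         input_path="kinglear.txt", output_path="my_counts"):
    """Build the argv for the wordcount.py command quoted in this thread.

    Hypothetical helper: it only assembles arguments and performs two
    local sanity checks; it does not contact Flink or start the job.
    """
    # FLINK_MASTER_URL is set by sourcing /usr/bin/flink-yarn-daemon;
    # an empty value suggests that step was skipped in this shell.
    if not flink_master:
        raise ValueError("flink_master is empty; is FLINK_MASTER_URL set?")
    # The jar path must match whatever the Gradle build actually wrote.
    if not os.path.isfile(job_server_jar):
        raise FileNotFoundError(
            "job server jar not found: " + job_server_jar)
    return [
        "python", "wordcount.py",
        "--input", input_path,
        "--output", output_path,
        "--runner", "FlinkRunner",
        "--flink_master", flink_master,
        "--environment_type", "DOCKER",
        "--flink_job_server_jar", job_server_jar,
    ]
```

On the master node this could be called as build_wordcount_args(os.environ.get("FLINK_MASTER_URL", ""), "<path to the built jar>") and the resulting list passed to subprocess.run; a FileNotFoundError at that point would mean the jar path, rather than the runner, is the first thing to look at.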