Hi all,

Thank you very much for the responses!

I feel a bit better about using Dataproc since it's not in beta like Flink
on GKE.

I rebuilt the flinkrunner as you specified but I still get an error. I've
attached the stdout from trying to run with the patched flink runner.

Here are the steps to start a cluster and get it to the state right before
I run with your patched Flink runner:


gcloud dataproc clusters create my-flink-cluster \
    --optional-components=FLINK,DOCKER --region=us-central1 \
    --image-version=2.0 --enable-component-gateway
gcloud compute ssh my-flink-cluster-m
curl https://raw.githubusercontent.com/cs109/2015/master/Lectures/Lecture15b/sparklect/shakes/kinglear.txt > kinglear.txt
curl https://raw.githubusercontent.com/apache/beam/master/sdks/python/apache_beam/examples/wordcount.py > wordcount.py
pip install apache_beam apache_beam[gcp]
. /usr/bin/flink-yarn-daemon
python wordcount.py --input kinglear.txt --output my_counts \
    --runner FlinkRunner --flink_master $FLINK_MASTER_URL \
    --environment_type DOCKER \
    --flink_job_server_jar beam/runners/flink/1.13/build/libs/beam-runners-flink-1.13-2.32.0-SNAPSHOT.jar
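
For anyone reproducing this, a small pre-flight check might help rule out
the simple failure modes (an unset FLINK_MASTER_URL, or a jar path that does
not exist on the master node) before launching the pipeline. This is only a
rough sketch: build_wordcount_command is a hypothetical helper of my own, not
part of Beam, and the flags are copied from the command above rather than
from any reference. It uses only the standard library and does not start the
job itself.

```python
import os
import shlex


def build_wordcount_command(jar_path, flink_master,
                            input_path="kinglear.txt",
                            output_path="my_counts"):
    """Check the inputs, then return the wordcount.py call as an argv list.

    Hypothetical helper: it only mirrors the command line from this thread.
    """
    problems = []
    if not flink_master:
        problems.append("FLINK_MASTER_URL is empty (was flink-yarn-daemon sourced?)")
    if not os.path.isfile(jar_path):
        problems.append("job server jar not found: " + jar_path)
    if not os.path.isfile(input_path):
        problems.append("input file not found: " + input_path)
    if problems:
        raise RuntimeError("; ".join(problems))
    return [
        "python", "wordcount.py",
        "--input", input_path,
        "--output", output_path,
        "--runner", "FlinkRunner",
        "--flink_master", flink_master,
        "--environment_type", "DOCKER",
        "--flink_job_server_jar", jar_path,
    ]


if __name__ == "__main__":
    # Jar path exactly as used in the command above.
    jar = "beam/runners/flink/1.13/build/libs/beam-runners-flink-1.13-2.32.0-SNAPSHOT.jar"
    try:
        cmd = build_wordcount_command(jar, os.environ.get("FLINK_MASTER_URL", ""))
    except RuntimeError as err:
        print("Not ready to run:", err)
    else:
        # Print a copy-pasteable command instead of running it.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Run from the same directory as wordcount.py, it either prints the full
command or lists what is missing.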


Let me know if anything stands out to you. Thanks again for the support!
Sorry if I'm missing something silly.

On Fri, Jul 9, 2021 at 3:20 PM Kyle Weaver <kcwea...@google.com> wrote:

> That's for Java only. Joey was asking about the portable (Python) example.
>
> On Fri, Jul 9, 2021 at 12:18 PM Tianzi Cai <tia...@google.com> wrote:
>
>> Thanks Kyle so much for forwarding.
>>
>> I was literally just trying this myself and got stuck too (b/193180649
>> <http://b/193180649>). I finally got it all to work. Please feel free to
>> share with the customer. I can give them repo.reader permission if
>> needed.
>>
>>    1. Run this command to generate the canonical word count example.
>>    mvn archetype:generate \
>>        -DarchetypeGroupId=org.apache.beam \
>>        -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>>        -DarchetypeVersion=2.30.0 \
>>        -DgroupId=org.example \
>>        -DartifactId=word-count-beam \
>>        -Dversion="0.1" \
>>        -Dpackage=org.apache.beam.examples \
>>        -DinteractiveMode=false
>>    2. Make a few code changes (mine are here
>>    <https://source.cloud.google.com/tz-playground-bigdata/word-count-example/+/gcp:>),
>>    then make sure that the code works with mvn compile exec:java
>>    -Dexec.mainClass=org.apache.beam.examples.WordCount and that you can
>>    see the aggregated results printed out.
>>    3. Run mvn package -Pflink-runner to get the packaged JARs.
>>    4. Upload the uber jar word-count-beam-bundled-0.1.jar to a Cloud
>>    Storage bucket. SSH into the Dataproc master node and download the
>>    uber jar there.
>>    5. flink run -c org.apache.beam.examples.WordCount
>>    word-count-beam-bundled-0.1.jar --runner=FlinkRunner
>>
>>
>> On Fri, Jul 9, 2021 at 12:11 PM Kyle Weaver <kcwea...@google.com> wrote:
>>
>>> If you're not committed to Dataproc, you may also want to try running it
>>> on GKE, which AFAIK doesn't have these issues.
>>> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md
>>>
>>> On Fri, Jul 9, 2021 at 12:08 PM Kyle Weaver <kcwea...@google.com> wrote:
>>>
>>>> Hi Joey,
>>>>
>>>> Jackson dependency issues are likely
>>>> https://issues.apache.org/jira/browse/BEAM-10430. You will have to
>>>> manually patch it until a fix is available in an upcoming Beam release.
>>>>
>>>> 1. Download Beam source from Github
>>>> 2. Check out a patch for the issue, such as
>>>> https://github.com/apache/beam/pull/14953
>>>> 3. Build the Flink runner using command "./gradlew
>>>> :runners:flink:1.13:job-server:shadowJar"
>>>> 4. Use the outputted Flink runner jar in your Python pipeline options
>>>> "--flink_job_server_jar=runners/flink/1.13/build/libs/beam-runners-flink-1.13-2.31.0-SNAPSHOT.jar"
>>>>
>>>> For the "No container id" issue, can you share the full logs?
>>>>
>>>> +Tianzi Cai <tia...@google.com> +Anthony Mancuso <amanc...@google.com>
>>>>
>>>> Thanks,
>>>> Kyle
>>>>
>>>> On Fri, Jul 9, 2021 at 8:47 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> /cc @Kyle Weaver <kcwea...@google.com>
>>>>>
>>>>> On Fri, Jul 9, 2021 at 5:24 AM Joey Tran <joey.t...@schrodinger.com>
>>>>> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I'm trying to just demo Beam/Flink and I tried following the
>>>>>> instructions with Google's Dataproc but I get a bunch of errors ranging
>>>>>> from jackson dependency issues to some issue about "No container id".
>>>>>>
>>>>>> Does anyone know if these Dataproc instructions[1] are complete? I
>>>>>> ran through them pretty much word for word and can't get a simple
>>>>>> wordcount going. I'm not sure if I'm messing something up or if more
>>>>>> is necessary than the doc describes. FWIW I've been able to run the
>>>>>> Java wordcount example fine; I only seem to run into issues when
>>>>>> trying to follow the portable runner instructions.
>>>>>>
>>>>>> Thanks so much in advance for your help! I'm not very experienced
>>>>>> with deploying these kinds of things, but I wanted to do a demo to
>>>>>> show that Beam+Flink is a better solution than writing a framework
>>>>>> myself.
>>>>>>
>>>>>> [1]
>>>>>> https://cloud.google.com/dataproc/docs/concepts/components/flink#portable_beam_jobs
>>>>>>
>>>>>

Attachment: patched_runner.log
Description: Binary data
