Thanks Dian, that seemed to do the trick. I built a Docker Image simply using:
FROM apache/beam_python3.8_sdk:2.31.0 COPY flink_data/input.txt . I specified nothing in the Pipeline options other than: "--runner=FlinkRunner", "--flink_master=localhost:8081", "--environment_type=DOCKER", "--environment_config=xor/beam_worker:latest" This seemed to do the trick! I’m curious if you have any recommendations on the typical strategies for supplying artifacts like this at runtime, rather than having to build an image for jobs? I think ideally we have a pipeline consistently running somewhere and we can submit data―let’s say CSV files―and the pipeline will process them as a job is kicked off. Given how much discussion I’ve seen around Kafka, I imagine this is done with some sort of distributed messaging framework. If anyone can offer suggestions or resources, I’d very much appreciate it! Thanks again for the help! Best, Adam From: Dian Fu <dian0511...@gmail.com> Date: Friday, September 3, 2021 at 2:10 AM To: Adam Pearce <adam.pea...@xorsecurity.com> Cc: user@flink.apache.org <user@flink.apache.org> Subject: [EXTERNAL]Re: [Question] Basic Python examples.wordcount on local FlinkRunner This seems more like a Beam issue although it uses Flink runner. It would be helpful to also send it to the Beam user mailing list. Regarding to this issue itself, could you check is input.txt accessible in the Docker container? Regards, Dian 2021年9月3日 上午5:19,Adam Pearce <adam.pea...@xorsecurity.com<mailto:adam.pea...@xorsecurity.com>> 写道: Hello all, I’m attempting to run a simple, minimally viable example of a Beam pipeline on Flink. I have installed flink-1.13.2 following the setup instructions, and have successfully run the server and navigated to localhost:8081 to view the Web UI. I can see the job successfully submitted and running. It runs, and completes, but the output is not appropriate and the final output never occurs. I have enabled DEBUG logging for further output, but I don’t really see anything that would indicate issues other than what I am showing below. I’ve attached the complete log. I am running the following from a Python 3.8.12 virtualenv with apache-beam 2.31.0 installed via pip: python -m apache_beam.examples.wordcount --input input.txt --output counts --runner FlinkRunner --flink_master="localhost:8081" [--flink_submit_uber_jar] I have tried with and without “--flink_submit_uber_jar” without any change. The local embedded run of this pipeline works (same command as above, omitting “flink_master”. I understand that running this through the Flink server will use docker to stand up containers to perform the work. It appears to be successfully pulling the image: “apache/beam_python3.8_sdk:2.31.0”. I’m curious if there is some issue with the Python container, because in the logs, I am seeing: 2021/09/02 20:57:54 Initializing python harness: /opt/apache/beam/boot --id=5-1 --provision_endpoint=host.docker.internal:55847 2021/09/02 20:57:54 Downloaded: /tmp/staged/pickled_main_session (sha256:4e9a1199bade55ad73ae6872c8f156c69227ef23d8155f19dead745264999084, size: 3029) 2021/09/02 20:57:54 Found artifact: pickled_main_session 2021/09/02 20:57:54 Installing setup packages ... 2021/09/02 20:57:54 Executing: python -m apache_beam.runners.worker.sdk_worker_main 2021/09/02 20:57:55 Python exited: <nil> I’ve searched high and low for that “Python exited: <nil>” but have found very little. Further context: Operating system: macOS 11.5.2 Docker: Docker version 20.10.8, build 3967b7d Python: 3.8.12 (virtualenv) Beam: 2.31.0 Flink: 1.13.2 This communication is the property of XOR Security and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments. <flink-root-standalonesession-3-MacBook-Pro.local.log><flink-root-taskexecutor-3-MacBook-Pro.local.log> This communication is the property of XOR Security and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments.