Hello,

My high-level recommendations are to avoid Docker for a Flink cluster and
to use WSL instead. I wrote a couple of posts about Apache Beam local
development with and without Kubernetes, and you may find them useful. A
minimal sketch of the non-Docker setup follows the links below.

Without Kubernetes
- Apache Beam Local Development With Python - Part 3 Flink Runner -
https://jaehyeon.me/blog/2024-04-18-beam-local-dev-3/
With Kubernetes
- Deploy Python pipelines on Kubernetes using the Flink runner -
https://beam.apache.org/blog/deploy-python-pipeline-on-flink-runner/
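
For reference, a minimal sketch of the non-Docker setup (assuming a local
Flink cluster started from the Flink binaries on WSL via
./bin/start-cluster.sh and listening on localhost:8081; the version and
pipeline here are illustrative):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch only: point the FlinkRunner at the locally running cluster and
# run the SDK harness in the submitting process, so no Docker is needed.
options = PipelineOptions(
    runner='FlinkRunner',
    flink_master='localhost:8081',
    flink_version='1.16',  # match the version of your local cluster
    environment_type='LOOPBACK',
    parallelism=1,
)

with beam.Pipeline(options=options) as p:
    (p | beam.Create(['hello', 'world']) | beam.Map(print))

With LOOPBACK the SDK workers run inside the process that submits the
pipeline, which avoids the container networking and artifact staging
issues for local development.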

Cheers,
Jaehyeon

On Thu, 7 Nov 2024 at 00:57, Robert Sianta <robert.sia...@gmail.com> wrote:

> Hi community!
>
> This is my first contact with the community, and it is related to our
> first steps with Beam.
>
> I am trying to establish a development environment for Apache Beam
> pipelines that run on the Flink runner and use the Python SDK harness to
> handle some of the tasks. I aim to develop the Beam pipelines in Python.
> To set up this environment I am using Docker.
>
> My current setup is:
> Windows 10 64-bit, 32 GB RAM
> Docker 2.5.0.0 Stable, Engine 19.03.13, Compose 1.27.4
> On local system: Python 3.9, java version "1.8.0_361"
> docker beam image: apache/beam_python3.9_sdk:2.48.0
> docker flink image: apache/flink:1.14.4-java8
>
> For executing tasks on the Python SDK harness, I tried
> environment_type='PROCESS' and 'EXTERNAL', where the goal is to use the
> 'PROCESS' variant. But to better understand what is going on, I am
> currently using the EXTERNAL option, with the worker pool service
> launched manually inside the taskmanager container via
> "/opt/apache/beam/boot --worker_pool". The "boot" binary itself is
> installed in the image as defined in the Dockerfile below.
>
> My job is a simple hello-world program, containing just a few prints. The
> PipelineOptions are:
>
> ...
>
> beam_options = PipelineOptions(
>     save_main_session=False,
>     setup_file="./setup.py",
>     runner='FlinkRunner',
>     flink_master='localhost:8081',
>     flink_version='1.14',
>     parallelism=1,
>     environment_type='EXTERNAL',
>     environment_config='localhost:50000'
> )
>
> print("tlacim nejaky hello world vypis")
> print(beam_options.get_all_options())
> print("koniec vypisu")
>
> app.run(
>     input_text=args.input_text,
>     beam_options=beam_options,
> )
>
> where setup.py includes the requirements.txt listed at the bottom of this
> email.
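>
> The setup.py itself is essentially minimal packaging boilerplate, roughly
> like this (a sketch; the package name is illustrative):
>
> import setuptools
>
> # Read the pinned dependencies from requirements.txt
> with open('requirements.txt') as f:
>     requirements = f.read().splitlines()
>
> setuptools.setup(
>     name='beam-starter-python',
>     version='0.0.1',
>     install_requires=requirements,
>     packages=setuptools.find_packages(),
> )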
>
> Now when I execute the pipeline, it is successfully built and deployed to
> the Flink runner.
>
> The build process outputs this:
>
> C:\Users\rsianta\Miniconda3\envs\BeamNewest\python.exe
> C:/smc/beam/beam-starter-python/main.py
> tlacim nejaky hello world vypis
> {'runner': 'FlinkRunner', 'streaming': False, 'resource_hints': [],
> 'beam_services': {}, 'type_check_strictness': 'DEFAULT_TO_ANY',
> 'type_check_additional': '', 'pipeline_type_check': True,
> 'runtime_type_check': False, 'performance_runtime_type_check': False,
> 'allow_non_deterministic_key_coders': False, 'allow_unsafe_triggers':
> False, 'direct_runner_use_stacked_bundle': True,
> 'direct_runner_bundle_repeat': 0, 'direct_num_workers': 1,
> 'direct_running_mode': 'in_memory', 'direct_embed_docker_python': False,
> 'direct_test_splits': {}, 'dataflow_endpoint': '
> https://dataflow.googleapis.com', 'project': None, 'job_name': None,
> 'staging_location': None, 'temp_location': None, 'region': None,
> 'service_account_email': None, 'no_auth': False, 'template_location': None,
> 'labels': None, 'update': False, 'transform_name_mapping': None,
> 'enable_streaming_engine': False, 'dataflow_kms_key': None,
> 'create_from_snapshot': None, 'flexrs_goal': None,
> 'dataflow_service_options': None, 'enable_hot_key_logging': False,
> 'enable_artifact_caching': False, 'impersonate_service_account': None,
> 'gcp_oauth_scopes': ['https://www.googleapis.com/auth/bigquery', '
> https://www.googleapis.com/auth/cloud-platform', '
> https://www.googleapis.com/auth/devstorage.full_control', '
> https://www.googleapis.com/auth/userinfo.email', '
> https://www.googleapis.com/auth/datastore', '
> https://www.googleapis.com/auth/spanner.admin', '
> https://www.googleapis.com/auth/spanner.data'],
> 'azure_connection_string': None, 'blob_service_endpoint': None,
> 'azure_managed_identity_client_id': None, 'hdfs_host': None, 'hdfs_port':
> None, 'hdfs_user': None, 'hdfs_full_urls': False, 'num_workers': None,
> 'max_num_workers': None, 'autoscaling_algorithm': None, 'machine_type':
> None, 'disk_size_gb': None, 'disk_type': None, 'worker_region': None,
> 'worker_zone': None, 'zone': None, 'network': None, 'subnetwork': None,
> 'worker_harness_container_image': None, 'sdk_container_image': None,
> 'sdk_harness_container_image_overrides': None,
> 'default_sdk_harness_log_level': None, 'sdk_harness_log_level_overrides':
> None, 'use_public_ips': None, 'min_cpu_platform': None,
> 'dataflow_job_file': None, 'experiments': None,
> 'number_of_worker_harness_threads': None, 'profile_cpu': False,
> 'profile_memory': False, 'profile_location': None, 'profile_sample_rate':
> 1.0, 'requirements_file': None, 'requirements_cache': None,
> 'requirements_cache_only_sources': False, 'setup_file': './setup.py',
> 'beam_plugins': None, 'pickle_library': 'default', 'save_main_session':
> False, 'sdk_location': 'default', 'extra_packages': None,
> 'prebuild_sdk_container_engine': None, 'prebuild_sdk_container_base_image':
> None, 'cloud_build_machine_type': None, 'docker_registry_push_url': None,
> 'job_endpoint': None, 'artifact_endpoint': None, 'job_server_timeout': 60,
> 'environment_type': 'EXTERNAL', 'environment_config': 'localhost:50000',
> 'environment_options': None, 'sdk_worker_parallelism': 1,
> 'environment_cache_millis': 0, 'output_executable_path': None,
> 'artifacts_dir': None, 'job_port': 0, 'artifact_port': 0, 'expansion_port':
> 0, 'job_server_java_launcher': 'java', 'job_server_jvm_properties': [],
> 'flink_master': 'localhost:8081', 'flink_version': '1.14',
> 'flink_job_server_jar': None, 'flink_submit_uber_jar': False,
> 'parallelism': 1, 'max_parallelism': -1, 'spark_master_url': 'local[4]',
> 'spark_job_server_jar': None, 'spark_submit_uber_jar': False,
> 'spark_rest_url': None, 'spark_version': '3', 'on_success_matcher': None,
> 'dry_run': False, 'wait_until_finish_duration': None, 'pubsubRootUrl':
> None, 's3_access_key_id': None, 's3_secret_access_key': None,
> 's3_session_token': None, 's3_endpoint_url': None, 's3_region_name': None,
> 's3_api_version': None, 's3_verify': None, 's3_disable_ssl': False}
> koniec vypisu
> INFO:root:Default Python SDK image for environment is
> apache/beam_python3.9_sdk:2.48.0
> INFO:apache_beam.runners.portability.stager:Executing command:
> ['C:\\Users\\rsianta\\Miniconda3\\envs\\BeamNewest\\python.exe',
> 'setup.py', 'sdist', '--dist-dir',
> 'C:\\Users\\rsianta\\AppData\\Local\\Temp\\tmpggi9cf3y']
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
> <function pack_combiners at 0x000001E6656CD670> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
> <function lift_combiners at 0x000001E6656CD700> ====================
> INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
> <function sort_stages at 0x000001E6656CDE50> ====================
> INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol
> scheme to flink_master parameter: http://localhost:8081
> INFO:apache_beam.utils.subprocess_server:Using cached job server jar from
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.14-job-server/2.48.0/beam-runners-flink-1.14-job-server-2.48.0.jar
> INFO:apache_beam.utils.subprocess_server:Starting service with ['java'
> '-jar'
> 'C:\\Users\\rsianta/.apache_beam/cache/jars\\beam-runners-flink-1.14-job-server-2.48.0.jar'
> '--flink-master' 'http://localhost:8081' '--artifacts-dir'
> 'C:\\Users\\rsianta\\AppData\\Local\\Temp\\beam-temptolduaa5\\artifactsbjonha4y'
> '--job-port' '57745' '--artifact-port' '0' '--expansion-port' '0']
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.jobsubmission.JobServerDriver
> createArtifactStagingService
> INFO:apache_beam.utils.subprocess_server:INFO: ArtifactStagingService
> started on localhost:57778
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.jobsubmission.JobServerDriver createExpansionService
> INFO:apache_beam.utils.subprocess_server:INFO: Java ExpansionService
> started on localhost:57779
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.jobsubmission.JobServerDriver createJobServer
> INFO:apache_beam.utils.subprocess_server:INFO: JobService started on
> localhost:57745
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.jobsubmission.JobServerDriver run
> INFO:apache_beam.utils.subprocess_server:INFO: Job server now running,
> terminate with Ctrl+C
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 onNext
> INFO:apache_beam.utils.subprocess_server:INFO: Staging artifacts for
> job_321fed50-f7ae-4e93-8774-2bf3fe155938.
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2
> resolveNextEnvironment
> INFO:apache_beam.utils.subprocess_server:INFO: Resolving artifacts for
> job_321fed50-f7ae-4e93-8774-2bf3fe155938.ref_Environment_default_environment_1.
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 onNext
> INFO:apache_beam.utils.subprocess_server:INFO: Getting 1 artifacts for
> job_321fed50-f7ae-4e93-8774-2bf3fe155938.null.
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
> org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2
> finishStaging
> INFO:apache_beam.utils.subprocess_server:INFO: Artifacts fully staged for
> job_321fed50-f7ae-4e93-8774-2bf3fe155938.
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
> org.apache.beam.runners.flink.FlinkJobInvoker invokeWithExecutor
> INFO:apache_beam.utils.subprocess_server:INFO: Invoking job
> BeamApp-rsianta-1106130652-47ae11d_ad4e7a8b-eb96-45b4-8498-74b2eaa8ad66
> with pipeline runner
> org.apache.beam.runners.flink.FlinkPipelineRunner@48a4ca17
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
> org.apache.beam.runners.jobsubmission.JobInvocation start
> INFO:apache_beam.utils.subprocess_server:INFO: Starting job invocation
> BeamApp-rsianta-1106130652-47ae11d_ad4e7a8b-eb96-45b4-8498-74b2eaa8ad66
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to
> STOPPED
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to
> STARTING
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to
> RUNNING
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
> org.apache.beam.runners.flink.FlinkPipelineRunner runPipelineWithTranslator
> INFO:apache_beam.utils.subprocess_server:INFO: Translating pipeline to
> Flink program.
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
> org.apache.beam.runners.flink.FlinkExecutionEnvironments
> createBatchExecutionEnvironment
> INFO:apache_beam.utils.subprocess_server:INFO: Creating a Batch Execution
> Environment.
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
> org.apache.beam.runners.flink.FlinkExecutionEnvironments
> createBatchExecutionEnvironment
> INFO:apache_beam.utils.subprocess_server:INFO: Using Flink Master URL
> localhost:8081.
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:54 PM
> org.apache.flink.api.java.utils.PlanGenerator logTypeRegistrationDetails
> INFO:apache_beam.utils.subprocess_server:INFO: The job has 0 registered
> types and 0 default Kryo serializers
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:54 PM
> org.apache.flink.client.program.rest.RestClusterClient lambda$submitJob$7
> INFO:apache_beam.utils.subprocess_server:INFO: Submitting job
> 'BeamApp-rsianta-1106130652-47ae11d' (cb7421fd5da8f581828f4b39e809c0e5).
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:07:03 PM
> org.apache.flink.client.program.rest.RestClusterClient lambda$null$6
> INFO:apache_beam.utils.subprocess_server:INFO: Successfully submitted job
> 'BeamApp-rsianta-1106130652-47ae11d' (cb7421fd5da8f581828f4b39e809c0e5) to '
> http://localhost:8081'.
>
> I am observing the outputs of the SDK Harness on the taskmanager console.
>
> The SDK Harness stdout is this:
>
> # /opt/apache/beam/boot --worker_pool
> 2024/11/06 13:05:38 Starting worker pool 365: python -m
> apache_beam.runners.worker.worker_pool_main --service_port=50000
> --container_executable=/opt/apache/beam/boot
> Starting worker with command ['/opt/apache/beam/boot', '--id=1-1',
> '--logging_endpoint=localhost:44529',
> '--artifact_endpoint=localhost:43809',
> '--provision_endpoint=localhost:38551',
> '--control_endpoint=localhost:41251']
> 2024/11/06 13:07:07 Provision info:
> pipeline_options:{fields:{key:"beam:option:allow_non_deterministic_key_coders:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:allow_non_restored_state:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:allow_unsafe_triggers:v1"
> value:{bool_value:false}} fields:{key:"beam:option:app_name:v1"
> value:{null_value:NULL_VALUE}} fields:{key:"beam:option:artifact_port:v1"
> value:{string_value:"0"}}
> fields:{key:"beam:option:auto_balance_write_files_sharding_enabled:v1"
> value:{bool_value:false}} fields:{key:"beam:option:beam_services:v1"
> value:{struct_value:{}}} fields:{key:"beam:option:block_on_run:v1"
> value:{bool_value:true}} fields:{key:"beam:option:dataflow_endpoint:v1"
> value:{string_value:"https://dataflow.googleapis.com"}}
> fields:{key:"beam:option:direct_embed_docker_python:v1"
> value:{bool_value:false}} fields:{key:"beam:option:direct_num_workers:v1"
> value:{string_value:"1"}}
> fields:{key:"beam:option:direct_runner_bundle_repeat:v1"
> value:{string_value:"0"}}
> fields:{key:"beam:option:direct_runner_use_stacked_bundle:v1"
> value:{bool_value:true}} fields:{key:"beam:option:direct_running_mode:v1"
> value:{string_value:"in_memory"}}
> fields:{key:"beam:option:direct_test_splits:v1" value:{struct_value:{}}}
> fields:{key:"beam:option:disable_metrics:v1" value:{bool_value:false}}
> fields:{key:"beam:option:dry_run:v1" value:{bool_value:false}}
> fields:{key:"beam:option:enable_artifact_caching:v1"
> value:{bool_value:false}} fields:{key:"beam:option:enable_bundling:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:enable_hot_key_logging:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:enable_streaming_engine:v1"
> value:{bool_value:false}} fields:{key:"beam:option:enforce_encodability:v1"
> value:{bool_value:true}} fields:{key:"beam:option:enforce_immutability:v1"
> value:{bool_value:true}}
> fields:{key:"beam:option:environment_cache_millis:v1"
> value:{string_value:"0"}} fields:{key:"beam:option:environment_config:v1"
> value:{string_value:"localhost:50000"}}
> fields:{key:"beam:option:environment_type:v1"
> value:{string_value:"EXTERNAL"}}
> fields:{key:"beam:option:expansion_port:v1" value:{string_value:"0"}}
> fields:{key:"beam:option:experiments:v1"
> value:{list_value:{values:{string_value:"beam_fn_api"}}}}
> fields:{key:"beam:option:externalized_checkpoints_enabled:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:fail_on_checkpointing_errors:v1"
> value:{bool_value:true}} fields:{key:"beam:option:faster_copy:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:finish_bundle_before_checkpointing:v1"
> value:{bool_value:false}} fields:{key:"beam:option:flink_conf_dir:v1"
> value:{null_value:NULL_VALUE}} fields:{key:"beam:option:flink_master:v1"
> value:{string_value:"http://localhost:8081"}}
> fields:{key:"beam:option:flink_submit_uber_jar:v1"
> value:{bool_value:false}} fields:{key:"beam:option:flink_version:v1"
> value:{string_value:"1.14"}} fields:{key:"beam:option:gcp_oauth_scopes:v1"
> value:{list_value:{values:{string_value:"
> https://www.googleapis.com/auth/bigquery"} values:{string_value:"
> https://www.googleapis.com/auth/cloud-platform"} values:{string_value:"
> https://www.googleapis.com/auth/devstorage.full_control"}
> values:{string_value:"https://www.googleapis.com/auth/userinfo.email"}
> values:{string_value:"https://www.googleapis.com/auth/datastore"}
> values:{string_value:"https://www.googleapis.com/auth/spanner.admin"}
> values:{string_value:"https://www.googleapis.com/auth/spanner.data"}}}}
> fields:{key:"beam:option:gcs_performance_metrics:v1"
> value:{bool_value:false}} fields:{key:"beam:option:hdfs_full_urls:v1"
> value:{bool_value:false}} fields:{key:"beam:option:job_name:v1"
> value:{string_value:"BeamApp-rsianta-1106130652-47ae11d"}}
> fields:{key:"beam:option:job_port:v1" value:{string_value:"0"}}
> fields:{key:"beam:option:job_server_java_launcher:v1"
> value:{string_value:"java"}}
> fields:{key:"beam:option:job_server_jvm_properties:v1"
> value:{list_value:{}}} fields:{key:"beam:option:job_server_timeout:v1"
> value:{string_value:"60"}}
> fields:{key:"beam:option:load_balance_bundles:v1" value:{bool_value:false}}
> fields:{key:"beam:option:max_parallelism:v1" value:{string_value:"-1"}}
> fields:{key:"beam:option:no_auth:v1" value:{bool_value:false}}
> fields:{key:"beam:option:object_reuse:v1" value:{bool_value:false}}
> fields:{key:"beam:option:operator_chaining:v1" value:{bool_value:true}}
> fields:{key:"beam:option:options_id:v1" value:{number_value:2}}
> fields:{key:"beam:option:output_executable_path:v1"
> value:{null_value:NULL_VALUE}} fields:{key:"beam:option:parallelism:v1"
> value:{string_value:"1"}}
> fields:{key:"beam:option:performance_runtime_type_check:v1"
> value:{bool_value:false}} fields:{key:"beam:option:pickle_library:v1"
> value:{string_value:"default"}}
> fields:{key:"beam:option:pipeline_type_check:v1" value:{bool_value:true}}
> fields:{key:"beam:option:profile_cpu:v1" value:{bool_value:false}}
> fields:{key:"beam:option:profile_memory:v1" value:{bool_value:false}}
> fields:{key:"beam:option:profile_sample_rate:v1" value:{number_value:1}}
> fields:{key:"beam:option:re_iterable_group_by_key_result:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:requirements_cache_only_sources:v1"
> value:{bool_value:false}} fields:{key:"beam:option:resource_hints:v1"
> value:{list_value:{}}}
> fields:{key:"beam:option:retain_docker_containers:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:retain_externalized_checkpoints_on_cancellation:v1"
> value:{bool_value:false}} fields:{key:"beam:option:runner:v1"
> value:{null_value:NULL_VALUE}}
> fields:{key:"beam:option:runner_determined_sharding:v1"
> value:{bool_value:true}} fields:{key:"beam:option:runtime_type_check:v1"
> value:{bool_value:false}} fields:{key:"beam:option:s3_disable_ssl:v1"
> value:{bool_value:false}} fields:{key:"beam:option:save_main_session:v1"
> value:{bool_value:false}} fields:{key:"beam:option:sdk_location:v1"
> value:{string_value:"container"}}
> fields:{key:"beam:option:sdk_worker_parallelism:v1"
> value:{string_value:"1"}} fields:{key:"beam:option:setup_file:v1"
> value:{string_value:"./setup.py"}}
> fields:{key:"beam:option:spark_master_url:v1"
> value:{string_value:"local[4]"}}
> fields:{key:"beam:option:spark_submit_uber_jar:v1"
> value:{bool_value:false}} fields:{key:"beam:option:spark_version:v1"
> value:{string_value:"3"}} fields:{key:"beam:option:streaming:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:type_check_additional:v1" value:{string_value:""}}
> fields:{key:"beam:option:type_check_strictness:v1"
> value:{string_value:"DEFAULT_TO_ANY"}} fields:{key:"beam:option:update:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:use_storage_api_connection_pool:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:use_storage_write_api:v1"
> value:{bool_value:false}}
> fields:{key:"beam:option:use_storage_write_api_at_least_once:v1"
> value:{bool_value:false}} fields:{key:"beam:option:verify_row_values:v1"
> value:{bool_value:false}}} logging_endpoint:{url:"localhost:44529"}
> artifact_endpoint:{url:"localhost:43809"}
> control_endpoint:{url:"localhost:41251"}
> dependencies:{type_urn:"beam:artifact:type:file:v1"
> type_payload:"\n\xb4\x01C:\\Users\\rsianta\\AppData\\Local\\Temp\\beam-temptolduaa5\\artifactsbjonha4y\\f7e1317b082363ba3eea86b1de88226bcb4063e22d7fc507188f6b95286f06c6\\1-ref_Environment_default_e-workflow.tar.gz"
> role_urn:"beam:artifact:role:staging_to:v1"
> role_payload:"\n\x0fworkflow.tar.gz"}
> runner_capabilities:"beam:protocol:control_response_elements_embedding:v1"
> 2024/11/06 13:07:18 failed to retrieve staged files: failed to retrieve
> /tmp/staged in 3 attempts: failed to retrieve chunk for
> /tmp/staged/workflow.tar.gz
>         caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for
> /tmp/staged/workflow.tar.gz
>         caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for
> /tmp/staged/workflow.tar.gz
>         caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for
> /tmp/staged/workflow.tar.gz
>         caused by:
> rpc error: code = Unknown desc =
>
>
> It seems as if the harness is unable to reach the artifacts. If I
> understand it correctly, the harness is trying to get the files from
> /tmp/staged/workflow.tar.gz, but in the provided pipeline_options I see
> that the artifacts are located on my local Windows filesystem (they really
> are there). The /tmp/staged/workflow.tar.gz exists on the taskmanager
> filesystem, but the file is empty:
> # cd /tmp/staged/
> # ls -l
> total 0
> -rwxr-xr-x 1 root root 0 Nov  4 11:34 workflow.tar.gz
> #
>
>
> But I don't know how to set everything up properly so that the harness is
> able to get the artifacts.
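>
> One thing I noticed in the options dump above is flink_submit_uber_jar:
> False. If it is relevant, enabling it (so the pipeline is packaged as an
> uber jar and submitted via the Flink REST API instead of going through
> the local job server) would look roughly like this, as a sketch only:
>
> beam_options = PipelineOptions(
>     runner='FlinkRunner',
>     flink_master='localhost:8081',
>     flink_version='1.14',
>     # Package and submit the job over the Flink REST API instead of
>     # staging artifacts through the local job server.
>     flink_submit_uber_jar=True,
>     environment_type='EXTERNAL',
>     environment_config='localhost:50000',
> )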
>
> Can anyone provide some tips, please?
>
>
> I will also include my Dockerfile, requirements.txt and
> docker-compose.yaml here:
>
> *Dockerfile:*
> # Use the Beam SDK image as a build stage
> FROM apache/beam_python3.9_sdk:2.48.0 AS beam
>
> # main image starts here
> FROM apache/flink:1.14.4-java8
>
> # Install Python 3.9 and required packages
> RUN apt-get update && apt-get install -y \
>     python3 \
>     python3-dev \
>     python3-distutils \
>     python3-pip \
>     python3-venv \
>     curl \
>     locales \
>     build-essential \
>     git \
>     wget \
>     unzip \
>     python3-setuptools \
>     libffi-dev \
>     libssl-dev \
>     && apt-get clean && rm -rf /var/lib/apt/lists/*
>
> COPY requirements.txt /tmp/requirements.txt
>
> # Upgrade pip to the latest version
> RUN python3 -m pip install --upgrade pip && \
>     python3 -m pip install -r /tmp/requirements.txt
>
> # Install Apache Beam SDK version 2.48.0
> RUN python3 -m pip install apache-beam==2.48.0
>
> # Set python3 as the default python
> RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
>
> # Set up locales
> RUN locale-gen en_US.UTF-8
> ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'
>
> # Copy the boot script and other necessary files from the Beam image
> COPY --from=beam /opt/apache/beam /opt/apache/beam
>
> # Expose Flink web interface port
> EXPOSE 8081
>
> # Set the working directory (optional, as per your needs)
> WORKDIR /home/flink
>
> # Do not change the ENTRYPOINT and CMD from the Flink image
> # The base image's ENTRYPOINT and CMD are preserved
>
> *docker-compose.yaml:*
> version: '3.9'
>
> services:
>   jobmanager:
>     image: flink1.14.4j8_p3.9_beam2.48
>     container_name: flink-beam-jobmanager
>     hostname: jobmanager
>     ports:
>       - "8081:8081"      # Expose Flink Web UI on localhost:8081
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>     volumes:
>       - c:\smc\beam\beam-starter-python\:/beam
>       - c:\smc\backend-dev\dev-prostredie\docker-infrastruktura\artifacts\:/artifacts
>
>   taskmanager:
>     image: flink1.14.4j8_p3.9_beam2.48
>     container_name: flink-beam-taskmanager
>     command: taskmanager
>     scale: 1
>     network_mode: "service:jobmanager"
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
>     depends_on:
>       - jobmanager
>
>     volumes:
>       - c:\smc\beam\beam-starter-python\:/beam
>       - c:\smc\backend-dev\dev-prostredie\docker-infrastruktura\artifacts\:/artifacts
>
> *requirements.txt:*
> absl-py==1.4.0
> astunparse==1.6.3
> attrs==23.1.0
> beautifulsoup4==4.12.2
> bs4==0.0.1
> cachetools==5.3.0
> certifi==2023.5.7
> cffi==1.15.1
> charset-normalizer==3.1.0
> click==8.1.3
> cloudpickle==2.2.1
> crcmod==1.7
> cryptography==40.0.2
> Cython==0.29.34
> deprecation==2.1.0
> dill==0.3.1.1
> dnspython==2.3.0
> docker==6.1.2
> docopt==0.6.2
> exceptiongroup==1.1.1
> execnet==1.9.0
> fastavro==1.7.4
> fasteners==0.18
> flatbuffers==23.5.9
> freezegun==1.2.2
> future==0.18.3
> gast==0.4.0
> google-api-core==2.11.0
> google-api-python-client==2.86.0
> google-apitools==0.5.31
> google-auth==2.18.0
> google-auth-httplib2==0.1.0
> google-auth-oauthlib==1.0.0
> google-cloud-bigquery==3.10.0
> google-cloud-bigquery-storage==2.19.1
> google-cloud-bigtable==2.17.0
> google-cloud-core==2.3.2
> google-cloud-datastore==2.15.2
> google-cloud-dlp==3.12.1
> google-cloud-language==2.9.1
> google-cloud-profiler==4.0.0
> google-cloud-pubsub==2.17.0
> google-cloud-pubsublite==1.8.2
> google-cloud-recommendations-ai==0.10.3
> google-cloud-spanner==3.34.0
> google-cloud-videointelligence==2.11.1
> google-cloud-vision==3.4.1
> google-crc32c==1.5.0
> google-pasta==0.2.0
> google-resumable-media==2.5.0
> googleapis-common-protos==1.59.0
> greenlet==2.0.2
> grpc-google-iam-v1==0.12.6
> grpcio==1.54.2
> grpcio-status==1.54.2
> guppy3==3.1.3
> h5py==3.8.0
> hdfs==2.7.0
> httplib2==0.22.0
> hypothesis==6.75.3
> idna==3.4
> importlib-metadata==6.6.0
> iniconfig==2.0.0
> jax==0.4.10
> joblib==1.2.0
> keras==2.12.0
> libclang==16.0.0
> Markdown==3.4.3
> MarkupSafe==2.1.2
> ml-dtypes==0.1.0
> mmh3==3.1.0
> mock==5.0.2
> nltk==3.8.1
> nose==1.3.7
> numpy==1.23.5
> oauth2client==4.1.3
> oauthlib==3.2.2
> objsize==0.6.1
> opt-einsum==3.3.0
> orjson==3.8.12
> overrides==6.5.0
> packaging==23.1
> pandas==1.5.3
> parameterized==0.9.0
> pluggy==1.0.0
> proto-plus==1.22.2
> protobuf==4.23.0
> psycopg2-binary==2.9.6
> pyarrow==11.0.0
> pyasn1==0.5.0
> pyasn1-modules==0.3.0
> pycparser==2.21
> pydot==1.4.2
> PyHamcrest==2.0.4
> pymongo==4.3.3
> PyMySQL==1.0.3
> pyparsing==3.0.9
> pytest==7.3.1
> pytest-timeout==2.1.0
> pytest-xdist==3.3.0
> python-dateutil==2.8.2
> python-snappy==0.6.1
> pytz==2023.3
> PyYAML==6.0
> regex==2023.5.5
> requests==2.30.0
> requests-mock==1.10.0
> requests-oauthlib==1.3.1
> rsa==4.9
> scikit-learn==1.2.2
> scipy==1.10.1
> six==1.16.0
> sortedcontainers==2.4.0
> soupsieve==2.4.1
> SQLAlchemy==1.4.48
> sqlparse==0.4.4
> tenacity==8.2.2
> tensorboard==2.12.3
> tensorboard-data-server==0.7.0
> tensorflow==2.12.0
> tensorflow-estimator==2.12.0
> tensorflow-io-gcs-filesystem==0.32.0
> termcolor==2.3.0
> testcontainers==3.7.1
> threadpoolctl==3.1.0
> tomli==2.0.1
> tqdm==4.65.0
> typing_extensions==4.5.0
> uritemplate==4.1.1
> urllib3==1.26.15
> websocket-client==1.5.1
> Werkzeug==2.3.4
> wrapt==1.14.1
> zipp==3.15.0
> zstandard==0.21.0
>