Hello, My high-level recommendations are to avoid running the Flink cluster in Docker and to use WSL instead. I wrote a couple of posts about Apache Beam local development with and without Kubernetes, and you may find them useful; they are linked below.
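As a rough illustration of the no-Docker route (a sketch of my own, assuming a Flink cluster installed directly on WSL and reachable at localhost:8081, not code taken verbatim from the posts), the pipeline options can stay simple. LOOPBACK keeps the SDK harness inside the launching Python process, so no containers are needed:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: assumes a Flink cluster running locally (e.g. on WSL) at localhost:8081.
options = PipelineOptions(
    runner='FlinkRunner',
    flink_master='localhost:8081',
    environment_type='LOOPBACK',  # SDK harness runs inside this Python process
    parallelism=1,
)

with beam.Pipeline(options=options) as p:
    p | beam.Create(['hello', 'world']) | beam.Map(print)

The posts: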
Without Kubernetes - Apache Beam Local Development With Python - Part 3 Flink Runner: https://jaehyeon.me/blog/2024-04-18-beam-local-dev-3/
With Kubernetes - Deploy Python pipelines on Kubernetes using the Flink runner: https://beam.apache.org/blog/deploy-python-pipeline-on-flink-runner/

Cheers,
Jaehyeon

On Thu, 7 Nov 2024 at 00:57, Robert Sianta <robert.sia...@gmail.com> wrote:

> Hi community!
>
> This is my first contact with the community, and it is related to our first steps with Beam.
>
> I am trying to establish a development environment for Apache Beam pipelines that run on the Flink runner and use the Python SDK harness to handle some of the tasks. I aim to develop the Beam pipelines using Python. To establish this environment I am using Docker.
>
> My current setup is:
> Windows 10 64-bit, 32 GB RAM
> Docker 2.5.0.0 Stable, Engine 19.03.13, Compose 1.27.4
> On the local system: Python 3.9, java version "1.8.0_361"
> Docker Beam image: apache/beam_python3.9_sdk:2.48.0
> Docker Flink image: apache/flink:1.14.4-java8
>
> For executing the tasks on the Python SDK harness, I tried environment_type='PROCESS' and 'EXTERNAL', where the goal is to use the 'PROCESS' variant. But to better understand what is going on, I am using the EXTERNAL option, and the service is launched manually inside the taskmanager container by executing "/opt/apache/beam/boot --worker_pool". The "boot" binary itself is installed in the image as defined in the Dockerfile.
>
> My job is a simple hello-world program, containing just a few prints. The PipelineOptions are:
>
> ...
>
> beam_options = PipelineOptions(
>     save_main_session=False,
>     setup_file="./setup.py",
>     runner='FlinkRunner',
>     flink_master='localhost:8081',
>     flink_version='1.14',
>     parallelism=1,
>     environment_type='EXTERNAL',
>     environment_config='localhost:50000'
> )
>
> print("tlacim nejaky hello world vypis")
> print(beam_options.get_all_options())
> print("koniec vypisu")
>
> app.run(
>     input_text=args.input_text,
>     beam_options=beam_options,
> )
>
> where setup.py includes the requirements.txt listed at the bottom of this email.
>
> Now when I execute the pipeline, it is successfully built and deployed to the Flink runner.
> > The building process outputs this: > > C:\Users\rsianta\Miniconda3\envs\BeamNewest\python.exe > C:/smc/beam/beam-starter-python/main.py > tlacim nejaky hello world vypis > {'runner': 'FlinkRunner', 'streaming': False, 'resource_hints': [], > 'beam_services': {}, 'type_check_strictness': 'DEFAULT_TO_ANY', > 'type_check_additional': '', 'pipeline_type_check': True, > 'runtime_type_check': False, 'performance_runtime_type_check': False, > 'allow_non_deterministic_key_coders': False, 'allow_unsafe_triggers': > False, 'direct_runner_use_stacked_bundle': True, > 'direct_runner_bundle_repeat': 0, 'direct_num_workers': 1, > 'direct_running_mode': 'in_memory', 'direct_embed_docker_python': False, > 'direct_test_splits': {}, 'dataflow_endpoint': ' > https://dataflow.googleapis.com', 'project': None, 'job_name': None, > 'staging_location': None, 'temp_location': None, 'region': None, > 'service_account_email': None, 'no_auth': False, 'template_location': None, > 'labels': None, 'update': False, 'transform_name_mapping': None, > 'enable_streaming_engine': False, 'dataflow_kms_key': None, > 'create_from_snapshot': None, 'flexrs_goal': None, > 'dataflow_service_options': None, 'enable_hot_key_logging': False, > 'enable_artifact_caching': False, 'impersonate_service_account': None, > 'gcp_oauth_scopes': ['https://www.googleapis.com/auth/bigquery', ' > https://www.googleapis.com/auth/cloud-platform', ' > https://www.googleapis.com/auth/devstorage.full_control', ' > https://www.googleapis.com/auth/userinfo.email', ' > https://www.googleapis.com/auth/datastore', ' > https://www.googleapis.com/auth/spanner.admin', ' > https://www.googleapis.com/auth/spanner.data'], > 'azure_connection_string': None, 'blob_service_endpoint': None, > 'azure_managed_identity_client_id': None, 'hdfs_host': None, 'hdfs_port': > None, 'hdfs_user': None, 'hdfs_full_urls': False, 'num_workers': None, > 'max_num_workers': None, 'autoscaling_algorithm': None, 'machine_type': > None, 'disk_size_gb': None, 'disk_type': None, 'worker_region': None, > 'worker_zone': None, 'zone': None, 'network': None, 'subnetwork': None, > 'worker_harness_container_image': None, 'sdk_container_image': None, > 'sdk_harness_container_image_overrides': None, > 'default_sdk_harness_log_level': None, 'sdk_harness_log_level_overrides': > None, 'use_public_ips': None, 'min_cpu_platform': None, > 'dataflow_job_file': None, 'experiments': None, > 'number_of_worker_harness_threads': None, 'profile_cpu': False, > 'profile_memory': False, 'profile_location': None, 'profile_sample_rate': > 1.0, 'requirements_file': None, 'requirements_cache': None, > 'requirements_cache_only_sources': False, 'setup_file': './setup.py', > 'beam_plugins': None, 'pickle_library': 'default', 'save_main_session': > False, 'sdk_location': 'default', 'extra_packages': None, > 'prebuild_sdk_container_engine': None, 'prebuild_sdk_container_base_image': > None, 'cloud_build_machine_type': None, 'docker_registry_push_url': None, > 'job_endpoint': None, 'artifact_endpoint': None, 'job_server_timeout': 60, > 'environment_type': 'EXTERNAL', 'environment_config': 'localhost:50000', > 'environment_options': None, 'sdk_worker_parallelism': 1, > 'environment_cache_millis': 0, 'output_executable_path': None, > 'artifacts_dir': None, 'job_port': 0, 'artifact_port': 0, 'expansion_port': > 0, 'job_server_java_launcher': 'java', 'job_server_jvm_properties': [], > 'flink_master': 'localhost:8081', 'flink_version': '1.14', > 'flink_job_server_jar': None, 'flink_submit_uber_jar': False, > 'parallelism': 
1, 'max_parallelism': -1, 'spark_master_url': 'local[4]', > 'spark_job_server_jar': None, 'spark_submit_uber_jar': False, > 'spark_rest_url': None, 'spark_version': '3', 'on_success_matcher': None, > 'dry_run': False, 'wait_until_finish_duration': None, 'pubsubRootUrl': > None, 's3_access_key_id': None, 's3_secret_access_key': None, > 's3_session_token': None, 's3_endpoint_url': None, 's3_region_name': None, > 's3_api_version': None, 's3_verify': None, 's3_disable_ssl': False} > koniec vypisu > INFO:root:Default Python SDK image for environment is > apache/beam_python3.9_sdk:2.48.0 > INFO:apache_beam.runners.portability.stager:Executing command: > ['C:\\Users\\rsianta\\Miniconda3\\envs\\BeamNewest\\python.exe', > 'setup.py', 'sdist', '--dist-dir', > 'C:\\Users\\rsianta\\AppData\\Local\\Temp\\tmpggi9cf3y'] > INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== > <function pack_combiners at 0x000001E6656CD670> ==================== > INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== > <function lift_combiners at 0x000001E6656CD700> ==================== > INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== > <function sort_stages at 0x000001E6656CDE50> ==================== > INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol > scheme to flink_master parameter: http://localhost:8081 > INFO:apache_beam.utils.subprocess_server:Using cached job server jar from > https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.14-job-server/2.48.0/beam-runners-flink-1.14-job-server-2.48.0.jar > INFO:apache_beam.utils.subprocess_server:Starting service with ['java' > '-jar' > 'C:\\Users\\rsianta/.apache_beam/cache/jars\\beam-runners-flink-1.14-job-server-2.48.0.jar' > '--flink-master' 'http://localhost:8081' '--artifacts-dir' > 'C:\\Users\\rsianta\\AppData\\Local\\Temp\\beam-temptolduaa5\\artifactsbjonha4y' > '--job-port' '57745' '--artifact-port' '0' '--expansion-port' '0'] > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.jobsubmission.JobServerDriver > createArtifactStagingService > INFO:apache_beam.utils.subprocess_server:INFO: ArtifactStagingService > started on localhost:57778 > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.jobsubmission.JobServerDriver createExpansionService > INFO:apache_beam.utils.subprocess_server:INFO: Java ExpansionService > started on localhost:57779 > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.jobsubmission.JobServerDriver createJobServer > INFO:apache_beam.utils.subprocess_server:INFO: JobService started on > localhost:57745 > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.jobsubmission.JobServerDriver run > INFO:apache_beam.utils.subprocess_server:INFO: Job server now running, > terminate with Ctrl+C > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 onNext > INFO:apache_beam.utils.subprocess_server:INFO: Staging artifacts for > job_321fed50-f7ae-4e93-8774-2bf3fe155938. 
> INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 > resolveNextEnvironment > INFO:apache_beam.utils.subprocess_server:INFO: Resolving artifacts for > job_321fed50-f7ae-4e93-8774-2bf3fe155938.ref_Environment_default_environment_1. > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 onNext > INFO:apache_beam.utils.subprocess_server:INFO: Getting 1 artifacts for > job_321fed50-f7ae-4e93-8774-2bf3fe155938.null. > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM > org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 > finishStaging > INFO:apache_beam.utils.subprocess_server:INFO: Artifacts fully staged for > job_321fed50-f7ae-4e93-8774-2bf3fe155938. > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM > org.apache.beam.runners.flink.FlinkJobInvoker invokeWithExecutor > INFO:apache_beam.utils.subprocess_server:INFO: Invoking job > BeamApp-rsianta-1106130652-47ae11d_ad4e7a8b-eb96-45b4-8498-74b2eaa8ad66 > with pipeline runner > org.apache.beam.runners.flink.FlinkPipelineRunner@48a4ca17 > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM > org.apache.beam.runners.jobsubmission.JobInvocation start > INFO:apache_beam.utils.subprocess_server:INFO: Starting job invocation > BeamApp-rsianta-1106130652-47ae11d_ad4e7a8b-eb96-45b4-8498-74b2eaa8ad66 > INFO:apache_beam.runners.portability.portable_runner:Job state changed to > STOPPED > INFO:apache_beam.runners.portability.portable_runner:Job state changed to > STARTING > INFO:apache_beam.runners.portability.portable_runner:Job state changed to > RUNNING > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM > org.apache.beam.runners.flink.FlinkPipelineRunner runPipelineWithTranslator > INFO:apache_beam.utils.subprocess_server:INFO: Translating pipeline to > Flink program. > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM > org.apache.beam.runners.flink.FlinkExecutionEnvironments > createBatchExecutionEnvironment > INFO:apache_beam.utils.subprocess_server:INFO: Creating a Batch Execution > Environment. > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM > org.apache.beam.runners.flink.FlinkExecutionEnvironments > createBatchExecutionEnvironment > INFO:apache_beam.utils.subprocess_server:INFO: Using Flink Master URL > localhost:8081. > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:54 PM > org.apache.flink.api.java.utils.PlanGenerator logTypeRegistrationDetails > INFO:apache_beam.utils.subprocess_server:INFO: The job has 0 registered > types and 0 default Kryo serializers > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:54 PM > org.apache.flink.client.program.rest.RestClusterClient lambda$submitJob$7 > INFO:apache_beam.utils.subprocess_server:INFO: Submitting job > 'BeamApp-rsianta-1106130652-47ae11d' (cb7421fd5da8f581828f4b39e809c0e5). > INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:07:03 PM > org.apache.flink.client.program.rest.RestClusterClient lambda$null$6 > INFO:apache_beam.utils.subprocess_server:INFO: Successfully submitted job > 'BeamApp-rsianta-1106130652-47ae11d' (cb7421fd5da8f581828f4b39e809c0e5) to ' > http://localhost:8081'. 
> > I am observing the outputs of the SDK Harness on the taskmanager console > > The SDK Harness stdout is this: > > # /opt/apache/beam/boot --worker_pool > 2024/11/06 13:05:38 Starting worker pool 365: python -m > apache_beam.runners.worker.worker_pool_main --service_port=50000 > --container_executable=/opt/apache/beam/boot > Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', > '--logging_endpoint=localhost:44529', > '--artifact_endpoint=localhost:43809', > '--provision_endpoint=localhost:38551', > '--control_endpoint=localhost:41251'] > 2024/11/06 13:07:07 Provision info: > pipeline_options:{fields:{key:"beam:option:allow_non_deterministic_key_coders:v1" > value:{bool_value:false}} > fields:{key:"beam:option:allow_non_restored_state:v1" > value:{bool_value:false}} > fields:{key:"beam:option:allow_unsafe_triggers:v1" > value:{bool_value:false}} fields:{key:"beam:option:app_name:v1" > value:{null_value:NULL_VALUE}} fields:{key:"beam:option:artifact_port:v1" > value:{string_value:"0"}} > fields:{key:"beam:option:auto_balance_write_files_sharding_enabled:v1" > value:{bool_value:false}} fields:{key:"beam:option:beam_services:v1" > value:{struct_value:{}}} fields:{key:"beam:option:block_on_run:v1" > value:{bool_value:true}} fields:{key:"beam:option:dataflow_endpoint:v1" > value:{string_value:"https://dataflow.googleapis.com"}} > fields:{key:"beam:option:direct_embed_docker_python:v1" > value:{bool_value:false}} fields:{key:"beam:option:direct_num_workers:v1" > value:{string_value:"1"}} > fields:{key:"beam:option:direct_runner_bundle_repeat:v1" > value:{string_value:"0"}} > fields:{key:"beam:option:direct_runner_use_stacked_bundle:v1" > value:{bool_value:true}} fields:{key:"beam:option:direct_running_mode:v1" > value:{string_value:"in_memory"}} > fields:{key:"beam:option:direct_test_splits:v1" value:{struct_value:{}}} > fields:{key:"beam:option:disable_metrics:v1" value:{bool_value:false}} > fields:{key:"beam:option:dry_run:v1" value:{bool_value:false}} > fields:{key:"beam:option:enable_artifact_caching:v1" > value:{bool_value:false}} fields:{key:"beam:option:enable_bundling:v1" > value:{bool_value:false}} > fields:{key:"beam:option:enable_hot_key_logging:v1" > value:{bool_value:false}} > fields:{key:"beam:option:enable_streaming_engine:v1" > value:{bool_value:false}} fields:{key:"beam:option:enforce_encodability:v1" > value:{bool_value:true}} fields:{key:"beam:option:enforce_immutability:v1" > value:{bool_value:true}} > fields:{key:"beam:option:environment_cache_millis:v1" > value:{string_value:"0"}} fields:{key:"beam:option:environment_config:v1" > value:{string_value:"localhost:50000"}} > fields:{key:"beam:option:environment_type:v1" > value:{string_value:"EXTERNAL"}} > fields:{key:"beam:option:expansion_port:v1" value:{string_value:"0"}} > fields:{key:"beam:option:experiments:v1" > value:{list_value:{values:{string_value:"beam_fn_api"}}}} > fields:{key:"beam:option:externalized_checkpoints_enabled:v1" > value:{bool_value:false}} > fields:{key:"beam:option:fail_on_checkpointing_errors:v1" > value:{bool_value:true}} fields:{key:"beam:option:faster_copy:v1" > value:{bool_value:false}} > fields:{key:"beam:option:finish_bundle_before_checkpointing:v1" > value:{bool_value:false}} fields:{key:"beam:option:flink_conf_dir:v1" > value:{null_value:NULL_VALUE}} fields:{key:"beam:option:flink_master:v1" > value:{string_value:"http://localhost:8081"}} > fields:{key:"beam:option:flink_submit_uber_jar:v1" > value:{bool_value:false}} fields:{key:"beam:option:flink_version:v1" > 
value:{string_value:"1.14"}} fields:{key:"beam:option:gcp_oauth_scopes:v1" > value:{list_value:{values:{string_value:" > https://www.googleapis.com/auth/bigquery"} values:{string_value:" > https://www.googleapis.com/auth/cloud-platform"} values:{string_value:" > https://www.googleapis.com/auth/devstorage.full_control"} > values:{string_value:"https://www.googleapis.com/auth/userinfo.email"} > values:{string_value:"https://www.googleapis.com/auth/datastore"} > values:{string_value:"https://www.googleapis.com/auth/spanner.admin"} > values:{string_value:"https://www.googleapis.com/auth/spanner.data"}}}} > fields:{key:"beam:option:gcs_performance_metrics:v1" > value:{bool_value:false}} fields:{key:"beam:option:hdfs_full_urls:v1" > value:{bool_value:false}} fields:{key:"beam:option:job_name:v1" > value:{string_value:"BeamApp-rsianta-1106130652-47ae11d"}} > fields:{key:"beam:option:job_port:v1" value:{string_value:"0"}} > fields:{key:"beam:option:job_server_java_launcher:v1" > value:{string_value:"java"}} > fields:{key:"beam:option:job_server_jvm_properties:v1" > value:{list_value:{}}} fields:{key:"beam:option:job_server_timeout:v1" > value:{string_value:"60"}} > fields:{key:"beam:option:load_balance_bundles:v1" value:{bool_value:false}} > fields:{key:"beam:option:max_parallelism:v1" value:{string_value:"-1"}} > fields:{key:"beam:option:no_auth:v1" value:{bool_value:false}} > fields:{key:"beam:option:object_reuse:v1" value:{bool_value:false}} > fields:{key:"beam:option:operator_chaining:v1" value:{bool_value:true}} > fields:{key:"beam:option:options_id:v1" value:{number_value:2}} > fields:{key:"beam:option:output_executable_path:v1" > value:{null_value:NULL_VALUE}} fields:{key:"beam:option:parallelism:v1" > value:{string_value:"1"}} > fields:{key:"beam:option:performance_runtime_type_check:v1" > value:{bool_value:false}} fields:{key:"beam:option:pickle_library:v1" > value:{string_value:"default"}} > fields:{key:"beam:option:pipeline_type_check:v1" value:{bool_value:true}} > fields:{key:"beam:option:profile_cpu:v1" value:{bool_value:false}} > fields:{key:"beam:option:profile_memory:v1" value:{bool_value:false}} > fields:{key:"beam:option:profile_sample_rate:v1" value:{number_value:1}} > fields:{key:"beam:option:re_iterable_group_by_key_result:v1" > value:{bool_value:false}} > fields:{key:"beam:option:requirements_cache_only_sources:v1" > value:{bool_value:false}} fields:{key:"beam:option:resource_hints:v1" > value:{list_value:{}}} > fields:{key:"beam:option:retain_docker_containers:v1" > value:{bool_value:false}} > fields:{key:"beam:option:retain_externalized_checkpoints_on_cancellation:v1" > value:{bool_value:false}} fields:{key:"beam:option:runner:v1" > value:{null_value:NULL_VALUE}} > fields:{key:"beam:option:runner_determined_sharding:v1" > value:{bool_value:true}} fields:{key:"beam:option:runtime_type_check:v1" > value:{bool_value:false}} fields:{key:"beam:option:s3_disable_ssl:v1" > value:{bool_value:false}} fields:{key:"beam:option:save_main_session:v1" > value:{bool_value:false}} fields:{key:"beam:option:sdk_location:v1" > value:{string_value:"container"}} > fields:{key:"beam:option:sdk_worker_parallelism:v1" > value:{string_value:"1"}} fields:{key:"beam:option:setup_file:v1" > value:{string_value:"./setup.py"}} > fields:{key:"beam:option:spark_master_url:v1" > value:{string_value:"local[4]"}} > fields:{key:"beam:option:spark_submit_uber_jar:v1" > value:{bool_value:false}} fields:{key:"beam:option:spark_version:v1" > value:{string_value:"3"}} fields:{key:"beam:option:streaming:v1" > 
> value:{bool_value:false}} fields:{key:"beam:option:type_check_additional:v1" value:{string_value:""}} fields:{key:"beam:option:type_check_strictness:v1" value:{string_value:"DEFAULT_TO_ANY"}} fields:{key:"beam:option:update:v1" value:{bool_value:false}} fields:{key:"beam:option:use_storage_api_connection_pool:v1" value:{bool_value:false}} fields:{key:"beam:option:use_storage_write_api:v1" value:{bool_value:false}} fields:{key:"beam:option:use_storage_write_api_at_least_once:v1" value:{bool_value:false}} fields:{key:"beam:option:verify_row_values:v1" value:{bool_value:false}}} logging_endpoint:{url:"localhost:44529"} artifact_endpoint:{url:"localhost:43809"} control_endpoint:{url:"localhost:41251"} dependencies:{type_urn:"beam:artifact:type:file:v1" type_payload:"\n\xb4\x01C:\\Users\\rsianta\\AppData\\Local\\Temp\\beam-temptolduaa5\\artifactsbjonha4y\\f7e1317b082363ba3eea86b1de88226bcb4063e22d7fc507188f6b95286f06c6\\1-ref_Environment_default_e-workflow.tar.gz" role_urn:"beam:artifact:role:staging_to:v1" role_payload:"\n\x0fworkflow.tar.gz"} runner_capabilities:"beam:protocol:control_response_elements_embedding:v1"
> 2024/11/06 13:07:18 failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/workflow.tar.gz
> caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/workflow.tar.gz
> caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/workflow.tar.gz
> caused by:
> rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/workflow.tar.gz
> caused by:
> rpc error: code = Unknown desc =
>
> It seems as if the harness is unable to reach the artifacts. If I understand it correctly, the harness is trying to get the files from /tmp/staged/workflow.tar.gz, but in the provided pipeline_options I see the artifact location pointing into my local Windows filesystem (the files are really there). The /tmp/staged/workflow.tar.gz exists on the taskmanager filesystem, but the file is empty:
>
> # cd /tmp/staged/
> # ls -l
> total 0
> -rwxr-xr-x 1 root root 0 Nov 4 11:34 workflow.tar.gz
> #
>
> But I don't know how to set up everything properly so that the harness is able to get the artifacts.
>
> Can anyone provide some tips, please?
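> For reference, the PROCESS variant takes a JSON object in environment_config; a sketch of that configuration (the shape rather than the exact code used, with the boot path matching the Dockerfile below) would be:
>
> beam_options = PipelineOptions(
>     # ... same options as above, but with the PROCESS environment:
>     environment_type='PROCESS',
>     # "command" points at the SDK harness boot binary baked into the image
>     environment_config='{"command": "/opt/apache/beam/boot"}',
> )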
> I will also include my Dockerfile, requirements.txt, and docker-compose.yaml here:
>
> *Dockerfile:*
> # Use the Beam SDK image as a build stage
> FROM apache/beam_python3.9_sdk:2.48.0 AS beam
>
> # Main image starts here
> FROM apache/flink:1.14.4-java8
>
> # Install Python 3.9 and required packages
> RUN apt-get update && apt-get install -y \
>     python3 \
>     python3-dev \
>     python3-distutils \
>     python3-pip \
>     python3-venv \
>     curl \
>     locales \
>     build-essential \
>     git \
>     wget \
>     unzip \
>     python3-setuptools \
>     libffi-dev \
>     libssl-dev \
>     && apt-get clean && rm -rf /var/lib/apt/lists/*
>
> COPY requirements.txt /tmp/requirements.txt
>
> # Upgrade pip to the latest version
> RUN python3 -m pip install --upgrade pip && \
>     python3 -m pip install -r /tmp/requirements.txt
>
> # Install Apache Beam SDK version 2.48.0
> RUN python3 -m pip install apache-beam==2.48.0
>
> # Set python3 as the default python
> RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
>
> # Set up locales
> RUN locale-gen en_US.UTF-8
> ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'
>
> # Copy the boot script and other necessary files from the Beam image
> COPY --from=beam /opt/apache/beam /opt/apache/beam
>
> # Expose the Flink web interface port
> EXPOSE 8081
>
> # Set the working directory (optional, as per your needs)
> WORKDIR /home/flink
>
> # Do not change the ENTRYPOINT and CMD from the Flink image;
> # the base image's ENTRYPOINT and CMD are preserved
>
> *docker-compose.yaml:*
> version: '3.9'
>
> services:
>   jobmanager:
>     image: flink1.14.4j8_p3.9_beam2.48
>     container_name: flink-beam-jobmanager
>     hostname: jobmanager
>     ports:
>       - "8081:8081"  # Expose the Flink Web UI on localhost:8081
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>     volumes:
>       - c:\smc\beam\beam-starter-python\:/beam
>       - c:\smc\backend-dev\dev-prostredie\docker-infrastruktura\artifacts\:/artifacts
>
>   taskmanager:
>     image: flink1.14.4j8_p3.9_beam2.48
>     container_name: flink-beam-taskmanager
>     command: taskmanager
>     scale: 1
>     network_mode: "service:jobmanager"
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
>     depends_on:
>       - jobmanager
>     volumes:
>       - c:\smc\beam\beam-starter-python\:/beam
>       - c:\smc\backend-dev\dev-prostredie\docker-infrastruktura\artifacts\:/artifacts
>
> *requirements.txt:*
> absl-py==1.4.0
> astunparse==1.6.3
> attrs==23.1.0
> beautifulsoup4==4.12.2
> bs4==0.0.1
> cachetools==5.3.0
> certifi==2023.5.7
> cffi==1.15.1
> charset-normalizer==3.1.0
> click==8.1.3
> cloudpickle==2.2.1
> crcmod==1.7
> cryptography==40.0.2
> Cython==0.29.34
> deprecation==2.1.0
> dill==0.3.1.1
> dnspython==2.3.0
> docker==6.1.2
> docopt==0.6.2
> exceptiongroup==1.1.1
> execnet==1.9.0
> fastavro==1.7.4
> fasteners==0.18
> flatbuffers==23.5.9
> freezegun==1.2.2
> future==0.18.3
> gast==0.4.0
> google-api-core==2.11.0
> google-api-python-client==2.86.0
> google-apitools==0.5.31
> google-auth==2.18.0
> google-auth-httplib2==0.1.0
> google-auth-oauthlib==1.0.0
> google-cloud-bigquery==3.10.0
> google-cloud-bigquery-storage==2.19.1
> google-cloud-bigtable==2.17.0
> google-cloud-core==2.3.2
> google-cloud-datastore==2.15.2
> google-cloud-dlp==3.12.1
> google-cloud-language==2.9.1
> google-cloud-profiler==4.0.0
> google-cloud-pubsub==2.17.0
> google-cloud-pubsublite==1.8.2
> google-cloud-recommendations-ai==0.10.3
> google-cloud-spanner==3.34.0
> google-cloud-videointelligence==2.11.1
> google-cloud-vision==3.4.1
> google-crc32c==1.5.0
> google-pasta==0.2.0
> google-resumable-media==2.5.0
> googleapis-common-protos==1.59.0
> greenlet==2.0.2
> grpc-google-iam-v1==0.12.6
> grpcio==1.54.2
> grpcio-status==1.54.2
> guppy3==3.1.3
> h5py==3.8.0
> hdfs==2.7.0
> httplib2==0.22.0
> hypothesis==6.75.3
> idna==3.4
> importlib-metadata==6.6.0
> iniconfig==2.0.0
> jax==0.4.10
> joblib==1.2.0
> keras==2.12.0
> libclang==16.0.0
> Markdown==3.4.3
> MarkupSafe==2.1.2
> ml-dtypes==0.1.0
> mmh3==3.1.0
> mock==5.0.2
> nltk==3.8.1
> nose==1.3.7
> numpy==1.23.5
> oauth2client==4.1.3
> oauthlib==3.2.2
> objsize==0.6.1
> opt-einsum==3.3.0
> orjson==3.8.12
> overrides==6.5.0
> packaging==23.1
> pandas==1.5.3
> parameterized==0.9.0
> pluggy==1.0.0
> proto-plus==1.22.2
> protobuf==4.23.0
> psycopg2-binary==2.9.6
> pyarrow==11.0.0
> pyasn1==0.5.0
> pyasn1-modules==0.3.0
> pycparser==2.21
> pydot==1.4.2
> PyHamcrest==2.0.4
> pymongo==4.3.3
> PyMySQL==1.0.3
> pyparsing==3.0.9
> pytest==7.3.1
> pytest-timeout==2.1.0
> pytest-xdist==3.3.0
> python-dateutil==2.8.2
> python-snappy==0.6.1
> pytz==2023.3
> PyYAML==6.0
> regex==2023.5.5
> requests==2.30.0
> requests-mock==1.10.0
> requests-oauthlib==1.3.1
> rsa==4.9
> scikit-learn==1.2.2
> scipy==1.10.1
> six==1.16.0
> sortedcontainers==2.4.0
> soupsieve==2.4.1
> SQLAlchemy==1.4.48
> sqlparse==0.4.4
> tenacity==8.2.2
> tensorboard==2.12.3
> tensorboard-data-server==0.7.0
> tensorflow==2.12.0
> tensorflow-estimator==2.12.0
> tensorflow-io-gcs-filesystem==0.32.0
> termcolor==2.3.0
> testcontainers==3.7.1
> threadpoolctl==3.1.0
> tomli==2.0.1
> tqdm==4.65.0
> typing_extensions==4.5.0
> uritemplate==4.1.1
> urllib3==1.26.15
> websocket-client==1.5.1
> Werkzeug==2.3.4
> wrapt==1.14.1
> zipp==3.15.0
> zstandard==0.21.0