Hi community!

This is my first contact with the community, and it relates to our first
steps with Beam.

I am trying to establish a development environment for Apache Beam
pipelines that run on the Flink runner and use the Python SDK harness to
handle some of the tasks. I aim to develop the Beam pipelines in Python,
and I am using Docker to set up this environment.

My current setup is:
Windows 10 64-bit, 32 GB RAM
Docker 2.5.0.0 Stable, Engine 19.03.13, Compose 1.27.4
On the local system: Python 3.9, java version "1.8.0_361"
Beam Docker image: apache/beam_python3.9_sdk:2.48.0
Flink Docker image: apache/flink:1.14.4-java8

For executing the tasks on the Python SDK harness, I tried both
environment_type='PROCESS' and 'EXTERNAL'; the goal is to use the
'PROCESS' variant. But to better understand what is going on, I am
currently using the EXTERNAL option, with the worker pool service launched
manually inside the taskmanager container via "/opt/apache/beam/boot
--worker_pool". The "boot" binary itself is installed in the image as
defined in the Dockerfile below.

My job is a simple hello-world program containing just a few prints. The
PipelineOptions are:

...

beam_options = PipelineOptions(
    save_main_session=False,
    setup_file="./setup.py",
    runner='FlinkRunner',
    flink_master='localhost:8081',
    flink_version='1.14',
    parallelism=1,
    environment_type='EXTERNAL',
    environment_config='localhost:50000'
)

print("tlacim nejaky hello world vypis")
print(beam_options.get_all_options())
print("koniec vypisu")

app.run(
    input_text=args.input_text,
    beam_options=beam_options,
)
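
The app module comes from the beam-starter-python template; simplified for
this email, it does roughly this (a sketch, so names may differ slightly
from the real file):

import apache_beam as beam

def run(input_text, beam_options=None):
    # Build and run a trivial pipeline that just prints its input.
    with beam.Pipeline(options=beam_options) as pipeline:
        (
            pipeline
            | 'Create' >> beam.Create([input_text])
            | 'Print' >> beam.Map(print)
        )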

The setup.py includes the requirements.txt listed at the bottom of this
email.
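
For completeness, setup.py is a minimal sketch along these lines
(simplified; the real file may differ slightly):

import setuptools

# Read the pinned dependencies from requirements.txt.
with open('requirements.txt') as f:
    requirements = [line.strip() for line in f if line.strip()]

setuptools.setup(
    name='beam-starter-python',  # the actual package name may differ
    version='0.1.0',
    packages=setuptools.find_packages(),
    install_requires=requirements,
)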

Now when I execute the pipeline, it is successfully built and submitted to
the Flink runner.

The build process outputs this:

C:\Users\rsianta\Miniconda3\envs\BeamNewest\python.exe
C:/smc/beam/beam-starter-python/main.py
printing some hello world output
{'runner': 'FlinkRunner', 'streaming': False, 'resource_hints': [],
'beam_services': {}, 'type_check_strictness': 'DEFAULT_TO_ANY',
'type_check_additional': '', 'pipeline_type_check': True,
'runtime_type_check': False, 'performance_runtime_type_check': False,
'allow_non_deterministic_key_coders': False, 'allow_unsafe_triggers':
False, 'direct_runner_use_stacked_bundle': True,
'direct_runner_bundle_repeat': 0, 'direct_num_workers': 1,
'direct_running_mode': 'in_memory', 'direct_embed_docker_python': False,
'direct_test_splits': {}, 'dataflow_endpoint': '
https://dataflow.googleapis.com', 'project': None, 'job_name': None,
'staging_location': None, 'temp_location': None, 'region': None,
'service_account_email': None, 'no_auth': False, 'template_location': None,
'labels': None, 'update': False, 'transform_name_mapping': None,
'enable_streaming_engine': False, 'dataflow_kms_key': None,
'create_from_snapshot': None, 'flexrs_goal': None,
'dataflow_service_options': None, 'enable_hot_key_logging': False,
'enable_artifact_caching': False, 'impersonate_service_account': None,
'gcp_oauth_scopes': ['https://www.googleapis.com/auth/bigquery', '
https://www.googleapis.com/auth/cloud-platform', '
https://www.googleapis.com/auth/devstorage.full_control', '
https://www.googleapis.com/auth/userinfo.email', '
https://www.googleapis.com/auth/datastore', '
https://www.googleapis.com/auth/spanner.admin', '
https://www.googleapis.com/auth/spanner.data'], 'azure_connection_string':
None, 'blob_service_endpoint': None, 'azure_managed_identity_client_id':
None, 'hdfs_host': None, 'hdfs_port': None, 'hdfs_user': None,
'hdfs_full_urls': False, 'num_workers': None, 'max_num_workers': None,
'autoscaling_algorithm': None, 'machine_type': None, 'disk_size_gb': None,
'disk_type': None, 'worker_region': None, 'worker_zone': None, 'zone':
None, 'network': None, 'subnetwork': None,
'worker_harness_container_image': None, 'sdk_container_image': None,
'sdk_harness_container_image_overrides': None,
'default_sdk_harness_log_level': None, 'sdk_harness_log_level_overrides':
None, 'use_public_ips': None, 'min_cpu_platform': None,
'dataflow_job_file': None, 'experiments': None,
'number_of_worker_harness_threads': None, 'profile_cpu': False,
'profile_memory': False, 'profile_location': None, 'profile_sample_rate':
1.0, 'requirements_file': None, 'requirements_cache': None,
'requirements_cache_only_sources': False, 'setup_file': './setup.py',
'beam_plugins': None, 'pickle_library': 'default', 'save_main_session':
False, 'sdk_location': 'default', 'extra_packages': None,
'prebuild_sdk_container_engine': None, 'prebuild_sdk_container_base_image':
None, 'cloud_build_machine_type': None, 'docker_registry_push_url': None,
'job_endpoint': None, 'artifact_endpoint': None, 'job_server_timeout': 60,
'environment_type': 'EXTERNAL', 'environment_config': 'localhost:50000',
'environment_options': None, 'sdk_worker_parallelism': 1,
'environment_cache_millis': 0, 'output_executable_path': None,
'artifacts_dir': None, 'job_port': 0, 'artifact_port': 0, 'expansion_port':
0, 'job_server_java_launcher': 'java', 'job_server_jvm_properties': [],
'flink_master': 'localhost:8081', 'flink_version': '1.14',
'flink_job_server_jar': None, 'flink_submit_uber_jar': False,
'parallelism': 1, 'max_parallelism': -1, 'spark_master_url': 'local[4]',
'spark_job_server_jar': None, 'spark_submit_uber_jar': False,
'spark_rest_url': None, 'spark_version': '3', 'on_success_matcher': None,
'dry_run': False, 'wait_until_finish_duration': None, 'pubsubRootUrl':
None, 's3_access_key_id': None, 's3_secret_access_key': None,
's3_session_token': None, 's3_endpoint_url': None, 's3_region_name': None,
's3_api_version': None, 's3_verify': None, 's3_disable_ssl': False}
end of output
INFO:root:Default Python SDK image for environment is
apache/beam_python3.9_sdk:2.48.0
INFO:apache_beam.runners.portability.stager:Executing command:
['C:\\Users\\rsianta\\Miniconda3\\envs\\BeamNewest\\python.exe',
'setup.py', 'sdist', '--dist-dir',
'C:\\Users\\rsianta\\AppData\\Local\\Temp\\tmpggi9cf3y']
INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
<function pack_combiners at 0x000001E6656CD670> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
<function lift_combiners at 0x000001E6656CD700> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:====================
<function sort_stages at 0x000001E6656CDE50> ====================
INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol
scheme to flink_master parameter: http://localhost:8081
INFO:apache_beam.utils.subprocess_server:Using cached job server jar from
https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.14-job-server/2.48.0/beam-runners-flink-1.14-job-server-2.48.0.jar
INFO:apache_beam.utils.subprocess_server:Starting service with ['java'
'-jar'
'C:\\Users\\rsianta/.apache_beam/cache/jars\\beam-runners-flink-1.14-job-server-2.48.0.jar'
'--flink-master' 'http://localhost:8081' '--artifacts-dir'
'C:\\Users\\rsianta\\AppData\\Local\\Temp\\beam-temptolduaa5\\artifactsbjonha4y'
'--job-port' '57745' '--artifact-port' '0' '--expansion-port' '0']
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.jobsubmission.JobServerDriver
createArtifactStagingService
INFO:apache_beam.utils.subprocess_server:INFO: ArtifactStagingService
started on localhost:57778
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.jobsubmission.JobServerDriver createExpansionService
INFO:apache_beam.utils.subprocess_server:INFO: Java ExpansionService
started on localhost:57779
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.jobsubmission.JobServerDriver createJobServer
INFO:apache_beam.utils.subprocess_server:INFO: JobService started on
localhost:57745
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.jobsubmission.JobServerDriver run
INFO:apache_beam.utils.subprocess_server:INFO: Job server now running,
terminate with Ctrl+C
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 onNext
INFO:apache_beam.utils.subprocess_server:INFO: Staging artifacts for
job_321fed50-f7ae-4e93-8774-2bf3fe155938.
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2
resolveNextEnvironment
INFO:apache_beam.utils.subprocess_server:INFO: Resolving artifacts for
job_321fed50-f7ae-4e93-8774-2bf3fe155938.ref_Environment_default_environment_1.
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2 onNext
INFO:apache_beam.utils.subprocess_server:INFO: Getting 1 artifacts for
job_321fed50-f7ae-4e93-8774-2bf3fe155938.null.
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:52 PM
org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$2
finishStaging
INFO:apache_beam.utils.subprocess_server:INFO: Artifacts fully staged for
job_321fed50-f7ae-4e93-8774-2bf3fe155938.
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
org.apache.beam.runners.flink.FlinkJobInvoker invokeWithExecutor
INFO:apache_beam.utils.subprocess_server:INFO: Invoking job
BeamApp-rsianta-1106130652-47ae11d_ad4e7a8b-eb96-45b4-8498-74b2eaa8ad66
with pipeline runner
org.apache.beam.runners.flink.FlinkPipelineRunner@48a4ca17
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
org.apache.beam.runners.jobsubmission.JobInvocation start
INFO:apache_beam.utils.subprocess_server:INFO: Starting job invocation
BeamApp-rsianta-1106130652-47ae11d_ad4e7a8b-eb96-45b4-8498-74b2eaa8ad66
INFO:apache_beam.runners.portability.portable_runner:Job state changed to
STOPPED
INFO:apache_beam.runners.portability.portable_runner:Job state changed to
STARTING
INFO:apache_beam.runners.portability.portable_runner:Job state changed to
RUNNING
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
org.apache.beam.runners.flink.FlinkPipelineRunner runPipelineWithTranslator
INFO:apache_beam.utils.subprocess_server:INFO: Translating pipeline to
Flink program.
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
org.apache.beam.runners.flink.FlinkExecutionEnvironments
createBatchExecutionEnvironment
INFO:apache_beam.utils.subprocess_server:INFO: Creating a Batch Execution
Environment.
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:53 PM
org.apache.beam.runners.flink.FlinkExecutionEnvironments
createBatchExecutionEnvironment
INFO:apache_beam.utils.subprocess_server:INFO: Using Flink Master URL
localhost:8081.
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:54 PM
org.apache.flink.api.java.utils.PlanGenerator logTypeRegistrationDetails
INFO:apache_beam.utils.subprocess_server:INFO: The job has 0 registered
types and 0 default Kryo serializers
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:06:54 PM
org.apache.flink.client.program.rest.RestClusterClient lambda$submitJob$7
INFO:apache_beam.utils.subprocess_server:INFO: Submitting job
'BeamApp-rsianta-1106130652-47ae11d' (cb7421fd5da8f581828f4b39e809c0e5).
INFO:apache_beam.utils.subprocess_server:nov 06, 2024 2:07:03 PM
org.apache.flink.client.program.rest.RestClusterClient lambda$null$6
INFO:apache_beam.utils.subprocess_server:INFO: Successfully submitted job
'BeamApp-rsianta-1106130652-47ae11d' (cb7421fd5da8f581828f4b39e809c0e5) to '
http://localhost:8081'.

I am observing the output of the SDK harness on the taskmanager console.

The SDK harness stdout is this:

# /opt/apache/beam/boot --worker_pool
2024/11/06 13:05:38 Starting worker pool 365: python -m
apache_beam.runners.worker.worker_pool_main --service_port=50000
--container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1',
'--logging_endpoint=localhost:44529',
'--artifact_endpoint=localhost:43809',
'--provision_endpoint=localhost:38551',
'--control_endpoint=localhost:41251']
2024/11/06 13:07:07 Provision info:
pipeline_options:{fields:{key:"beam:option:allow_non_deterministic_key_coders:v1"
value:{bool_value:false}}
fields:{key:"beam:option:allow_non_restored_state:v1"
value:{bool_value:false}}
fields:{key:"beam:option:allow_unsafe_triggers:v1"
value:{bool_value:false}} fields:{key:"beam:option:app_name:v1"
value:{null_value:NULL_VALUE}} fields:{key:"beam:option:artifact_port:v1"
value:{string_value:"0"}}
fields:{key:"beam:option:auto_balance_write_files_sharding_enabled:v1"
value:{bool_value:false}} fields:{key:"beam:option:beam_services:v1"
value:{struct_value:{}}} fields:{key:"beam:option:block_on_run:v1"
value:{bool_value:true}} fields:{key:"beam:option:dataflow_endpoint:v1"
value:{string_value:"https://dataflow.googleapis.com"}}
fields:{key:"beam:option:direct_embed_docker_python:v1"
value:{bool_value:false}} fields:{key:"beam:option:direct_num_workers:v1"
value:{string_value:"1"}}
fields:{key:"beam:option:direct_runner_bundle_repeat:v1"
value:{string_value:"0"}}
fields:{key:"beam:option:direct_runner_use_stacked_bundle:v1"
value:{bool_value:true}} fields:{key:"beam:option:direct_running_mode:v1"
value:{string_value:"in_memory"}}
fields:{key:"beam:option:direct_test_splits:v1" value:{struct_value:{}}}
fields:{key:"beam:option:disable_metrics:v1" value:{bool_value:false}}
fields:{key:"beam:option:dry_run:v1" value:{bool_value:false}}
fields:{key:"beam:option:enable_artifact_caching:v1"
value:{bool_value:false}} fields:{key:"beam:option:enable_bundling:v1"
value:{bool_value:false}}
fields:{key:"beam:option:enable_hot_key_logging:v1"
value:{bool_value:false}}
fields:{key:"beam:option:enable_streaming_engine:v1"
value:{bool_value:false}} fields:{key:"beam:option:enforce_encodability:v1"
value:{bool_value:true}} fields:{key:"beam:option:enforce_immutability:v1"
value:{bool_value:true}}
fields:{key:"beam:option:environment_cache_millis:v1"
value:{string_value:"0"}} fields:{key:"beam:option:environment_config:v1"
value:{string_value:"localhost:50000"}}
fields:{key:"beam:option:environment_type:v1"
value:{string_value:"EXTERNAL"}}
fields:{key:"beam:option:expansion_port:v1" value:{string_value:"0"}}
fields:{key:"beam:option:experiments:v1"
value:{list_value:{values:{string_value:"beam_fn_api"}}}}
fields:{key:"beam:option:externalized_checkpoints_enabled:v1"
value:{bool_value:false}}
fields:{key:"beam:option:fail_on_checkpointing_errors:v1"
value:{bool_value:true}} fields:{key:"beam:option:faster_copy:v1"
value:{bool_value:false}}
fields:{key:"beam:option:finish_bundle_before_checkpointing:v1"
value:{bool_value:false}} fields:{key:"beam:option:flink_conf_dir:v1"
value:{null_value:NULL_VALUE}} fields:{key:"beam:option:flink_master:v1"
value:{string_value:"http://localhost:8081"}}
fields:{key:"beam:option:flink_submit_uber_jar:v1"
value:{bool_value:false}} fields:{key:"beam:option:flink_version:v1"
value:{string_value:"1.14"}} fields:{key:"beam:option:gcp_oauth_scopes:v1"
value:{list_value:{values:{string_value:"
https://www.googleapis.com/auth/bigquery"} values:{string_value:"
https://www.googleapis.com/auth/cloud-platform"} values:{string_value:"
https://www.googleapis.com/auth/devstorage.full_control"}
values:{string_value:"https://www.googleapis.com/auth/userinfo.email"}
values:{string_value:"https://www.googleapis.com/auth/datastore"}
values:{string_value:"https://www.googleapis.com/auth/spanner.admin"}
values:{string_value:"https://www.googleapis.com/auth/spanner.data"}}}}
fields:{key:"beam:option:gcs_performance_metrics:v1"
value:{bool_value:false}} fields:{key:"beam:option:hdfs_full_urls:v1"
value:{bool_value:false}} fields:{key:"beam:option:job_name:v1"
value:{string_value:"BeamApp-rsianta-1106130652-47ae11d"}}
fields:{key:"beam:option:job_port:v1" value:{string_value:"0"}}
fields:{key:"beam:option:job_server_java_launcher:v1"
value:{string_value:"java"}}
fields:{key:"beam:option:job_server_jvm_properties:v1"
value:{list_value:{}}} fields:{key:"beam:option:job_server_timeout:v1"
value:{string_value:"60"}}
fields:{key:"beam:option:load_balance_bundles:v1" value:{bool_value:false}}
fields:{key:"beam:option:max_parallelism:v1" value:{string_value:"-1"}}
fields:{key:"beam:option:no_auth:v1" value:{bool_value:false}}
fields:{key:"beam:option:object_reuse:v1" value:{bool_value:false}}
fields:{key:"beam:option:operator_chaining:v1" value:{bool_value:true}}
fields:{key:"beam:option:options_id:v1" value:{number_value:2}}
fields:{key:"beam:option:output_executable_path:v1"
value:{null_value:NULL_VALUE}} fields:{key:"beam:option:parallelism:v1"
value:{string_value:"1"}}
fields:{key:"beam:option:performance_runtime_type_check:v1"
value:{bool_value:false}} fields:{key:"beam:option:pickle_library:v1"
value:{string_value:"default"}}
fields:{key:"beam:option:pipeline_type_check:v1" value:{bool_value:true}}
fields:{key:"beam:option:profile_cpu:v1" value:{bool_value:false}}
fields:{key:"beam:option:profile_memory:v1" value:{bool_value:false}}
fields:{key:"beam:option:profile_sample_rate:v1" value:{number_value:1}}
fields:{key:"beam:option:re_iterable_group_by_key_result:v1"
value:{bool_value:false}}
fields:{key:"beam:option:requirements_cache_only_sources:v1"
value:{bool_value:false}} fields:{key:"beam:option:resource_hints:v1"
value:{list_value:{}}}
fields:{key:"beam:option:retain_docker_containers:v1"
value:{bool_value:false}}
fields:{key:"beam:option:retain_externalized_checkpoints_on_cancellation:v1"
value:{bool_value:false}} fields:{key:"beam:option:runner:v1"
value:{null_value:NULL_VALUE}}
fields:{key:"beam:option:runner_determined_sharding:v1"
value:{bool_value:true}} fields:{key:"beam:option:runtime_type_check:v1"
value:{bool_value:false}} fields:{key:"beam:option:s3_disable_ssl:v1"
value:{bool_value:false}} fields:{key:"beam:option:save_main_session:v1"
value:{bool_value:false}} fields:{key:"beam:option:sdk_location:v1"
value:{string_value:"container"}}
fields:{key:"beam:option:sdk_worker_parallelism:v1"
value:{string_value:"1"}} fields:{key:"beam:option:setup_file:v1"
value:{string_value:"./setup.py"}}
fields:{key:"beam:option:spark_master_url:v1"
value:{string_value:"local[4]"}}
fields:{key:"beam:option:spark_submit_uber_jar:v1"
value:{bool_value:false}} fields:{key:"beam:option:spark_version:v1"
value:{string_value:"3"}} fields:{key:"beam:option:streaming:v1"
value:{bool_value:false}}
fields:{key:"beam:option:type_check_additional:v1" value:{string_value:""}}
fields:{key:"beam:option:type_check_strictness:v1"
value:{string_value:"DEFAULT_TO_ANY"}} fields:{key:"beam:option:update:v1"
value:{bool_value:false}}
fields:{key:"beam:option:use_storage_api_connection_pool:v1"
value:{bool_value:false}}
fields:{key:"beam:option:use_storage_write_api:v1"
value:{bool_value:false}}
fields:{key:"beam:option:use_storage_write_api_at_least_once:v1"
value:{bool_value:false}} fields:{key:"beam:option:verify_row_values:v1"
value:{bool_value:false}}} logging_endpoint:{url:"localhost:44529"}
artifact_endpoint:{url:"localhost:43809"}
control_endpoint:{url:"localhost:41251"}
dependencies:{type_urn:"beam:artifact:type:file:v1"
type_payload:"\n\xb4\x01C:\\Users\\rsianta\\AppData\\Local\\Temp\\beam-temptolduaa5\\artifactsbjonha4y\\f7e1317b082363ba3eea86b1de88226bcb4063e22d7fc507188f6b95286f06c6\\1-ref_Environment_default_e-workflow.tar.gz"
role_urn:"beam:artifact:role:staging_to:v1"
role_payload:"\n\x0fworkflow.tar.gz"}
runner_capabilities:"beam:protocol:control_response_elements_embedding:v1"
2024/11/06 13:07:18 failed to retrieve staged files: failed to retrieve
/tmp/staged in 3 attempts: failed to retrieve chunk for
/tmp/staged/workflow.tar.gz
        caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/workflow.tar.gz
        caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/workflow.tar.gz
        caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for
/tmp/staged/workflow.tar.gz
        caused by:
rpc error: code = Unknown desc =


It seems the harness is unable to retrieve the artifacts. If I understand
it correctly, the harness is trying to read /tmp/staged/workflow.tar.gz,
but the provision info above shows the artifact staged on my local Windows
filesystem (the files really are there). /tmp/staged/workflow.tar.gz does
exist on the taskmanager filesystem, but the file is empty:
# cd /tmp/staged/
# ls -l
total 0
-rwxr-xr-x 1 root root 0 Nov  4 11:34 workflow.tar.gz
#


I don't know how to set everything up properly so that the harness is able
to retrieve the artifacts.
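
For the record, the options dump above shows 'artifacts_dir': None. A
sketch of pointing it at the shared /artifacts volume from my
docker-compose.yaml would look like this, but it is untested and I don't
know whether this is even the right knob (or whether the path must be
visible to the job server on the Windows host, the harness in the
container, or both):

from apache_beam.options.pipeline_options import PipelineOptions

beam_options = PipelineOptions(
    save_main_session=False,
    setup_file='./setup.py',
    runner='FlinkRunner',
    flink_master='localhost:8081',
    flink_version='1.14',
    parallelism=1,
    environment_type='EXTERNAL',
    environment_config='localhost:50000',
    artifacts_dir='/artifacts',  # untested assumption: shared staging dir
)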

Can anyone provide some tips, please?


I will also include my Dockerfile, requirements.txt, and docker-compose.yaml
below:

*Dockerfile:*
# Use the Beam SDK image as a build stage
FROM apache/beam_python3.9_sdk:2.48.0 AS beam

# main image starts here
FROM apache/flink:1.14.4-java8

# Install Python 3.9 and required packages
RUN apt-get update && apt-get install -y \
    python3 \
    python3-dev \
    python3-distutils \
    python3-pip \
    python3-venv \
    curl \
    locales \
    build-essential \
    git \
    wget \
    unzip \
    python3-setuptools \
    libffi-dev \
    libssl-dev \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

COPY requirements.txt /tmp/requirements.txt

# Upgrade pip to the latest version
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install -r /tmp/requirements.txt

# Install Apache Beam SDK version 2.48.0
RUN python3 -m pip install apache-beam==2.48.0

# Set python3 as the default python
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1

# Set up locales
RUN locale-gen en_US.UTF-8
ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en' LC_ALL='en_US.UTF-8'

# Copy the boot script and other necessary files from the Beam image
COPY --from=beam /opt/apache/beam /opt/apache/beam

# Expose Flink web interface port
EXPOSE 8081

# Set the working directory (optional, as per your needs)
WORKDIR /home/flink

# Do not change the ENTRYPOINT and CMD from the Flink image
# The base image's ENTRYPOINT and CMD are preserved

*docker-compose.yaml:*
version: '3.9'

services:
  jobmanager:
    image: flink1.14.4j8_p3.9_beam2.48
    container_name: flink-beam-jobmanager
    hostname: jobmanager
    ports:
      - "8081:8081"      # Expose Flink Web UI on localhost:8081
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
    volumes:
      - c:\smc\beam\beam-starter-python\:/beam
      - c:\smc\backend-dev\dev-prostredie\docker-infrastruktura\artifacts\:/artifacts

  taskmanager:
    image: flink1.14.4j8_p3.9_beam2.48
    container_name: flink-beam-taskmanager
    command: taskmanager
    scale: 1
    network_mode: "service:jobmanager"
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2
    depends_on:
      - jobmanager

    volumes:
      - c:\smc\beam\beam-starter-python\:/beam
      - c:\smc\backend-dev\dev-prostredie\docker-infrastruktura\artifacts\:/artifacts

*requirements.txt:*
absl-py==1.4.0
astunparse==1.6.3
attrs==23.1.0
beautifulsoup4==4.12.2
bs4==0.0.1
cachetools==5.3.0
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
click==8.1.3
cloudpickle==2.2.1
crcmod==1.7
cryptography==40.0.2
Cython==0.29.34
deprecation==2.1.0
dill==0.3.1.1
dnspython==2.3.0
docker==6.1.2
docopt==0.6.2
exceptiongroup==1.1.1
execnet==1.9.0
fastavro==1.7.4
fasteners==0.18
flatbuffers==23.5.9
freezegun==1.2.2
future==0.18.3
gast==0.4.0
google-api-core==2.11.0
google-api-python-client==2.86.0
google-apitools==0.5.31
google-auth==2.18.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==1.0.0
google-cloud-bigquery==3.10.0
google-cloud-bigquery-storage==2.19.1
google-cloud-bigtable==2.17.0
google-cloud-core==2.3.2
google-cloud-datastore==2.15.2
google-cloud-dlp==3.12.1
google-cloud-language==2.9.1
google-cloud-profiler==4.0.0
google-cloud-pubsub==2.17.0
google-cloud-pubsublite==1.8.2
google-cloud-recommendations-ai==0.10.3
google-cloud-spanner==3.34.0
google-cloud-videointelligence==2.11.1
google-cloud-vision==3.4.1
google-crc32c==1.5.0
google-pasta==0.2.0
google-resumable-media==2.5.0
googleapis-common-protos==1.59.0
greenlet==2.0.2
grpc-google-iam-v1==0.12.6
grpcio==1.54.2
grpcio-status==1.54.2
guppy3==3.1.3
h5py==3.8.0
hdfs==2.7.0
httplib2==0.22.0
hypothesis==6.75.3
idna==3.4
importlib-metadata==6.6.0
iniconfig==2.0.0
jax==0.4.10
joblib==1.2.0
keras==2.12.0
libclang==16.0.0
Markdown==3.4.3
MarkupSafe==2.1.2
ml-dtypes==0.1.0
mmh3==3.1.0
mock==5.0.2
nltk==3.8.1
nose==1.3.7
numpy==1.23.5
oauth2client==4.1.3
oauthlib==3.2.2
objsize==0.6.1
opt-einsum==3.3.0
orjson==3.8.12
overrides==6.5.0
packaging==23.1
pandas==1.5.3
parameterized==0.9.0
pluggy==1.0.0
proto-plus==1.22.2
protobuf==4.23.0
psycopg2-binary==2.9.6
pyarrow==11.0.0
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pydot==1.4.2
PyHamcrest==2.0.4
pymongo==4.3.3
PyMySQL==1.0.3
pyparsing==3.0.9
pytest==7.3.1
pytest-timeout==2.1.0
pytest-xdist==3.3.0
python-dateutil==2.8.2
python-snappy==0.6.1
pytz==2023.3
PyYAML==6.0
regex==2023.5.5
requests==2.30.0
requests-mock==1.10.0
requests-oauthlib==1.3.1
rsa==4.9
scikit-learn==1.2.2
scipy==1.10.1
six==1.16.0
sortedcontainers==2.4.0
soupsieve==2.4.1
SQLAlchemy==1.4.48
sqlparse==0.4.4
tenacity==8.2.2
tensorboard==2.12.3
tensorboard-data-server==0.7.0
tensorflow==2.12.0
tensorflow-estimator==2.12.0
tensorflow-io-gcs-filesystem==0.32.0
termcolor==2.3.0
testcontainers==3.7.1
threadpoolctl==3.1.0
tomli==2.0.1
tqdm==4.65.0
typing_extensions==4.5.0
uritemplate==4.1.1
urllib3==1.26.15
websocket-client==1.5.1
Werkzeug==2.3.4
wrapt==1.14.1
zipp==3.15.0
zstandard==0.21.0
