Hi Michael,

Your problems in 1. and 2. are both related to the artifact staging
workflow, in which Beam copies your pipeline’s dependencies to the
workers. When artifacts cannot be fetched, whether due to file system
problems or other issues, the workers fail to start. In this case, the
artifact your pipeline depends on is the pickled main session from
estimate_pi.py [1].
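
For reference, the relevant bit of the example is (roughly) the
save_main_session option, which pickles the __main__ module’s state and
stages it as an artifact for the workers:

    from apache_beam.options.pipeline_options import (PipelineOptions,
                                                      SetupOptions)

    pipeline_options = PipelineOptions(pipeline_args)
    # Pickle the state of the main session and stage it so that workers
    # can unpickle functions and globals defined in __main__.
    pipeline_options.view_as(SetupOptions).save_main_session = True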

In order for artifact staging to work, the job server’s --artifacts-dir
must be accessible to the Spark workers.* Since you start your job
server in a Docker container, /users/mkuchnik/staging resolves to a
directory inside that container’s own filesystem, not to the directory
on your network filesystem, so the workers cannot reach the staged
artifacts. You mentioned in 2. that you tried mounting the directory
into the [worker (?)] containers, but have you tried mounting it into
the job server container?
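
For example, something like this (untested sketch; the image tag and
Spark master URL are placeholders for your setup) mounts the host
directory into the job server container at the same path:

    docker run --net=host \
        -v /users/mkuchnik/staging:/users/mkuchnik/staging \
        apache/beam_spark_job_server:latest \
        --spark-master-url=spark://<your-spark-master>:7077 \
        --artifacts-dir=/users/mkuchnik/staging

That way, the path the job server writes artifacts to is the same
network filesystem directory the Spark workers read them from.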

Thanks,
Kyle

* It looks like this is unclear in the current documentation, so I will
edit it.

[1]
https://github.com/apache/beam/blob/2c619c81082839e054f16efee9311b9f74b6e436/sdks/python/apache_beam/examples/complete/estimate_pi.py#L118
