I only believe @OrielResearch Eila Arich-Landkof
<e...@orielresearch.org> potentially
doing applied work with custom containers (there must be others)!

For a plug for her and @BeamSummit --  I think enough related will be
talked about in (with Conda specifics) -->
https://2020.beamsummit.org/sessions/workshop-using-conda-on-beam/

I'm sure others will have more things to say that are actually helpful,
on-list, before that occurs (~3 weeks).



On Fri, Aug 7, 2020 at 6:32 PM Eugene Kirpichov <ekirpic...@gmail.com>
wrote:

> Hi old Beam friends,
>
> I left Google to work on climate change
> <https://www.linkedin.com/posts/eugenekirpichov_i-am-leaving-google-heres-a-snipped-to-activity-6683408492444962816-Mw5U>
> and am now doing a short engagement with Pachama <https://pachama.com/>.
> Right now I'm trying to get a Beam Python pipeline to work; the pipeline
> will use fancy requirements and native dependencies, and we plan to run it
> on Cloud Dataflow (so custom containers are not yet an option), so I'm
> going straight for the direct PortableRunner as per
> https://beam.apache.org/documentation/runtime/environments/.
>
> Basically I can't get a minimal Beam program with a minimal
> requirements.txt file to work - the .tar.gz of the dependency mysteriously
> ends up being ungzipped and non-installable inside the Docker container
> running the worker. Details below.
>
> === main.py ===
> import argparse
> import logging
>
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.options.pipeline_options import SetupOptions
>
> def run(argv=None):
>     parser = argparse.ArgumentParser()
>     known_args, pipeline_args = parser.parse_known_args(argv)
>
>     pipeline_options = PipelineOptions(pipeline_args)
>     pipeline_options.view_as(SetupOptions).save_main_session = True
>
>     with beam.Pipeline(options=pipeline_options) as p:
>         (p | 'Create' >> beam.Create(['Hello'])
>            | 'Write' >> beam.io.WriteToText('/tmp'))
>
>
> if __name__ == '__main__':
>     logging.getLogger().setLevel(logging.INFO)
>     run()
>
> === requirements.txt ===
> alembic
>
> When I run the program:
> $ python3 main.py
> --runner=PortableRunner --job_endpoint=embed 
> --requirements_file=requirements.txt
>
>
> I get some normal output and then:
>
> INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:b'
>  File
> "/usr/local/lib/python3.7/site-packages/pip/_internal/utils/unpacking.py",
> line 261, in unpack_file\n    untar_file(filename, location)\n  File
> "/usr/local/lib/python3.7/site-packages/pip/_internal/utils/unpacking.py",
> line 177, in untar_file\n    tar = tarfile.open(filename, mode)\n  File
> "/usr/local/lib/python3.7/tarfile.py", line 1591, in open\n    return
> func(name, filemode, fileobj, **kwargs)\n  File
> "/usr/local/lib/python3.7/tarfile.py", line 1648, in gzopen\n    raise
> ReadError("not a gzip file")\ntarfile.ReadError: not a gzip
> file\n2020/08/08 01:17:07 Failed to install required packages: failed to
> install requirements: exit status 2\n'
>
> This greatly puzzled me and, after some looking, I found something really
> surprising. Here is the package in the *directory to be staged*:
>
> $ file
> /var/folders/07/j09mnhmd2q9_kw40xrbfvcg80000gn/T/dataflow-requirements-cache/alembic-1.4.2.tar.gz
> ...: gzip compressed data, was "dist/alembic-1.4.2.tar", last modified:
> Thu Mar 19 21:48:31 2020, max compression, original size modulo 2^32 4730880
> $ ls -l
> /var/folders/07/j09mnhmd2q9_kw40xrbfvcg80000gn/T/dataflow-requirements-cache/alembic-1.4.2.tar.gz
> -rw-r--r--  1 jkff  staff  1092045 Aug  7 16:56 ...
>
> So far so good. But here is the same file inside the Docker container (I ssh'd
> into the dead container
> <https://thorsten-hans.com/how-to-run-commands-in-stopped-docker-containers>
> ):
>
> # file /tmp/staged/alembic-1.4.2.tar.gz
> /tmp/staged/alembic-1.4.2.tar.gz: POSIX tar archive (GNU)
> # ls -l /tmp/staged/alembic-1.4.2.tar.gz
> -rwxr-xr-x 1 root root 4730880 Aug  8 01:17
> /tmp/staged/alembic-1.4.2.tar.gz
>
> The file has clearly been unzipped and now of course pip can't install it!
> What's going on here? Am I using the direct/portable runner combination
> wrong?
>
> Thanks!
>
> --
> Eugene Kirpichov
> http://www.linkedin.com/in/eugenekirpichov
>

Reply via email to