I have a simple Apache Beam project using Python 3 that transforms some data
and writes it to BigQuery. It uses a package called textstat. If I run it
locally everything works, but when I run it on Dataflow I get the following error:
NameError: name 'textstat' is not defined [while running
'generatedPtransform-441']
This is my current setup.py file:
    import setuptools

    REQUIRED_PACKAGES = ['textstat==0.5.6']
    PACKAGE_NAME = 'my_package'
    PACKAGE_VERSION = '0.0.1'

    setuptools.setup(
        name=PACKAGE_NAME,
        version=PACKAGE_VERSION,
        description='Example project',
        install_requires=REQUIRED_PACKAGES,
        packages=setuptools.find_packages(),
    )
and these are my pipeline args:
    pipeline_args = [
        '--project={}'.format('etl-example'),
        '--runner={}'.format('Dataflow'),
        '--temp_location=gs://dataflowtemporal/',
        '--setup_file=./setup.py',
    ]
and I run it like this:
    pipeline_options = PipelineOptions(pipeline_args)
    pipeline_options.view_as(StandardOptions).streaming = True
    pipeline = beam.Pipeline(options=pipeline_options)
    ...
    pipeline.run()
I also tried running this in the terminal before launching the job:
python setup.py sdist --formats=gztar
but I got the same result: textstat is not found. Another thing I
tried was dropping setup.py and passing only the argument
--requirements_file=./requirements.txt
But again, textstat is not found.
At this point I don't know what else to try.
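One thing I have not ruled out yet: the Beam documentation mentions that a NameError on the workers (as opposed to an import error) can happen when a function depends on module-level imports from the main script, and it suggests either importing inside the function or passing --save_main_session. The sketch below only imitates that effect in plain Python; it is not Beam's actual serialization, `math` is a stand-in for textstat, and all function names are hypothetical.

```python
import builtins
import math  # stand-in for a module-level `import textstat` in the main script
import types


def score_with_global_import(text):
    # Relies on the module-level name `math` being present in globals.
    return math.sqrt(len(text))


def score_with_local_import(text):
    # Binds the name inside the function, so it does not depend on globals.
    import math
    return math.sqrt(len(text))


def run_without_main_globals(func, arg):
    """Call func with a fresh global namespace, imitating a worker that
    did not restore the main script's module-level imports."""
    bare = types.FunctionType(func.__code__, {"__builtins__": builtins})
    return bare(arg)


if __name__ == "__main__":
    print(run_without_main_globals(score_with_local_import, "abcd"))
    try:
        run_without_main_globals(score_with_global_import, "abcd")
    except NameError as err:
        print("NameError:", err)
```

If that is what is going on in my pipeline, the package may well be installed on the workers and only the name is missing, which would explain why changing setup.py and requirements.txt made no difference.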