Hello dear friends,

I hope everyone is doing fine and staying safe.


This query is for Spark 3.0.1.


The following works:

> pyspark --py-files s3://gourav-bucket/spark_nlp_display-1.7-py3.7.egg
> >>> import sparknlp_display
> >>>


But when I start plain Python and then create a Spark session, the import
fails with an error. The same error occurs even if I omit the
spark.yarn.dist.pyFiles configuration:


> >>> from pyspark.sql import SparkSession
> >>> spark = SparkSession.builder.master("yarn") \
> ...     .config("spark.submit.pyFiles",
> ...             "s3://gourav-bucket/spark_nlp_display-1.7-py3.7.egg") \
> ...     .config("spark.yarn.dist.pyFiles",
> ...             "s3://gourav-bucket/spark_nlp_display-1.7-py3.7.egg") \
> ...     .getOrCreate()
> >>> import sparknlp_display
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<frozen importlib._bootstrap>", line 983, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 967, in
> _find_and_load_unlocked
>   File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
>   File "<frozen importlib._bootstrap>", line 638, in
> _load_backward_compatible
>   File
> "/mnt/tmp/spark-692956ec-cc89-4b11-a1de-ced1a164ef1b/userFiles-ed25968c-cc16-4900-b910-75d933db3afb/spark_nlp_display-1.7-py3.7.egg/sparknlp_display/__init__.py",
> line 16, in <module>
>     __version__ = get_version()
>   File
> "/mnt/tmp/spark-692956ec-cc89-4b11-a1de-ced1a164ef1b/userFiles-ed25968c-cc16-4900-b910-75d933db3afb/spark_nlp_display-1.7-py3.7.egg/sparknlp_display/__init__.py",
> line 12, in get_version
>     with open(os.path.join(here, "VERSION"), "r") as fh:
> NotADirectoryError: [Errno 20] Not a directory:
> '/mnt/tmp/spark-692956ec-cc89-4b11-a1de-ced1a164ef1b/userFiles-ed25968c-cc16-4900-b910-75d933db3afb/spark_nlp_display-1.7-py3.7.egg/sparknlp_display/VERSION'


When I do ls I can see that the following is present:
/mnt/tmp/spark-692956ec-cc89-4b11-a1de-ced1a164ef1b/userFiles-ed25968c-cc16-4900-b910-75d933db3afb/spark_nlp_display-1.7-py3.7.egg


When I unzip the egg file, I see the following files under the
sparknlp_display folder:

> VERSION
> __init__.py
> __pycache__
> assertion.py
> dep_updates.py
> dependency_parser.py
> entity_resolution.py
> fonts
> label_colors
> ner.py
> re_updates.py
> relation_extraction.py
> retemp.py
> style.css
> style_utils.py
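
In case it helps with diagnosis, below is a minimal sketch that I believe
reproduces the same kind of failure without Spark. It assumes the cause is that
the egg sits on sys.path as a zip archive rather than as an unpacked directory,
so a plain open() on a path inside it cannot work. The package name demo_pkg and
its contents are hypothetical stand-ins for sparknlp_display. pkgutil.get_data
is the standard-library call that reads a file through the package's loader.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway zipped package; "demo_pkg" is a hypothetical stand-in
# for sparknlp_display, with a VERSION file next to __init__.py.
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "demo_pkg-0.1.egg")
with zipfile.ZipFile(egg_path, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "")
    zf.writestr("demo_pkg/VERSION", "0.1\n")

# Put the archive itself on sys.path so the package is imported from the zip.
sys.path.insert(0, egg_path)
import demo_pkg

# Same pattern as get_version() in the traceback: build a filesystem path
# from the package location and open it directly.
here = os.path.dirname(demo_pkg.__file__)
try:
    with open(os.path.join(here, "VERSION"), "r") as fh:
        plain_open_result = fh.read()
except OSError as exc:
    # The egg is a regular file, so a path "inside" it is not openable.
    plain_open_result = type(exc).__name__

# Reading through the package loader works for zipped packages as well.
version = pkgutil.get_data("demo_pkg", "VERSION").decode("utf-8").strip()

print("plain open():", plain_open_result)
print("pkgutil.get_data():", version)
```

If this assumption is right, the difference between my two cases would be in
whether the egg is used as a zip archive or unpacked, not in whether the file
is shipped at all.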


I would be grateful if someone could let me know what I am doing wrong
here.




Regards,

Gourav Sengupta
