zjffdu commented on pull request #4097:
URL: https://github.com/apache/zeppelin/pull/4097#issuecomment-843283503
> I was aware of this, but it seems that downloading dependencies several times is the way of `spark.archives`. It is clear that this is not optimal.

If I read the Spark code correctly, Spark only downloads `spark.archives` on the driver side and then distributes them to the executors via its internal driver-to-executor mechanism.

> By local, do you mean the local file system of the Zeppelin server? In my environment, the Zeppelin user does not have access to the local file system of the Zeppelin server. Therefore, I would prefer a remote endpoint that is under the control of the Zeppelin user.
> I understand your development approach and it sounds great, but I think this is not suitable for a production environment.

Let me clarify: it is not only the local file system, it could be any Hadoop-compatible file system, such as HDFS or S3.

> Maybe we can support the download in `JupyterKernelInterpreter.java` with an additional property. Then it should not matter whether the files were provided by YARN or the download.

Actually, for Spark's YARN mode, the YARN cache mechanism would still be used to distribute archives [1]. Of course, for k8s mode we could use another property to download the archive in `JupyterKernelInterpreter.java` for the pure Python interpreter (a rough sketch follows the links below), but for PySpark it is not necessary, because it is already done by SparkSubmit [2].

* [1] https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1663
* [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L391
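For illustration only, here is a minimal sketch of what such a download hook in `JupyterKernelInterpreter.java` could look like if we went with an extra property. It uses the Hadoop `FileSystem` API so that HDFS, S3 (via `s3a://`), and the local file system are all handled uniformly. The class name, property name, and method here are hypothetical assumptions for the sketch, not the actual Zeppelin API:

```java
import java.io.File;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical helper: if a property such as "zeppelin.jupyter.archive"
 * (name made up for this sketch) is set, fetch the archive from any
 * Hadoop-compatible file system before starting the kernel.
 */
public class ArchiveDownloadSketch {

  /** Copies the archive at archiveUri into localDir and returns the local file. */
  public static File downloadArchive(String archiveUri, File localDir) throws Exception {
    URI uri = URI.create(archiveUri);
    // FileSystem.get resolves the URI scheme (hdfs://, s3a://, file://, ...)
    // to the matching connector on the classpath.
    FileSystem fs = FileSystem.get(uri, new Configuration());

    Path src = new Path(uri);
    Path dst = new Path(new File(localDir, src.getName()).getAbsolutePath());
    // Download the remote archive to the local file system.
    fs.copyToLocalFile(src, dst);
    return new File(dst.toString());
  }

  public static void main(String[] args) throws Exception {
    // Example usage with a made-up S3 location.
    File env = downloadArchive("s3a://my-bucket/conda-env.tar.gz", new File("/tmp"));
    System.out.println("Downloaded to " + env.getAbsolutePath());
  }
}
```

This is essentially what SparkSubmit already does for `spark.archives` [2], which is why the extra property would only matter for the non-Spark (pure Python) interpreter path.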