Re: Choice of IDE for Spark

2021-10-02 Thread Christian Pfarr
We use Jupyter on Hadoop (https://jupyterhub-on-hadoop.readthedocs.io/en/latest/) for developing Spark jobs directly inside the cluster they will run on. With that you have direct access to YARN and HDFS (fully secured) without any migration steps. You can control the size of your Jupyter YARN
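As a rough sketch of what that workflow looks like in practice (the app name, resource sizes, and HDFS path below are illustrative assumptions, not details from the post), a notebook cell on such a setup can request YARN containers and read HDFS directly:

```python
# Hypothetical notebook cell; assumes pyspark is on the Jupyter kernel's path
# and the notebook runs inside the Hadoop cluster. All names and resource
# values are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("yarn")                         # executors run as YARN containers
    .appName("notebook-job")                # hypothetical app name
    .config("spark.executor.memory", "4g")  # illustrative sizing
    .config("spark.executor.cores", "2")
    .getOrCreate()
)

# HDFS is reachable directly from the notebook, no data migration step
df = spark.read.parquet("hdfs:///data/events")  # hypothetical path
df.printSchema()
```

Since this only works against a live YARN/HDFS cluster, treat it as a configuration sketch rather than something to run standalone.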

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Christian Pfarr
Does anyone know where the data for this benchmark was stored? Spark on YARN gains its performance from data locality via co-location of the YARN NodeManager and the HDFS DataNode, not from the job scheduler itself, right? Regards, z0ltrix
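For context on the locality point: the settings below (a spark-defaults.conf sketch; the values are illustrative assumptions, not from the thread) control how long Spark's scheduler waits for a data-local executor slot before falling back to a remote read, which only pays off when NodeManagers are co-located with DataNodes:

```
# spark-defaults.conf sketch; values illustrative, not from the thread
spark.master              yarn
# How long the scheduler waits for a data-local slot before degrading
# from PROCESS_LOCAL/NODE_LOCAL toward RACK_LOCAL/ANY
spark.locality.wait       3s
spark.locality.wait.node  3s
```

On Kubernetes without co-located HDFS, every read is effectively remote, so these waits buy nothing, which is one reason such benchmarks depend heavily on where the data lived.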