Hi Jun Kim - Cluster SSH is cool. I've used it before to manage a small server farm. Some modern terminal emulators can also broadcast input from a single terminal to several others. The requirement within Apache Zeppelin would be slightly different, though: the cluster environment should have some level of auto-detection, so that I don't have to enter the hostnames of individual instances if possible. Ideally, a hosts file on the cluster master could be passed as an argument to this interpreter, which would then execute all subsequent commands across the cluster. If I am using YARN, the resource manager already knows which nodes are part of the cluster, and a Mesos backend may offer something similar. Another option could be Ansible, which works over OpenSSH, but that is probably out of scope for a basic cluster SSH interpreter and not that useful to a typical Apache Zeppelin user.
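As a rough, untested sketch of the auto-detection idea: the YARN ResourceManager REST API can list the NodeManager hosts, and a command can then be pushed to each of them over SSH. It assumes passwordless SSH to the nodes and the default ResourceManager web port; "rm-host" and the pip command are just placeholders.

    # Rough sketch (untested): list NodeManager hosts via the YARN
    # ResourceManager REST API, then run a command on each host over SSH.
    # Assumes passwordless SSH; "rm-host" and the pip command are placeholders.
    import json
    import subprocess
    import urllib2  # Python 2; use urllib.request on Python 3

    RM_NODES_URL = "http://rm-host:8088/ws/v1/cluster/nodes"
    COMMAND = "pip install --user numpy scipy"

    nodes = json.load(urllib2.urlopen(RM_NODES_URL))["nodes"]["node"]
    for host in (n["nodeHostName"] for n in nodes):
        print("== %s ==" % host)
        subprocess.call(["ssh", host, COMMAND])

A hypothetical cluster-ssh interpreter could do essentially the same thing, taking either a hosts file or the resource manager address as configuration.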
PySpark itself provides only a SparkContext and an interface to Spark-specific functions. In some deployment models Spark is not always aware of the cluster it sits on, e.g. when YARN or Mesos is the true cluster manager.

On Sat, Oct 22, 2016 at 11:21 AM, Jun Kim <i2r....@gmail.com> wrote:

> Hi Prasanna Santhanam
>
> As far as I know, there is no cluster-ssh interpreter that Zeppelin
> provides. (If there is, please someone let me know.)
>
> In my case, I use clusterssh (cssh).
>
> The screenshot below shows it. (Copied from the Internet.)
>
> There is another tool called parallel-ssh (pssh), but I prefer cssh,
> since I can watch every node's output.
>
> Or, maybe you can consider building NFS (Network File System), so that
> every node has the same Python environment.
>
> But actually, the two solutions above involve a lot of work.
>
> Is there any other way using just PySpark features? Please help if
> someone knows.
>
> By the way, I think a cluster-ssh interpreter would be a cool feature.
>
>
> On Sat, Oct 22, 2016 at 12:31 PM, Prasanna Santhanam <t...@apache.org> wrote:
>
>> Hello All,
>>
>> I've been using Apache Zeppelin against Apache Spark clusters and with
>> PySpark. One of the things I often tend to do is install libraries and
>> packages on my cluster. For instance, I would like numpy, scipy and other
>> data science libraries present on my cluster for data analysis. However,
>> the %sh interpreter only runs pip install commands on my Zeppelin host.
>>
>> - How are other users tackling this problem?
>> - Do you have a base set of libraries always installed?
>> - Is there a clustered shell interpreter over SSH that Apache Zeppelin
>> provides? (I looked but didn't find any issues/pull requests related to
>> this ask.)
>>
>> Thanks,
>>
> --
> Taejun Kim
>
> Data Mining Lab.
> School of Electrical and Computer Engineering
> University of Seoul
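On the question in the quoted thread about doing this with PySpark features alone: the closest workaround I know of is to run pip from inside tasks, so it executes on whatever nodes currently host executors. A rough, untested sketch follows; the package list is illustrative, and it assumes Zeppelin's %pyspark where the SparkContext is available as sc.

    # Rough sketch (untested): install packages on the nodes that currently
    # host executors, using only the SparkContext. This does NOT cover nodes
    # without a running executor, so it is no substitute for real provisioning.
    import subprocess

    def install(_):
        subprocess.check_call(["pip", "install", "--user", "numpy", "scipy"])

    # Over-partition so that, with luck, every executor runs at least one task.
    sc.parallelize(range(100), 100).foreachPartition(install)

This still runs into the problem above: Spark only sees its executors, not the cluster, and with dynamic allocation you may miss nodes entirely.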