Hi Jun Kim - Cluster SSH is cool.

I've used it before to manage a small server farm. Some modern terminal
emulators can also broadcast input from a single terminal to multiple
terminals. The requirement within Apache Zeppelin would be slightly
different: the cluster environment should have some level of
auto-detection, so that I don't have to enter hostnames of individual
instances if possible. Ideally, a hosts file on the cluster master could be
provided as an argument to this interpreter so it can execute all commands
that follow across the cluster. If I am using YARN, the resource manager
knows the nodes that are part of the cluster, and a Mesos backend likely
exposes the same information. Another simple way could be to use Ansible,
which works over OpenSSH, but that might be out of scope for a basic
cluster SSH interpreter and not super useful for a typical Apache Zeppelin
user. A rough sketch of the auto-detection idea follows.
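For example, here's a minimal, untested sketch of host auto-detection
against a YARN backend, using the ResourceManager's REST API
(/ws/v1/cluster/nodes). The RM address is hypothetical, and it assumes
passwordless SSH from the Zeppelin host plus the `requests` library:

# Sketch: discover YARN NodeManager hosts from the ResourceManager REST
# API and broadcast a shell command to each over plain OpenSSH.
import subprocess
import requests

RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical RM address

def yarn_node_hosts(rm_url):
    """List hostnames of RUNNING NodeManagers via the RM REST API."""
    resp = requests.get(rm_url + "/ws/v1/cluster/nodes", timeout=10)
    resp.raise_for_status()
    nodes = resp.json()["nodes"]["node"]
    return [n["nodeHostName"] for n in nodes if n["state"] == "RUNNING"]

def broadcast(hosts, command):
    """Run the same shell command on every host; return per-host results."""
    results = {}
    for host in hosts:
        proc = subprocess.run(["ssh", host, command],
                              capture_output=True, text=True, timeout=300)
        results[host] = proc.returncode
    return results

hosts = yarn_node_hosts(RM_URL)
for host, rc in broadcast(hosts, "pip install --user numpy").items():
    print(host, "OK" if rc == 0 else "FAILED")

With Ansible, the rough equivalent against a hosts file would be a
one-liner like `ansible all -i hosts -m pip -a "name=numpy"`, at the cost
of a dependency most Zeppelin users won't have.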

PySpark itself provides only a SparkContext and an interface to
Spark-specific functions. In some deployment models, Spark is not even
always aware of the cluster it sits on, e.g. when YARN or Mesos is the
true cluster manager.
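That said, there is a best-effort trick using nothing but Spark tasks: run
the shell command inside mapPartitions and oversubscribe the partition
count so that, with luck, each executor node picks up at least one task. A
hedged sketch, assuming the `sc` that Zeppelin's %pyspark interpreter
provides:

# Best-effort: run pip on whichever nodes currently host Spark executors.
import socket
import subprocess

def pip_install(_):
    # Runs on whichever executor picks up the partition.
    rc = subprocess.call(["pip", "install", "--user", "numpy"])
    yield (socket.gethostname(), rc)

num_slots = sc.defaultParallelism * 4   # oversubscribe to touch more nodes
results = (sc.parallelize(range(num_slots), num_slots)
             .mapPartitions(pip_install)
             .collect())
print(sorted(set(results)))             # (hostname, return code) pairs

This only reaches nodes that currently host executors, and Spark makes no
promise about task placement, which is exactly why a proper cluster SSH
interpreter would still be useful.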



On Sat, Oct 22, 2016 at 11:21 AM, Jun Kim <i2r....@gmail.com> wrote:

> Hi Prasanna Santhanam,
>
> As far as I know, Zeppelin does not provide a cluster-ssh interpreter.
> (If I'm wrong, someone please let me know.)
>
> In my case, I use *clusterssh (cssh)*.
>
> The screenshot below shows it. (Copied from the Internet.)
>
> There is another tool called parallel-ssh (pssh), but I prefer cssh,
> since I can watch every node's output.
>
> Or, maybe you can consider setting up *NFS (Network File System)* so
> that every node has the same Python environment.
>
> But actually, the two solutions above involve a lot of work.
>
> Is there any other way using just PySpark features? Please help if
> someone knows.
>
> By the way, I think a cluster-ssh interpreter would be a cool feature.
>
>
>
> On Sat, Oct 22, 2016 at 12:31 PM, Prasanna Santhanam <t...@apache.org> wrote:
>
>> Hello All,
>>
>> I've been using Apache Zeppelin against Apache Spark clusters and with
>> PySpark. One of the things I often do is install libraries and packages
>> on my cluster. For instance, I would like numpy, scipy, and other data
>> science libraries present on my cluster for data analysis. However, any
>> pip install commands run through the %sh interpreter only work on my
>> Zeppelin host.
>>
>> - How are other users tackling this problem?
>> - Do you have a base set of libraries always installed?
>> - Is there a clustered shell interpreter over SSH that Apache Zeppelin
>> provides?
>> *(I looked but didn't find any issues/pull requests related to this
>> ask)*
>>
>> Thanks,
>>
> --
> Taejun Kim
>
> Data Mining Lab.
> School of Electrical and Computer Engineering
> University of Seoul
>
