I am involved in some Hadoop deployments and there is a very interesting possibility for Pharo in that ecosystem.
Namely, there is YARN, a scheduler for distributing computing over a cluster of nodes. All kinds of technologies can be deployed on the nodes (e.g. Python, R, Java), and Pharo images plus a headless VM could be deployed as well. A deployed node can communicate back to what is called the ApplicationMaster via REST callbacks (easy game in Pharo).

There is also a Hadoop component named ZooKeeper, which acts as a distributed configuration repository. One can talk to it over REST ( https://github.com/apache/zookeeper/tree/trunk/src/contrib/rest ) and there is a C API as well, so that would be FFI or a plugin ( http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html ). Given that we can also make some Java calls (using the JNI module with 32-bit Java), I'd say we can integrate well enough with YARN.

There is another very nice project, Slider (on YARN), which is about deploying stuff in an elastic way (see http://slider.incubator.apache.org/ ). The next logical step would be to have Docker containers (containing a Pharo stack) deployed dynamically on the cluster using Slider, like this: http://www.slideshare.net/hortonworks/docker-on-slider-45493303

A first step here would be a basic YARN-Pharo application and a PoC for talking to ZooKeeper. This would open interesting gates for Pharo given its strengths, even more so once we get a 64-bit VM. What is cool with Pharo is that an image can be very small and self-contained, versus a Java application, which drags tons of jar files along with it.

Access to data on HDFS can happen through NFSv3, so we can go that route. There is also a REST API for it, WebHDFS ( https://hadoop.apache.org/docs/r1.0.4/webhdfs.html ).

Tell me what you think!

Phil
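PS: to make the PoC part a bit more concrete, here is a rough, untested sketch of what the two REST probes could look like from Pharo using the Zinc HTTP client. The hostnames, ports and paths are invented for illustration; you would substitute whatever your cluster actually exposes.

```smalltalk
"Probe the ZooKeeper REST contrib: read a znode as JSON.
zk-host:9998 and the znode path are made-up examples."
ZnClient new
	url: 'http://zk-host:9998/znodes/v1/config/pharo-app';
	accept: ZnMimeType applicationJson;
	get.

"Read a file over the WebHDFS REST API (op=OPEN).
ZnClient follows the redirect to the datanode by itself.
namenode:50070 and the file path are made-up examples."
ZnClient new
	url: 'http://namenode:50070/webhdfs/v1/user/phil/data.csv';
	queryAt: 'op' put: 'OPEN';
	get.
```

Nothing exotic needed on the Pharo side, which is kind of the point.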