Hey, do you have a blog post or URL I can share?

This is quite a cool experiment!

E/


2014-03-20 15:01 GMT+01:00 Chanwit Kaewkasi <chan...@gmail.com>:

> Hi Chester,
>
> It is on our to-do list, but it doesn't work at the moment. The
> Parallella cores cannot be utilized by the JVM, so Spark will just
> use its ARM cores. We'll be looking at Parallella again when the JVM
> supports it.
>
> Best regards,
>
> -chanwit
>
> --
> Chanwit Kaewkasi
> linkedin.com/in/chanwit
>
>
> On Thu, Mar 20, 2014 at 8:52 PM, Chester <chesterxgc...@yahoo.com> wrote:
> > I am curious to see if you have tried a Parallella supercomputer (16 or
> > 64 cores) cluster; running Spark on that should be fun.
> >
> > Chester
> >
> > Sent from my iPad
> >
> > On Mar 19, 2014, at 9:18 AM, Chanwit Kaewkasi <chan...@gmail.com> wrote:
> >
> >> Hi Koert,
> >>
> >> There's some NAND flash built into each node. We mount the NAND flash
> >> as a local directory for Spark to spill data to.
> >> A DZone article, also written by me, tells more about the cluster.
> >> We really appreciate the design of Spark's RDD done by the Spark team.
> >> It turned out to be perfect for ARM clusters.
> >>
> >> http://www.dzone.com/articles/big-data-processing-arm-0
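> >>
> >> Pointing Spark at the NAND mount is a one-line configuration. Here is
> >> a minimal sketch, assuming the flash is mounted at /mnt/nand (the
> >> mount point and app name are examples, not our exact setup):
> >>
> >>   import org.apache.spark.{SparkConf, SparkContext}
> >>
> >>   // spark.local.dir is where Spark spills shuffle/scratch data
> >>   val conf = new SparkConf()
> >>     .setAppName("aiyara-example")              // example name
> >>     .set("spark.local.dir", "/mnt/nand/spark") // assumed NAND mount
> >>   val sc = new SparkContext(conf)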
> >>
> >> Another great thing is that our cluster can operate at room
> >> temperature (25°C / 77°F) too.
> >>
> >> The board is the Cubieboard; here it is:
> >> https://en.wikipedia.org/wiki/Cubieboard#Specification
> >>
> >> Best regards,
> >>
> >> -chanwit
> >>
> >> --
> >> Chanwit Kaewkasi
> >> linkedin.com/in/chanwit
> >>
> >>
> >> On Wed, Mar 19, 2014 at 9:43 PM, Koert Kuipers <ko...@tresata.com> wrote:
> >>> I don't know anything about ARM clusters... but it looks great. What
> >>> are the specs? Do the nodes have no local disk at all?
> >>>
> >>>
> >>> On Tue, Mar 18, 2014 at 10:36 PM, Chanwit Kaewkasi <chan...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> We are a small team doing research on low-power (and low-cost) ARM
> >>>> clusters. We built a 20-node ARM cluster that is able to start Hadoop.
> >>>> But as all of you know, Hadoop performs on-disk operations, so it's
> >>>> not suitable for constrained machines powered by ARM.
> >>>>
> >>>> We then switched to Spark and had to say wow!!
> >>>>
> >>>> Spark / HDFS enables us to crunch the Wikipedia articles (of year
> >>>> 2012), 34 GB in size, in 1h50m. We have identified the bottleneck,
> >>>> and it's our 100 Mbit/s network.
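> >>>>
> >>>> For a flavor of what such a job looks like from the shell, here is a
> >>>> hypothetical word-count sketch (the HDFS paths are made up, and this
> >>>> is not the exact code we benchmarked):
> >>>>
> >>>>   val articles = sc.textFile("hdfs:///data/wikipedia-2012")
> >>>>   val words = articles.flatMap(_.split("\\s+"))
> >>>>   val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
> >>>>   counts.saveAsTextFile("hdfs:///data/wikipedia-2012-counts")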
> >>>>
> >>>> Here's the cluster:
> >>>>
> >>>> https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/Mk-I_SSD.png
> >>>>
> >>>> And this is what we got from Spark's shell:
> >>>>
> >>>> https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/result_00.png
> >>>>
> >>>> I think it's the first ARM cluster that can process a non-trivial
> >>>> amount of Big Data.
> >>>> (Please correct me if I'm wrong.)
> >>>> I really want to thank the Spark team for making this possible!!
> >>>>
> >>>> Best regards,
> >>>>
> >>>> -chanwit
> >>>>
> >>>> --
> >>>> Chanwit Kaewkasi
> >>>> linkedin.com/in/chanwit
> >>>
> >>>
>
