Re: Akka usage in Spark

Mayur Rustagi Thu, 21 Aug 2014 00:06:12 -0700

The stream receiver seems to leverage actor receivers
     http://spark.apache.org/docs/0.8.1/streaming-custom-receivers.html
But spark system doesnt lend itself to a messaging kind of a structure..
more of a DAG kind
Just curious are you looking for the actor subsystem to act on messages or
just looking to use them as a local/distributed message bus


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Aug 21, 2014 at 4:04 AM, Debasish Das <debasish.da...@gmail.com>
wrote:

> Yeah that's the one we discussed...sorry I pointed to a different one that
> I was reading...
>
>
> On Wed, Aug 20, 2014 at 3:28 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>
> > To be specific, I was discussing this PR with Debasish which reduces
> > lots of issues when sending big objects to executors without using
> > broadcast explicitly.
> >
> > Broadcast RDD object once per TaskSet (instead of sending it for every
> > task)
> > https://issues.apache.org/jira/browse/SPARK-2521
> >
> > Sincerely,
> >
> > DB Tsai
> > -------------------------------------------------------
> > My Blog: https://www.dbtsai.com
> > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
> >
> > On Wed, Aug 20, 2014 at 3:19 PM, Debasish Das <debasish.da...@gmail.com>
> > wrote:
> > > Hi Patrick,
> > >
> > > Last few days I came across some bugs which got exposed due to ALS runs
> > on
> > > large scale data...although it was not related to the akka changes but
> > > during the debug I found across some akka related changes that might
> have
> > > an impact of overall performance...one example is the following:
> > >
> > > https://github.com/apache/spark/pull/1907
> > >
> > > @dbtsai explained it to me a bit yesterday that in 1.1 RDDs are no
> longer
> > > sent through akka msgs but over http-channels...If there is a document
> > > detailing the architecture that is currently in-place (like how the
> core
> > > changed from 1.0 to 1.1) it will help a lot in debugging the jobs which
> > are
> > > built upon the libraries like mllib and optimize them further for
> > > efficiency...
> > >
> > > For using the Spark actor system directly:
> > >
> > > I spent few weeks December 2013 to make the Scalafish code (
> > > https://github.com/azymnis/scalafish) operational on 10 nodes...It
> uses
> > > scalding for matrix partitioning and actorSystem to coordinate the
> > > updates...It is a cool use of akka but getting an actor system
> > operational
> > > is difficult...
> > >
> > > Since Spark already has tested version of actor system running on both
> > > standalone and yarn modes, I am planning to port scalafish to spark
> using
> > > actor model...That's one of the use-cases I am looking for...
> > >
> > > Another use-case that I am considering is to send msgs directly from
> > kafka
> > > queues to spark actorSystem for processing to get Storm like
> > > latency...basically window sizes of 1-2 ms and no overhead of using an
> > RDD
> > > if possible...
> > >
> > > Thanks.
> > > Deb
> > >
> > >
> > > On Wed, Aug 20, 2014 at 1:42 PM, Patrick Wendell <pwend...@gmail.com>
> > wrote:
> > >
> > >> Hey Deb,
> > >>
> > >> Can you be specific what changes you are mentioning? We have not, to
> my
> > >> knowledge, made major architectural changes around akka use.
> > >>
> > >> I think in general we don't want people to be using Spark's actor
> system
> > >> directly - it is an internal communication component in Spark and
> could
> > >> e.g. be re-factored later to not use akka at all. Could you elaborate
> a
> > bit
> > >> more on your use case?
> > >>
> > >> - Patrick
> > >>
> > >>
> > >> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das <
> debasish.da...@gmail.com
> > >
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> There have been some recent changes in the way akka is used in spark
> > and I
> > >>> feel they are major changes...
> > >>>
> > >>> Is there a design document / JIRA / experiment on large datasets that
> > >>> highlight the impact of changes (1.0 vs 1.1) ? Basically it will be
> > great
> > >>> to understand where akka is used in the code base...
> > >>>
> > >>> If I don't have to broadcast big variables but use akka's programming
> > >>> model
> > >>> (use actors directly) on Spark's actorsystem is that allowed ? I
> > >>> understand
> > >>> that it might look hacky :-)
> > >>>
> > >>> Thanks.
> > >>> Deb
> > >>>
> > >>
> > >>
> >
>

Re: Akka usage in Spark

Reply via email to