Re: Akka usage in Spark

DB Tsai Wed, 20 Aug 2014 15:29:20 -0700

To be specific, I was discussing this PR with Debasish. It reduces
many of the issues that come up when sending big objects to executors
without using broadcast explicitly.


Broadcast RDD object once per TaskSet (instead of sending it for every task)
https://issues.apache.org/jira/browse/SPARK-2521

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Wed, Aug 20, 2014 at 3:19 PM, Debasish Das <[email protected]> wrote:
> Hi Patrick,
>
> In the last few days I came across some bugs which got exposed by ALS runs on
> large-scale data...they were not related to the akka changes, but while
> debugging I came across some akka-related changes that might have
> an impact on overall performance...one example is the following:
>
> https://github.com/apache/spark/pull/1907
>
> @dbtsai explained to me yesterday that in 1.1 RDDs are no longer
> sent through akka msgs but over http channels...If there is a document
> detailing the architecture that is currently in place (like how the core
> changed from 1.0 to 1.1), it would help a lot in debugging jobs that are
> built upon libraries like mllib and in optimizing them further for
> efficiency...
>
> For using the Spark actor system directly:
>
> I spent a few weeks in December 2013 making the Scalafish code (
> https://github.com/azymnis/scalafish) operational on 10 nodes...It uses
> scalding for matrix partitioning and actorSystem to coordinate the
> updates...It is a cool use of akka but getting an actor system operational
> is difficult...
>
> Since Spark already has a tested version of an actor system running in both
> standalone and yarn modes, I am planning to port scalafish to spark using
> the actor model...That's one of the use-cases I am looking at...
>
> Another use-case that I am considering is to send msgs directly from kafka
> queues to the spark actorSystem for processing to get Storm-like
> latency...basically window sizes of 1-2 ms and no overhead of using an RDD
> if possible...
>
> Thanks.
> Deb
>
>
> On Wed, Aug 20, 2014 at 1:42 PM, Patrick Wendell <[email protected]> wrote:
>
>> Hey Deb,
>>
>> Can you be specific about which changes you are referring to? We have not,
>> to my knowledge, made major architectural changes around akka use.
>>
>> I think in general we don't want people to be using Spark's actor system
>> directly - it is an internal communication component in Spark and could
>> e.g. be re-factored later to not use akka at all. Could you elaborate a bit
>> more on your use case?
>>
>> - Patrick
>>
>>
>> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> There have been some recent changes in the way akka is used in spark and I
>>> feel they are major changes...
>>>
>>> Is there a design document / JIRA / experiment on large datasets that
>>> highlights the impact of the changes (1.0 vs 1.1)? Basically it would be
>>> great to understand where akka is used in the code base...
>>>
>>> If I don't have to broadcast big variables but use akka's programming model
>>> (use actors directly) on Spark's actorsystem is that allowed? I understand
>>> that it might look hacky :-)
>>>
>>> Thanks.
>>> Deb
>>>
>>
>>

