Thanks.

I tried persist() on the RDD. The runtime appears to have doubled: without
persist() it was ~7s per iteration, and now it's ~15s. I am running
standalone Spark on an 8-core machine.
Any thoughts on why the runtime increased?
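(A hedged guess at the slowdown, not verified on this workload: MEMORY_ONLY_SER stores the cached partitions as serialized bytes, so every iteration pays deserialization CPU cost when the grouped RDD is read back. If the data fits in memory, caching it deserialized may be cheaper. A minimal sketch, assuming the same `R`, `G`, and `VertexId` as in the snippets below:)

```scala
import org.apache.spark.storage.StorageLevel

// Cache the grouped RDD as deserialized JVM objects instead of
// serialized bytes; trades memory footprint for per-iteration CPU.
val grouped = R.groupBy[VertexId](G).persist(StorageLevel.MEMORY_ONLY)

// Materialize the cache once, outside the loop, so the first timed
// iteration does not also pay the groupBy shuffle cost.
grouped.count()
```

Comparing the "Storage" tab of the Spark web UI under both storage levels would show whether the serialized cache even fits in memory; if partitions are being evicted and recomputed, that alone could explain the doubled runtime.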

On Thu, Feb 26, 2015 at 4:27 PM, Imran Rashid <iras...@cloudera.com> wrote:

>
> val grouped = R.groupBy[VertexId](G).persist(StorageLevel.MEMORY_ONLY_SER)
> // or whatever persistence makes more sense for you ...
> import scala.util.control.Breaks._
> breakable {
>   while (true) {
>     val res = grouped.flatMap(F)
>     res.collect.foreach(func)
>     if (criteria)
>       break
>   }
> }
>
> On Thu, Feb 26, 2015 at 10:56 AM, Vijayasarathy Kannan <kvi...@vt.edu>
> wrote:
>
>> Hi,
>>
>> I have the following use case.
>>
>> (1) I have an RDD of edges of a graph (say R).
>> (2) do a groupBy on R (by say source vertex) and call a function F on
>> each group.
>> (3) collect the results from Fs and do some computation
>> (4) repeat the above steps until some criterion is met
>>
>> In (2), the groups are always going to be the same (since R is grouped by
>> source vertex).
>>
>> Question:
>> Is R redistributed (shuffled) on every iteration of step (2), or is it
>> distributed only once, when it is created?
>>
>> A sample code snippet is below.
>>
>> import scala.util.control.Breaks._
>> breakable {
>>   while (true) {
>>     val res = R.groupBy[VertexId](G).flatMap(F)
>>     res.collect.foreach(func)
>>     if (criteria)
>>       break
>>   }
>> }
>>
>> Since the groups remain the same, what is the best way to go about
>> implementing the above logic?
>>
>
>
