that DStreams are some sort of different type of RDDs
From: Tathagata Das [mailto:t...@databricks.com]
Sent: Wednesday, April 15, 2015 11:11 PM
To: Evo Eftimov
Cc: user
Subject: Re: RAM management during cogroup and join
Well, DStream joins are nothing but RDD joins at their core. However
> *From:* Tathagata Das [mailto:t...@databricks.com]
> *Sent:* Wednesday, April 15, 2015 9:48 PM
>
> *To:* Evo Eftimov
> *Cc:* user
> *Subject:* Re: RAM management during cogroup and join
>
>
>
> Agreed.
>
>
>
> On Wed, Apr 15, 2015 at 1:29 PM, Evo Eftimov
Subject: Re: RAM management during cogroup and join
Agreed.
On Wed, Apr 15, 2015 at 1:29 PM, Evo Eftimov wrote:
That has been done, Sir, and represents further optimizations – the objective
here was to confirm whether cogroup always results in the previously described
“greedy” explosion of
5 9:25 PM
> *To:* Evo Eftimov
> *Cc:* user
> *Subject:* Re: RAM management during cogroup and join
>
>
>
> Significant optimizations can be made by doing the joining/cogroup in a
> smart way. If you have to join streaming RDDs with the same batch RDD, then
> you can first par
change the total number of elements
included in the result RDD and RAM allocated – right?
From: Tathagata Das [mailto:t...@databricks.com]
Sent: Wednesday, April 15, 2015 9:25 PM
To: Evo Eftimov
Cc: user
Subject: Re: RAM management during cogroup and join
Significant optimizations can be made by doing the joining/cogroup in a
smart way. If you have to join streaming RDDs with the same batch RDD, then
you can first partition the batch RDD using a partitioner and cache it, and
then use the same partitioner on the streaming RDDs. That would make sure
t
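
To illustrate the co-partitioning idea described in this message, here is a minimal plain-Python sketch. It does not use Spark itself: `partition_of`, `partition_by`, `join_copartitioned` and the sample data are all hypothetical names made up for this illustration, and the partition count of 4 is an arbitrary assumption. The batch data is partitioned once and kept, each micro-batch is partitioned with the same function, and the join then only has to compare partitions with the same index.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count, chosen only for illustration


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Stand-in for a hash partitioner: map a key to a partition index."""
    return hash(key) % num_partitions


def partition_by(pairs, num_partitions=NUM_PARTITIONS):
    """Split (key, value) pairs into per-partition dicts of key -> [values]."""
    parts = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partition_of(key, num_partitions)][key].append(value)
    return parts


def join_copartitioned(batch_parts, stream_parts):
    """Inner-join two datasets that were split with the same partitioner.

    Matching keys always sit at the same partition index, so each partition
    is joined locally and the batch side is never re-partitioned.
    """
    joined = []
    for batch_part, stream_part in zip(batch_parts, stream_parts):
        for key, stream_values in stream_part.items():
            for batch_value in batch_part.get(key, []):
                for stream_value in stream_values:
                    joined.append((key, (batch_value, stream_value)))
    return joined


# Partition the batch data once and keep it around (the "cache" step).
batch_parts = partition_by([(1, "alice"), (2, "bob"), (3, "carol")])

# Each micro-batch is partitioned with the same function, then joined locally.
micro_batches = [[(1, "click"), (3, "view")], [(2, "click"), (9, "view")]]
results = [sorted(join_copartitioned(batch_parts, partition_by(mb)))
           for mb in micro_batches]
```

In Spark the corresponding steps would typically be `partitionBy` with a partitioner such as `HashPartitioner` on the batch RDD, followed by `cache()`, with the same partitioner applied to each streaming RDD before the join. The sketch above only models why that avoids reshuffling the batch side on every interval; it says nothing about actual memory usage.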