Re: indexing an RDD [Python]

Sven Krasser Wed, 29 Apr 2015 13:17:54 -0700

Hey Roberto,

You will likely want to use cogroup() then, but it all hinges on what your
data looks like, i.e. whether you have the index in the key. Here's an example:
http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#cogroup
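
As a rough sketch of the idea (not taken from the linked page): assuming the
RDD has no index in the key yet, zipWithIndex() can attach one, and the index
list becomes a second RDD to cogroup() with. The function names, the parameters
sc, rdd and indices, and the sample data are all made up for this illustration.

```python
def select_by_index(sc, rdd, indices):
    """Return the elements of `rdd` at the given positions, in position order."""
    # zipWithIndex() yields (element, position); swap so the position is the key.
    keyed = rdd.zipWithIndex().map(lambda pair: (pair[1], pair[0]))
    # Second RDD made from the index list; the value is just a placeholder.
    wanted = sc.parallelize(indices).map(lambda i: (i, None))
    # cogroup() yields (key, (values from keyed, values from wanted)).
    grouped = keyed.cogroup(wanted)
    # Keep only the keys that also appear in the index list.
    hits = grouped.filter(lambda kv: len(list(kv[1][1])) > 0)
    # Unpack the matching elements and restore position order.
    return hits.flatMap(lambda kv: [(kv[0], v) for v in kv[1][0]]).sortByKey().values()


def select_by_index_small(rdd, indices):
    """Same selection for a short index list, using a plain filter()."""
    wanted = set(indices)  # shipped to the workers inside the closure
    return rdd.zipWithIndex().filter(lambda pair: pair[1] in wanted).map(lambda pair: pair[0])


def demo_select():
    # Imported here so the helpers above carry no import-time dependency.
    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()
    points = sc.parallelize(range(10, 20))  # stand-in for the real RDD
    print(select_by_index(sc, points, [0, 4, 5, 6, 8]).collect())
    print(select_by_index_small(points, [0, 4, 5, 6, 8]).collect())
```

Calling demo_select() against a local Spark install should print the elements
at positions 0, 4, 5, 6 and 8 twice. The filter() variant avoids a shuffle, so
it is usually the better choice when the index list is short.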


Clone: RDDs are immutable, so any change you make results in a new RDD and
leaves the original one as it was.
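
A small sketch of what that means in practice, assuming for brevity that each
element is a plain (label, features) tuple rather than a LabeledPoint. The
names with_label and demo_relabel, and the sample data, are made up here.

```python
def with_label(rdd, new_label):
    """Return a new RDD in which every element carries `new_label`."""
    # map() never changes `rdd`; it describes a new RDD derived from it.
    return rdd.map(lambda point: (new_label, point[1]))


def demo_relabel():
    # Imported here so with_label() has no import-time dependency.
    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()
    original = sc.parallelize([(0.0, [1.0, 2.0]), (1.0, [3.0, 4.0])])
    relabeled = with_label(original, 1.0)
    print(original.collect())   # still holds the original labels
    print(relabeled.collect())  # every label replaced by 1.0
```

Both RDDs stay usable afterwards, so no explicit copy step is needed before
modifying values.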

Best,
-Sven


On Fri, Apr 24, 2015 at 4:49 PM, Pagliari, Roberto <rpagli...@appcomsci.com>
wrote:

> Hi,
>
> I may need to read many values. The list [0,4,5,6,8] holds the locations of
> the rows I’d like to extract from the RDD (of LabeledPoints). Could you
> possibly provide a quick example?
>
>
>
> Also, I’m not quite sure how this works, but the resulting RDD should be a
> clone, as I may need to modify the values and preserve the original ones.
>
>
>
> Thank you,
>
>
>
>
>
> *From:* Sven Krasser [mailto:kras...@gmail.com]
> *Sent:* Friday, April 24, 2015 5:56 PM
> *To:* Pagliari, Roberto
> *Cc:* user@spark.apache.org
> *Subject:* Re: indexing an RDD [Python]
>
>
>
> The solution depends largely on your use case. I assume the index is in
> the key. In that case, you can make a second RDD out of the list of indices
> and then use cogroup() on both.
>
> If the list of indices is small, just using filter() will work well.
>
> If you need to read back a few select values to the driver, take a look at
> lookup().
>
>
>
> On Fri, Apr 24, 2015 at 1:51 PM, Pagliari, Roberto <
> rpagli...@appcomsci.com> wrote:
>
> I have an RDD of LabeledPoints.
> Is it possible to select a subset of it based on a list of indices?
> For example with idx=[0,4,5,6,8], I'd like to be able to create a new RDD
> with elements 0,4,5,6 and 8.
>
>



-- 
www.skrasser.com <http://www.skrasser.com/?utm_source=sig>
