Hey Roberto,

You will likely want to use cogroup() then, but it all depends on how your data looks, i.e., whether you have the index in the key. Here's an example: http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#cogroup
Clone: RDDs are immutable, so any change you make produces a new RDD and leaves the original untouched.

Best,
-Sven

On Fri, Apr 24, 2015 at 4:49 PM, Pagliari, Roberto <rpagli...@appcomsci.com> wrote:

> Hi,
>
> I may need to read many values. The list [0,4,5,6,8] gives the positions of
> the rows I'd like to extract from the RDD (of LabeledPoints). Could you
> possibly provide a quick example?
>
> Also, I'm not quite sure how this works, but the resulting RDD should be a
> clone, as I may need to modify the values and preserve the original ones.
>
> Thank you,
>
> *From:* Sven Krasser [mailto:kras...@gmail.com]
> *Sent:* Friday, April 24, 2015 5:56 PM
> *To:* Pagliari, Roberto
> *Cc:* user@spark.apache.org
> *Subject:* Re: indexing an RDD [Python]
>
> The solution depends largely on your use case. I assume the index is in
> the key. In that case, you can make a second RDD out of the list of indices
> and then use cogroup() on both.
>
> If the list of indices is small, just using filter() will work well.
>
> If you need to read back a few select values to the driver, take a look at
> lookup().
>
> On Fri, Apr 24, 2015 at 1:51 PM, Pagliari, Roberto <
> rpagli...@appcomsci.com> wrote:
>
> I have an RDD of LabeledPoints.
> Is it possible to select a subset of it based on a list of indices?
> For example, with idx=[0,4,5,6,8], I'd like to be able to create a new RDD
> with elements 0, 4, 5, 6 and 8.

--
www.skrasser.com <http://www.skrasser.com/?utm_source=sig>