For testing purposes you can take a sample of your data with take() (which returns
an array on the driver) and then parallelize that smaller collection back into an RDD.
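A minimal sketch of that approach (assuming a local SparkContext; the HDFS path and sample size are illustrative, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumes local mode for testing; adjust master/appName for your setup.
val conf = new SparkConf().setMaster("local[*]").setAppName("sample-take")
val sc = new SparkContext(conf)

val full = sc.textFile("hdfs:///path/to/data.txt")  // illustrative path
val sample: Array[String] = full.take(100)          // pulls 100 records to the driver
val small = sc.parallelize(sample)                  // small RDD suitable for testing
```

Note that take() materializes the sample on the driver, so keep the sample size small.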

-----Original Message-----
From: Tim Chou [timchou....@gmail.com]
Sent: Thursday, November 13, 2014 06:41 PM Eastern Standard Time
To: Ganelin, Ilya
Subject: Re: Spark- How can I run MapReduce only on one partition in an RDD?

Hi Ganelin,
Thank you for your reply. I can get the partition information with partitions(), but
I cannot turn a single partition into a new RDD that I can work with.

I know it doesn't make sense to build a large RDD and then use only one partition.
I just want to map each partition one at a time, so I can get early map results from
the RDD quickly.

That's why I want to read a file on HDFS to create multiple RDDs.

Any suggestions?

Thanks,
Tim

2014-11-13 17:05 GMT-06:00 Ganelin, Ilya <ilya.gane...@capitalone.com>:
Why do you only want the third partition? You can access individual partitions
using the partitions() function, and you can filter your data with the filter()
function so it contains only the records you care about. Moreover, unless you define
a custom partitioner when you create your RDDs, you have no control over which data
lands in partition #3. Therefore, there is almost no reason to want to operate on an
individual partition.
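Not mentioned in the thread, but if you really do want to touch only one partition, Spark's mapPartitionsWithIndex does exactly that. A minimal sketch, assuming an existing RDD of strings named rdd and a target partition index of 2:

```scala
// mapPartitionsWithIndex hands each partition's index alongside its iterator.
// Keep only the elements of partition index 2; emit nothing for the others.
val thirdOnly = rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 2) iter else Iterator.empty
}

// Or run a map over just that partition while skipping the rest:
val mappedThird = rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 2) iter.map(line => line.toUpperCase) else Iterator.empty
}
```

The caveat from the reply still applies: without a custom partitioner, which records end up in partition 2 is not something you control.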


-----Original Message-----
From: Tim Chou [timchou....@gmail.com]
Sent: Thursday, November 13, 2014 06:01 PM Eastern Standard Time
To: u...@spark.apache.org
Subject: Spark- How can I run MapReduce only on one partition in an RDD?

Hi All,

I use textFile to create an RDD. However, I don't want to process the whole dataset
in this RDD; for example, I may only want to process the data in the 3rd partition
of the RDD.

How can I do this? Here are some possible solutions I'm considering:
1. Create multiple RDDs when reading the file.
2. Run the MapReduce functions on a specific partition of an RDD.

However, I cannot find any appropriate function.

Thank you; I look forward to your suggestions.

Best,
Tim

________________________________

The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed.  If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.
