Thanks a lot for your reply. I have also worked it out in some other ways. In 
fact, at first I was thinking about using filter to do it, but that failed.  
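
A filter-based version is possible once each line carries its position, since a plain filter on the file RDD only sees line contents and not line numbers. A minimal sketch, assuming the file is loaded with sc.textFile; the object name FirstLinesByIndex and its parameters are hypothetical, not from this thread:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object FirstLinesByIndex {
  // Pair every line with its 0-based position, keep positions below n,
  // then drop the index again. The result stays a distributed RDD.
  def firstLines(sc: SparkContext, path: String, n: Long): RDD[String] =
    sc.textFile(path)
      .zipWithIndex()                       // RDD[(String, Long)]
      .filter { case (_, idx) => idx < n }  // keep only the first n lines
      .map { case (line, _) => line }
}
```

This still scans the whole file, so it is probably only worth it when N is too large to bring back to the driver with take.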


     On Monday, November 9, 2015 9:52 PM, Akhil Das 
<ak...@sigmoidanalytics.com> wrote:
   

There are multiple ways to achieve this:
1. Read the N lines from the driver and then do a sc.parallelize(nlines) to 
create an RDD out of it.
2. Create an RDD with all N+M lines, do a take(N), and then broadcast or 
parallelize the returned list.
3. Something like this if the file is in hdfs:
    val n_f = (5, file_name)
    val n_lines = sc.parallelize(Array(n_f))
    val n_linesRDD = n_lines.map(n => {
      // Read and return 5 lines (n._1) from the file (n._2)
    })
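
Options 1 and 2 can be sketched roughly as follows. This is only an illustration: the object name FirstLines and the method names are made up here, and option 1 assumes the file is on the driver's local filesystem rather than in hdfs.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.io.Source

object FirstLines {
  // Option 1: read the first n lines on the driver, then distribute them.
  // Assumes `path` is a local file readable from the driver.
  def viaDriver(sc: SparkContext, path: String, n: Int): RDD[String] = {
    val source = Source.fromFile(path)
    try {
      // Materialize the lines before the source is closed.
      val lines = source.getLines().take(n).toList
      sc.parallelize(lines)
    } finally {
      source.close()
    }
  }

  // Option 2: load the whole file as an RDD, take the first n lines
  // back to the driver, and parallelize the returned array again.
  def viaTake(sc: SparkContext, path: String, n: Int): RDD[String] =
    sc.parallelize(sc.textFile(path).take(n).toSeq)
}
```

Both variants pass the N lines through the driver, so they fit best when N is small enough to hold in driver memory.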
Thanks
Best Regards
On Thu, Oct 29, 2015 at 9:51 PM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> 
wrote:

Hi All,
There is a file with N + M lines, and I need to read only the first N lines 
into one RDD.
1. i) Read all N + M lines as one RDD, ii) select the RDD's top N rows; this 
may be one solution.
2. Introduce a broadcast variable set to N, and use it to decide which rows to 
keep while mapping the file RDD, so that only the first N rows are mapped. 
This may not work, however.
Is there some better solution?
Thank you,
Zhiliang




  
