Thanks a lot for your reply. I have also worked it out in some other ways. In fact, at first I was thinking about using filter to do it, but that failed.
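
For the record, a filter can do this if each line is first paired with its position. The following is only a minimal sketch, assuming the RDD's partition order matches the file's line order (which is the usual case for sc.textFile). The names FirstNLines and firstN are made up for illustration, and the in-memory sample data stands in for a real file.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FirstNLines {
  // Keep only the first n lines of an RDD by pairing each line with its index.
  def firstN(lines: RDD[String], n: Long): RDD[String] =
    lines.zipWithIndex()                     // (line, index), index starts at 0
      .filter { case (_, idx) => idx < n }   // keep indices 0 until n
      .map { case (line, _) => line }        // drop the index again

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("first-n-lines").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for sc.textFile(path): 5 lines, N = 3.
      val all = sc.parallelize(Seq("a", "b", "c", "d", "e"), numSlices = 2)
      firstN(all, 3).collect().foreach(println)
    } finally sc.stop()
  }
}
```

Note that zipWithIndex has to run a job to count the elements in each partition when the RDD has more than one partition, so this still scans all N + M lines. It only avoids collecting them to the driver.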

On Monday, November 9, 2015 9:52 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

There are multiple ways to achieve this:

1. Read the N lines on the driver and then do a sc.parallelize(nlines) to create an RDD out of them.
2. Create an RDD with all N + M lines, do a take on N, and then broadcast or parallelize the returned list.
3. Something like this, if the file is in HDFS:

    val n_f = (5, file_name)
    val n_lines = sc.parallelize(Array(n_f))
    val n_linesRDD = n_lines.map(n => {
      // Read and return 5 lines (n._1) from the file (n._2)
    })

Thanks
Best Regards

On Thu, Oct 29, 2015 at 9:51 PM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote:

Hi All,

There is a file with N + M lines, and I need to read only the first N lines into one RDD.

1. i) Read all N + M lines as one RDD, ii) select the RDD's top N rows. This may be one solution.
2. Introduce a broadcast variable set to N and use it to decide, while mapping the file RDD, to map only its first N rows. This may not work, however.

Is there some better solution?

Thank you,
Zhiliang
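
The first two options in Akhil's reply can be sketched roughly as below. This is an illustration under stated assumptions, not code from the thread: the object and method names (HeadOfFile, viaDriver, viaTake) are hypothetical, and viaDriver assumes the file sits on a local filesystem that the driver can read.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.io.Source

object HeadOfFile {
  // Option 1: read the first n lines on the driver, then distribute them.
  // Assumes localPath is a local file readable by the driver process.
  def viaDriver(sc: SparkContext, localPath: String, n: Int): RDD[String] = {
    val src = Source.fromFile(localPath)
    try {
      // toList materializes the lines before the source is closed
      val nLines = src.getLines().take(n).toList
      sc.parallelize(nLines)
    } finally src.close()
  }

  // Option 2: load all N + M lines as an RDD, take the first n back to the
  // driver, and parallelize the returned array into a new RDD.
  def viaTake(sc: SparkContext, path: String, n: Int): RDD[String] =
    sc.parallelize(sc.textFile(path).take(n).toSeq)
}
```

Both variants pass the N lines through the driver, so they suit cases where N is small enough to fit in driver memory. take(n) normally scans only as many partitions as it needs to reach n elements, so option 2 does not have to read the whole file.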