Thanks a lot for your reply. I have also worked it out in another way. In
fact, at first I tried to do it with filter, but that failed.
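A filter on its own has no notion of line position, which may be why a plain filter does not work here. One filter-based variant that should work is to pair each line with its index via zipWithIndex first. This is only a minimal sketch: the object name FirstNByIndex and the method firstN are hypothetical, and it assumes the file is read with sc.textFile so that indices follow file order.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of Spark: keeps the first n lines of a text file.
object FirstNByIndex {
  def firstN(sc: SparkContext, path: String, n: Long): RDD[String] =
    sc.textFile(path)
      .zipWithIndex()                        // pair each line with its 0-based position
      .filter { case (_, idx) => idx < n }   // keep only positions 0 .. n-1
      .map { case (line, _) => line }        // drop the index again
}
```

Note that this still scans the whole file, and zipWithIndex itself runs a Spark job when the RDD has more than one partition, so it is mainly of interest when N is too large to bring back to the driver.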
On Monday, November 9, 2015 9:52 PM, Akhil Das
<[email protected]> wrote:
There are multiple ways to achieve this:
1. Read the N lines on the driver and then call sc.parallelize(nlines) to
create an RDD from them.
2. Create an RDD with all N+M lines, do a take(N) on it, and then broadcast
or parallelize the returned list.
3. Something like this, if the file is in HDFS:
val n_f = (5, file_name)
val n_lines = sc.parallelize(Array(n_f))
val n_linesRDD = n_lines.map(n => {
  // Read and return 5 lines (n._1) from the file (n._2)
})
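Options 2 and 3 above could be filled in roughly as follows. This is a sketch under assumptions, not a definitive implementation: the object and method names (FirstNLines, viaTake, viaExecutorRead) are made up for illustration, the file is assumed to be UTF-8 text, and option 3 assumes a fresh Hadoop Configuration created inside the task can resolve the file's filesystem.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helpers, not part of Spark.
object FirstNLines {

  // Option 2: take(n) brings the first n lines to the driver,
  // and parallelize turns that small local collection back into an RDD.
  def viaTake(sc: SparkContext, path: String, n: Int): RDD[String] =
    sc.parallelize(sc.textFile(path).take(n).toSeq)

  // Option 3: ship the pair (n, path) to a single task, which opens the
  // file itself and reads only the first n lines.
  def viaExecutorRead(sc: SparkContext, path: String, n: Int): RDD[String] =
    sc.parallelize(Seq((n, path))).flatMap { case (count, file) =>
      val p = new Path(file)
      // Assumption: default Hadoop settings on the executor are sufficient.
      val fs = p.getFileSystem(new Configuration())
      val reader = new BufferedReader(
        new InputStreamReader(fs.open(p), StandardCharsets.UTF_8))
      try {
        // Read lazily, stop after `count` lines or at end of file, and
        // materialize the result before the reader is closed.
        Iterator.continually(reader.readLine())
          .takeWhile(_ != null)
          .take(count)
          .toList
      } finally reader.close()
    }
}
```

With viaTake the N lines pass through driver memory, so it suits a small N. With viaExecutorRead all N lines are produced by one task and end up in a single partition, so a repartition afterwards may be useful if N is large.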
Thanks
Best Regards
On Thu, Oct 29, 2015 at 9:51 PM, Zhiliang Zhu <[email protected]>
wrote:
Hi All,
I have a file with N + M lines, and I need to read only the first N lines
into one RDD.
1. i) Read all N + M lines as one RDD, then ii) select the RDD's top N rows.
This may be one solution.
2. Set N in a broadcast variable and use it to decide, while mapping the file
RDD, to map only its first N rows. This may not work, however.
Is there a better solution?
Thank you,
Zhiliang