Doesn’t that just read in all the values? Is the count not pre-computed anywhere? It’s not the end of the world if it isn’t, but a pre-computed count would be faster.
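
For reference, here is a rough sketch of counting without converting each partition to an array. As far as I know an RDD does not keep a pre-computed record count per partition, so every record still has to be read once, but walking the iterator avoids holding a whole partition in memory. The names `count_partition` and `partitions` are made up for illustration, and the plain Python iterators only stand in for RDD partitions. With PySpark I would expect the same function to be passed to `rdd.mapPartitionsWithIndex(count_partition).collect()`, though I have not run it against a cluster.

```python
def count_partition(index, records):
    """Return a one-element iterator holding (partition index, record count).

    The records iterator is consumed one element at a time, so the
    partition is never materialized as an array or list.
    """
    count = sum(1 for _ in records)
    return iter([(index, count)])


# Hypothetical stand-in for an RDD with 4 partitions: each partition
# is a plain iterator (sizes taken from the example in the question).
partitions = [iter(range(100)), iter(range(104)), iter(range(90)), iter(range(140))]

# Apply the function to every partition, as mapPartitionsWithIndex would.
counts = [
    pair
    for index, records in enumerate(partitions)
    for pair in count_partition(index, records)
]

for index, count in counts:
    print("partition %d: %d records" % (index, count))
```

Only one small (index, count) pair per partition comes back, so the cost is a single pass over the data and very little memory.
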

On Mon, Jan 12, 2015 at 8:09 PM, Ganelin, Ilya <[email protected]> wrote:

> Use the mapPartitions function. It returns an iterator to each
> partition. Then just get that length by converting to an array.
>
> -----Original Message-----
> From: Kevin Burton [[email protected]]
> Sent: Monday, January 12, 2015 09:55 PM Eastern Standard Time
> To: [email protected]
> Subject: quickly counting the number of rows in a partition?
>
> Is there a way to compute the total number of records in each RDD
> partition?
>
> So say I had 4 partitions.. I’d want to have
>
> partition 0: 100 records
> partition 1: 104 records
> partition 2: 90 records
> partition 3: 140 records
>
> Kevin

--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
