Re: Segmented fold count

2014-08-20 Thread fil
> ion on a partition
> is a task, which will be sent to a node in the cluster.
>
So is a partition sometimes a chunk of data that relates to a single key, or is this only ever by coincidence?
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Segmented-fold-count-tp12278p12478.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Segmented fold count

2014-08-18 Thread Davies Liu
ition is a task, which will be sent to a node in the cluster.
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Segmented-fold-count-tp12278p12342.html
>
> -

Re: Segmented fold count

2014-08-18 Thread fil
g of partition: work segmentation and key groups. Care to clarify, anyone: when are partitions used to describe chunks of data for different nodes in the cluster (i.e. large), and when are they used to describe groups of items in the data (i.e. small)?
--
View this message in context: http://apache-spa
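To make the two senses concrete: in the work-segmentation sense a partition is just a contiguous chunk of the input, so a run of equal values lines up with a partition boundary only by coincidence. The sketch below simulates partitions as plain Python lists (no Spark involved) and shows how per-partition run counts could be stitched back together. The names `group_count` and `merge_runs` and the sample split are hypothetical, chosen for illustration only.

```python
from itertools import groupby

def group_count(items):
    """Yield (value, run_length) for each run of equal adjacent values."""
    for value, run in groupby(items):
        yield value, sum(1 for _ in run)

def merge_runs(per_partition_counts):
    """Join runs that were split across a partition boundary (hypothetical helper)."""
    merged = []
    for counts in per_partition_counts:
        for value, n in counts:
            if merged and merged[-1][0] == value:
                # Same value as the previous run: it was cut by a boundary.
                merged[-1] = (value, merged[-1][1] + n)
            else:
                merged.append((value, n))
    return merged

# Assumed split of the example input into three chunks.
partitions = [[1, 1, 1, 2], [2, 3, 4], [4, 5, 1]]
per_partition = [list(group_count(p)) for p in partitions]
print(per_partition)
# [[(1, 3), (2, 1)], [(2, 1), (3, 1), (4, 1)], [(4, 1), (5, 1), (1, 1)]]
print(merge_runs(per_partition))
# [(1, 3), (2, 2), (3, 1), (4, 2), (5, 1), (1, 1)]
```

Counting within each chunk alone reports the runs of 2 and 4 twice; the merge step restores the counts for the whole sequence.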

Re: Segmented fold count

2014-08-18 Thread fil
's the Spark'iest way to do this efficiently? :)
Regards, Fil.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Segmented-fold-count-tp12278p12295.html

Re: Segmented fold count

2014-08-17 Thread Davies Liu
> On Sun, Aug 17, 2014 at 10:34 PM, fil wrote:
>> > Can anyone assist with a scan of the following kind (Python preferred, but whatever..)?
>> > I'm looking for a kind of segmented fold count.
>> >
>> > Input: [1,1,1,2,2,3,4,4,5,1]
>> > Output:

Re: Segmented fold count

2014-08-17 Thread Andrew Ash
If you have an RDD, you can use RDD.mapPartitions(groupCount).collect()
>
> On Sun, Aug 17, 2014 at 10:34 PM, fil wrote:
> > Can anyone assist with a scan of the following kind (Python preferred, but whatever..)?
> > I'm looking for a kind of segmented fold count

Re: Segmented fold count

2014-08-17 Thread Davies Liu
(n, sum(1 for _ in it)), gs)
If you have an RDD, you can use RDD.mapPartitions(groupCount).collect()
On Sun, Aug 17, 2014 at 10:34 PM, fil wrote:
> Can anyone assist with a scan of the following kind (Python preferred, but whatever..)?
> I'm looking for a kind of segmented fold c
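The fragment above reads like the tail of a `groupCount` helper built on `itertools.groupby`. The full definition is cut off in this listing, so the following is only a guess at its shape: a function that takes an iterator and yields (value, run length) pairs. The name `group_count` is an assumption, not the code from the thread.

```python
from itertools import groupby

def group_count(items):
    """Yield (value, run_length) for each run of equal adjacent values."""
    for value, run in groupby(items):
        # groupby only groups adjacent equal items, so a value that
        # reappears later starts a fresh run.
        yield value, sum(1 for _ in run)

print(list(group_count([1, 1, 1, 2, 2, 3, 4, 4, 5, 1])))
# [(1, 3), (2, 2), (3, 1), (4, 2), (5, 1), (1, 1)]
```

Because it consumes an iterator and produces an iterator, a function of this shape could be passed to `RDD.mapPartitions`, though it would then count runs within each partition separately.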

Segmented fold count

2014-08-17 Thread fil
Can anyone assist with a scan of the following kind (Python preferred, but whatever..)? I'm looking for a kind of segmented fold count.
Input: [1,1,1,2,2,3,4,4,5,1]
Output: [(1,3), (2,2), (3,1), (4,2), (5,1), (1,1)]
or preferably two output columns:
id: [1,2,3,4,5,1]
count: [3,2,1,
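For the two-column form of the output, one possible approach on a plain Python list is to collect the ids and counts in parallel while walking the runs. This is a sketch for in-memory data only; the function name `run_lengths` is hypothetical.

```python
from itertools import groupby

def run_lengths(data):
    """Return (ids, counts) as two parallel lists, one entry per run."""
    ids, counts = [], []
    for value, run in groupby(data):
        ids.append(value)
        counts.append(sum(1 for _ in run))
    return ids, counts

ids, counts = run_lengths([1, 1, 1, 2, 2, 3, 4, 4, 5, 1])
print(ids)     # [1, 2, 3, 4, 5, 1]
print(counts)  # [3, 2, 1, 2, 1, 1]
```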