ion on a partition
> is a task, which will be sent to a node in the cluster.
>
So is a partition sometimes a chunk of data that relates to a single key,
or is that only ever by coincidence?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Segmented-fold-count-tp12278p12478.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
ition
is a task, which will be sent to a node in the cluster.
g of partition: work
segmentation, and key groups. Would anyone care to clarify: when are partitions
used to describe chunks of data for different nodes in the cluster (i.e.
large), and when are they used to describe groups of items in the data (i.e.
small)?
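
To make the distinction concrete, here is a small plain-Python sketch (no Spark involved; the helper names split_into_chunks, count_runs and merge_boundary_runs are made up for illustration). It assumes a partition is simply a contiguous chunk of the data, cut without regard to the values in it, so a run of equal values may end up split across two chunks and has to be stitched back together after the per-chunk counting.

```python
from itertools import groupby


def split_into_chunks(items, n):
    """Cut items into about n contiguous chunks, ignoring the values (a stand-in for partitions)."""
    size = max(1, -(-len(items) // n))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_runs(chunk):
    """Return (value, run_length) for each run of consecutive equal values in one chunk."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(chunk)]


def merge_boundary_runs(per_chunk_counts):
    """Concatenate per-chunk results, merging a run that continues across a chunk boundary."""
    merged = []
    for counts in per_chunk_counts:
        for value, n in counts:
            if merged and merged[-1][0] == value:
                merged[-1] = (value, merged[-1][1] + n)
            else:
                merged.append((value, n))
    return merged


data = [1, 1, 1, 2, 2, 3, 4, 4, 5, 1]
chunks = split_into_chunks(data, 3)          # the run of 2s is cut in half here
per_chunk = [count_runs(c) for c in chunks]  # what a per-partition pass would see
result = merge_boundary_runs(per_chunk)      # runs stitched back together
```

In this sketch a chunk holds whatever values happen to fall inside it, so a chunk lining up with a single key would only be coincidence.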
's the Spark'iest way to do this efficiently? :)
Regards, Fil.
>> On Sun, Aug 17, 2014 at 10:34 PM, fil wrote:
>> > Can anyone assist with a scan of the following kind (Python preferred, but
>> > whatever..)? I'm looking for a kind of segmented fold count.
>> >
>> > Input: [1,1,1,2,2,3,4,4,5,1]
>> > Output:
If you have an RDD, you can use RDD.mapPartitions(groupCount).collect()
>
> On Sun, Aug 17, 2014 at 10:34 PM, fil wrote:
> > Can anyone assist with a scan of the following kind (Python preferred, but
> > whatever..)? I'm looking for a kind of segmented fold count
(n, sum(1 for _ in it)), gs)
If you have an RDD, you can use RDD.mapPartitions(groupCount).collect()
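
For reference, a minimal sketch of what a groupCount along these lines could look like in plain Python, assuming it is built on itertools.groupby (the names group_count, pairs, ids and counts are illustrative, not taken from the thread). It is written as a generator so that it accepts any iterator and yields its results lazily.

```python
from itertools import groupby


def group_count(items):
    """Yield (value, run_length) for each run of consecutive equal values."""
    for value, run in groupby(items):
        yield value, sum(1 for _ in run)


data = [1, 1, 1, 2, 2, 3, 4, 4, 5, 1]
pairs = list(group_count(data))

# Two output columns instead of a list of pairs (assumes pairs is non-empty).
ids, counts = (list(col) for col in zip(*pairs))
```

Note that mapPartitions applies the function to each partition separately, so a run of equal values that straddles a partition boundary would come back as two entries and would need merging afterwards.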
On Sun, Aug 17, 2014 at 10:34 PM, fil wrote:
> Can anyone assist with a scan of the following kind (Python preferred, but
> whatever..)? I'm looking for a kind of segmented fold c
Can anyone assist with a scan of the following kind (Python preferred, but
whatever..)? I'm looking for a kind of segmented fold count.
Input: [1,1,1,2,2,3,4,4,5,1]
Output: [(1,3), (2,2), (3,1), (4,2), (5,1), (1,1)]
or preferably two output columns:
id: [1,2,3,4,5,1]
count: [3,2,1,