Can anyone assist with a scan of the following kind (Python preferred, but any language will do)? I'm looking for a kind of segmented fold count.
Input: [1,1,1,2,2,3,4,4,5,1]
Output: [(1,3), (2,2), (3,1), (4,2), (5,1), (1,1)]

or preferably two output columns:

id: [1,2,3,4,5,1]
count: [3,2,1,2,1,1]

I could use a groupby/count, except that I just want to scan, not re-sort. Ideally this would be as low-level as possible and run in a simple single scan. It also needs to retain the original sort order. Thoughts?
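
To make the desired behaviour concrete, here is a minimal sketch in plain Python (not Spark) of the single-pass count over consecutive runs. It relies on itertools.groupby, which groups only adjacent equal elements and so never re-sorts; the function name segmented_count is hypothetical, chosen just for this illustration.

```python
from itertools import groupby


def segmented_count(values):
    """Count consecutive runs of equal values in one pass, keeping input order.

    Returns two parallel lists: the run ids and the run lengths.
    `segmented_count` is a hypothetical name used only for this sketch.
    """
    ids, counts = [], []
    # groupby yields one (key, iterator) pair per run of adjacent equal items.
    for key, run in groupby(values):
        ids.append(key)
        counts.append(sum(1 for _ in run))  # length of this run
    return ids, counts


ids, counts = segmented_count([1, 1, 1, 2, 2, 3, 4, 4, 5, 1])
print(ids)     # [1, 2, 3, 4, 5, 1]
print(counts)  # [3, 2, 1, 2, 1, 1]
```

On an RDD, something like this could presumably be applied per partition via mapPartitions, but that is an assumption on my part: a run that straddles a partition boundary would come out as two entries and would still need to be merged afterwards.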