Thanks, Mohit. It sounds like we're on the same page -- I used a similar
approach.
On Thu, Jul 2, 2015 at 12:27 PM, Mohit Jaggi wrote:
if you are joining successive lines together based on a predicate, then you
are doing a "flatMap" not an "aggregate". you are on the right track with a
multi-pass solution. i had the same challenge when i needed a sliding
window over an RDD (see below).
[ i had suggested that the sliding window API
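For readers following along: the code Mohit refers to is cut off in this archive, so the following is only a rough sketch of one way a sliding window can cross partition boundaries, not his implementation. Plain Python lists stand in for RDD partitions, and the name `sliding_over_partitions` is made up for illustration. The idea is that each partition borrows up to `size - 1` leading elements from the partitions that follow it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_over_partitions(partitions: Sequence[Sequence[T]], size: int) -> List[List[T]]:
    """Return every full window of `size` successive elements, including
    windows that span a partition boundary. Hypothetical helper."""
    windows: List[List[T]] = []
    for i, part in enumerate(partitions):
        # Borrow up to size - 1 leading elements from the following partitions.
        tail: List[T] = []
        for later in partitions[i + 1:]:
            if len(tail) >= size - 1:
                break
            tail.extend(later[: size - 1 - len(tail)])
        padded = list(part) + tail
        # Only emit windows that start inside this partition.
        for start in range(len(part)):
            window = padded[start:start + size]
            if len(window) == size:
                windows.append(window)
    return windows


# Three lists stand in for three partitions of one RDD.
windows = sliding_over_partitions([[1, 2, 3], [4], [5, 6]], size=3)
print(windows)
```

Note that the middle partition holds a single element, so the first partition has to borrow from two partitions to fill its last window. In a real job the borrowed elements would need to be shipped between partitions in a separate pass.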
That's an interesting idea! I hadn't considered that. However, looking at
the Partitioner interface, I would need to choose the partition from looking
at a single key, which doesn't fit my case, unfortunately. For my case, I need to
compare successive pairs of keys. (I'm trying to re-join lines that were
split
could you use a custom partitioner to preserve boundaries such that all related
tuples end up on the same partition?
On Jun 30, 2015, at 12:00 PM, RJ Nowling wrote:
Thanks, Reynold. I still need to handle incomplete groups that fall
between partition boundaries. So, I need a two-pass approach. I came up
with a somewhat hacky way to handle those using the partition indices and
key-value pairs as a second pass after the first.
OCaml's std library provides a fu
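The code for the two-pass approach is not included in this thread, so here is only a hedged sketch of what such an approach might look like. Plain Python lists stand in for partitions, and `first_pass`, `second_pass`, and the `consecutive` predicate are hypothetical names. The first pass groups each partition independently and tags every group with its partition index; the second pass walks the groups in partition order and stitches together groups that meet at a boundary.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def first_pass(index: int, items: Iterable[T],
               joins: Callable[[T, T], bool]) -> List[Tuple[int, List[T]]]:
    """Group one partition on its own; tag each group with the partition index."""
    groups: List[Tuple[int, List[T]]] = []
    for item in items:
        if groups and joins(groups[-1][1][-1], item):
            groups[-1][1].append(item)
        else:
            groups.append((index, [item]))
    return groups


def second_pass(tagged: Iterable[Tuple[int, List[T]]],
                joins: Callable[[T, T], bool]) -> List[List[T]]:
    """Merge groups that were cut apart by a partition boundary."""
    merged: List[List[T]] = []
    prev_index = None
    for index, group in tagged:
        at_boundary = prev_index is not None and index != prev_index
        if at_boundary and joins(merged[-1][-1], group[0]):
            merged[-1] = merged[-1] + group  # continue the run across the boundary
        else:
            merged.append(list(group))
        prev_index = index
    return merged


# Hypothetical predicate: successive integers belong to the same group.
def consecutive(prev: int, cur: int) -> bool:
    return cur == prev + 1


partitions = [[1, 2, 4], [5], [6, 9]]  # the run 4, 5, 6 spans all three
tagged = [g for i, p in enumerate(partitions) for g in first_pass(i, p, consecutive)]
result = second_pass(tagged, consecutive)
print(result)
```

In Spark, the first pass would presumably run through `mapPartitionsWithIndex`. The second pass here runs on one machine over all groups for simplicity; only the first and last group of each partition can actually change, so a real job would likely collect just those.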
Try mapPartitions, which gives you an iterator, and you can produce an
iterator back.
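A minimal sketch of the iterator-in, iterator-out shape this suggestion implies, written as a plain Python generator so it runs without Spark. The name `group_runs` and the `consecutive` predicate are assumptions for illustration; a function of this shape is what one would hand to `mapPartitions`.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def group_runs(items: Iterable[T], joins: Callable[[T, T], bool]) -> Iterator[List[T]]:
    """Lazily yield runs of successive items; `joins(prev, cur)` decides
    whether `cur` extends the run that `prev` belongs to."""
    run: List[T] = []
    for item in items:
        if run and not joins(run[-1], item):
            yield run
            run = []
        run.append(item)
    if run:
        yield run


# Hypothetical predicate: successive integers belong to the same run.
def consecutive(prev: int, cur: int) -> bool:
    return cur == prev + 1


# Two lists stand in for two partitions of one RDD.
partitions = [[1, 2, 4], [5, 6, 9]]
per_partition = [list(group_runs(p, consecutive)) for p in partitions]
print(per_partition)
```

Each partition is grouped correctly in isolation, but the run 4, 5, 6 comes out as `[4]` in the first partition and `[5, 6]` in the second, because no partition sees its neighbour's elements.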
On Tue, Jun 30, 2015 at 11:01 AM, RJ Nowling wrote:
Hi all,
I have a problem where I have an RDD of elements:
Item1 Item2 Item3 Item4 Item5 Item6 ...
and I want to run a function over them to decide which runs of elements to
group together:
[Item1 Item2] [Item3] [Item4 Item5 Item6] ...
Technically, I could use aggregate to do this, but I would h