Put the queue in a static variable that is first referenced on the
workers (inside an rdd closure). That way it will be created on each
of the workers, not the driver.
Easiest way to do that is with a lazy val in a companion object.
On Mon, Aug 1, 2016 at 3:22 PM, Martin Le wrote:
> How to do t
How to do that? if I put the queue inside .transform operation, it
doesn't work.
On Mon, Aug 1, 2016 at 6:43 PM, Cody Koeninger wrote:
> Can you keep a queue per executor in memory?
>
> On Mon, Aug 1, 2016 at 11:24 AM, Martin Le
> wrote:
> > Hi Cody and all,
> >
> > Thank you for your answer. I
Can you keep a queue per executor in memory?
On Mon, Aug 1, 2016 at 11:24 AM, Martin Le wrote:
> Hi Cody and all,
>
> Thank you for your answer. I implement simple random sampling (SRS) for
> DStream using transform method, and it works fine.
> However, I have a problem when I implement reservoir
Hi Cody and all,
Thank you for your answer. I implement simple random sampling (SRS) for
DStream using transform method, and it works fine.
However, I have a problem when I implement reservoir sampling (RS). In RS,
I need to maintain a reservoir (a queue) to store selected data items
(RDDs). If I
Most stream systems you're still going to incur the cost of reading
each message... I suppose you could rotate among reading just the
latest messages from a single partition of a Kafka topic if they were
evenly balanced.
But once you've read the messages, nothing's stopping you from
filtering most