Hi all,
I perform sampling on a DStream by taking samples from RDDs in the DStream.
I have used two sampling mechanisms: simple random sampling and stratified
sampling.
Simple random sampling: inputStream.transform(x => x.sample(false,
fraction)).
Stratified sampling: inputStream.transform(x =>
How can I do that? If I put the queue inside the .transform operation, it
doesn't work.
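
For reference, here is a minimal sketch of how the two transform-based
sampling approaches mentioned above could be written. The object and helper
names (SamplingSketch, sampleBatch, stratifiedBatch) are made up for
illustration. The stratified version assumes the records are already keyed
as (stratum, value) pairs and uses sampleByKey, which is not necessarily
what the truncated snippet above did.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper object, for illustration only.
object SamplingSketch {

  // Simple random sampling without replacement: each record of the batch
  // is kept independently with probability `fraction`.
  def sampleBatch[T](rdd: RDD[T], fraction: Double): RDD[T] =
    rdd.sample(withReplacement = false, fraction = fraction)

  // Stratified sampling: each record is kept with the probability listed
  // for its key. Assumption: `fractions` has an entry for every key that
  // can occur in the batch; a missing key fails at runtime.
  def stratifiedBatch[K: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      fractions: Map[K, Double]): RDD[(K, V)] =
    rdd.sampleByKey(withReplacement = false, fractions = fractions)

  // Apply the per-batch samplers to every RDD of a DStream.
  def simpleRandom[T: ClassTag](stream: DStream[T], fraction: Double): DStream[T] =
    stream.transform(rdd => sampleBatch(rdd, fraction))

  def stratified[K: ClassTag, V: ClassTag](
      stream: DStream[(K, V)],
      fractions: Map[K, Double]): DStream[(K, V)] =
    stream.transform(rdd => stratifiedBatch(rdd, fractions))
}
```

Both samplers are approximate: the size of the sample varies from batch to
batch around fraction * batch size. The function passed to transform is
evaluated on the driver once per batch, while the sampling itself runs on
the executors.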
On Mon, Aug 1, 2016 at 6:43 PM, Cody Koeninger wrote:
> Can you keep a queue per executor in memory?
>
> On Mon, Aug 1, 2016 at 11:24 AM, Martin Le wrote:
> > Hi Cody and all,
> >
balanced.
>
> But once you've read the messages, nothing's stopping you from
> filtering most of them out before doing further processing. The
> dstream .transform method will let you do any filtering / sampling you
> could have done on an rdd.
>
> On Fri, Jul 29,
Hi all,
I have to handle a high-rate data stream. To reduce the heavy load, I
want to use sampling techniques for each stream window. That is, I want to
process a subset of the data instead of the whole window. I saw that Spark
supports sampling operations for RDDs, but for DStreams, Spark supports