How to do that? if I put the queue inside .transform operation, it doesn't work.
On Mon, Aug 1, 2016 at 6:43 PM, Cody Koeninger <c...@koeninger.org> wrote: > Can you keep a queue per executor in memory? > > On Mon, Aug 1, 2016 at 11:24 AM, Martin Le <martin.leq...@gmail.com> > wrote: > > Hi Cody and all, > > > > Thank you for your answer. I implement simple random sampling (SRS) for > > DStream using transform method, and it works fine. > > However, I have a problem when I implement reservoir sampling (RS). In > RS, I > > need to maintain a reservoir (a queue) to store selected data items > (RDDs). > > If I define a large stream window, the queue also increases and it > leads to > > the driver run out of memory. I explain my problem in detail here: > > > https://docs.google.com/document/d/1YBV_eHH6U_dVF1giiajuG4ayoVN5R8Qw3IthoKvW5ok > > > > Could you please give me some suggestions or advice to fix this problem? > > > > Thanks > > > > On Fri, Jul 29, 2016 at 6:28 PM, Cody Koeninger <c...@koeninger.org> > wrote: > >> > >> Most stream systems you're still going to incur the cost of reading > >> each message... I suppose you could rotate among reading just the > >> latest messages from a single partition of a Kafka topic if they were > >> evenly balanced. > >> > >> But once you've read the messages, nothing's stopping you from > >> filtering most of them out before doing further processing. The > >> dstream .transform method will let you do any filtering / sampling you > >> could have done on an rdd. > >> > >> On Fri, Jul 29, 2016 at 9:57 AM, Martin Le <martin.leq...@gmail.com> > >> wrote: > >> > Hi all, > >> > > >> > I have to handle high-speed rate data stream. To reduce the heavy > load, > >> > I > >> > want to use sampling techniques for each stream window. It means that > I > >> > want > >> > to process a subset of data instead of whole window data. I saw Spark > >> > support sampling operations for RDD, but for DStream, Spark supports > >> > sampling operation as well? If not, could you please give me a > >> > suggestion > >> > how to implement it? > >> > > >> > Thanks, > >> > Martin > > > > >