Re-reading your description - I guess you could make your input source connect for 10 seconds, pause for 50, and then reconnect.
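Something along these lines could work - a minimal sketch of a duty-cycled receiver, assuming a plain line-oriented socket source (the host, port, and timings are placeholders for whatever your custom receiver actually talks to):

    import java.io.{BufferedReader, InputStreamReader}
    import java.net.Socket
    import java.nio.charset.StandardCharsets

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // Connects, reads for readMillis, disconnects, sleeps for pauseMillis,
    // and repeats until Spark stops the receiver.
    class DutyCycleReceiver(host: String, port: Int,
                            readMillis: Long = 10000L,
                            pauseMillis: Long = 50000L)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      override def onStart(): Unit = {
        // Receive on a separate thread so onStart() returns immediately.
        new Thread("DutyCycleReceiver") {
          override def run(): Unit = receiveLoop()
        }.start()
      }

      // Nothing to do here: the loop checks isStopped() and exits on its own.
      override def onStop(): Unit = {}

      private def receiveLoop(): Unit = {
        while (!isStopped()) {
          var socket: Socket = null
          try {
            socket = new Socket(host, port)
            val reader = new BufferedReader(new InputStreamReader(
              socket.getInputStream, StandardCharsets.UTF_8))
            val deadline = System.currentTimeMillis() + readMillis
            var line = reader.readLine()
            // Store records only until the 10-second window closes.
            while (!isStopped() && line != null &&
                   System.currentTimeMillis() < deadline) {
              store(line)
              line = reader.readLine()
            }
          } catch {
            case e: java.io.IOException => reportError("Connection failed", e)
          } finally {
            if (socket != null) socket.close()
          }
          // Disconnect and sit out the other 50 seconds of the minute.
          if (!isStopped()) Thread.sleep(pauseMillis)
        }
      }
    }

You would plug it in with ssc.receiverStream(new DutyCycleReceiver("stream-host", 9999)). The upside is that the 50 seconds of unwanted data never enter Spark at all.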
On Thu, Aug 6, 2015 at 10:32 AM, Dimitris Kouzis - Loukas <look...@gmail.com> wrote:

> Hi - yes, it's great that you wrote it yourself - it means you have more
> control. I have the feeling that the most efficient point to discard as
> much data as possible is your Spark input source - you could even modify
> your subscription protocol so that it never receives the other 50 seconds
> of data at all. Once you deliver data to the DStream you can filter it as
> much as you want, but you will still be subject to garbage collection and
> potentially to shuffles and HDD checkpoints.
>
> On Thu, Aug 6, 2015 at 1:31 AM, Heath Guo <heath...@fb.com> wrote:
>
>> Hi Dimitris,
>>
>> Thanks for your reply. Just wondering – are you asking about my
>> streaming input source? I implemented a custom receiver and have been
>> using that. Thanks.
>>
>> From: Dimitris Kouzis - Loukas <look...@gmail.com>
>> Date: Wednesday, August 5, 2015 at 5:27 PM
>> To: Heath Guo <heath...@fb.com>
>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: Pause Spark Streaming reading or sampling streaming data
>>
>> What driver do you use? Sounds like something you should do before the
>> driver...
>>
>> On Thu, Aug 6, 2015 at 12:50 AM, Heath Guo <heath...@fb.com> wrote:
>>
>>> Hi, I have a question about sampling Spark Streaming data, or getting
>>> part of the data. For every minute, I only want the data read in during
>>> the first 10 seconds, and discard all data in the next 50 seconds. Is
>>> there any way to pause reading and discard the data in that period? I'm
>>> doing this to sample from a stream with a huge amount of data, which
>>> saves processing time in the real-time program. Thanks!
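For completeness, the DStream-side filtering mentioned above would look roughly like this - again just a sketch (app name, host, and port are placeholders). Note that every record is still received and stored before whole batches are dropped, which is why discarding at the source is cheaper:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SampledStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SampledStream")
        val ssc = new StreamingContext(conf, Seconds(10))

        val lines = ssc.socketTextStream("stream-host", 9999)

        // Keep a batch only if its time falls in the first 10 seconds
        // of each minute; replace everything else with an empty RDD.
        val sampled = lines.transform { (rdd, time) =>
          val secondOfMinute = (time.milliseconds / 1000) % 60
          if (secondOfMinute < 10) rdd else rdd.sparkContext.emptyRDD[String]
        }

        sampled.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

Even simpler, lines.transform(_.sample(false, 0.1)) keeps a random ~10% of each batch, if the sample doesn't have to come from a contiguous 10-second window.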