Re: window every n elements instead of time based

2014-10-07 Thread Michael Allman
Yes, I meant batch interval. Thanks for clarifying.

Cheers,
Michael

On Oct 7, 2014, at 11:14 PM, jayant [via Apache Spark User List] wrote:
> Hi Michael,
>
> I think you mean batch interval rather than windowing. It can be
> helpful for cases when you do not want to process very smal

Re: window every n elements instead of time based

2014-10-07 Thread Jayant Shekhar
Hi Michael,

I think you mean batch interval rather than windowing. It can be helpful for cases when you do not want to process very small batch sizes. The HDFS sink in Flume has the concept of rolling files based on time, number of events, or size. https://flume.apache.org/FlumeUserGuide.html#hd
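To illustrate the multi-trigger rolling idea, here is a minimal plain-Python sketch; it is not Flume or Spark code. The `RollingBatcher` class and its `roll_count`, `roll_size`, and `roll_interval` parameters are hypothetical, loosely modeled on the Flume HDFS sink's `hdfs.rollCount`, `hdfs.rollSize`, and `hdfs.rollInterval` settings. In this sketch a batch closes as soon as any enabled threshold is reached, and a value of 0 disables that trigger.

```python
import time
from typing import Callable, List, Optional


class RollingBatcher:
    """Hypothetical batcher (not a Flume/Spark API): closes the current batch
    when ANY enabled threshold is reached. A threshold of 0 disables it."""

    def __init__(self, roll_count: int = 0, roll_size: int = 0,
                 roll_interval: float = 0.0,
                 clock: Callable[[], float] = time.monotonic):
        self.roll_count = roll_count        # max events per batch
        self.roll_size = roll_size          # max total bytes per batch
        self.roll_interval = roll_interval  # max seconds a batch stays open
        self.clock = clock                  # injectable for testing
        self._reset()

    def _reset(self) -> None:
        self.events: List[bytes] = []
        self.size = 0
        self.opened_at = self.clock()

    def _should_roll(self) -> bool:
        if self.roll_count and len(self.events) >= self.roll_count:
            return True
        if self.roll_size and self.size >= self.roll_size:
            return True
        if self.roll_interval and self.clock() - self.opened_at >= self.roll_interval:
            return True
        return False

    def add(self, event: bytes) -> Optional[List[bytes]]:
        """Append an event; return the finished batch if it rolled, else None."""
        self.events.append(event)
        self.size += len(event)
        if self._should_roll():
            batch = self.events
            self._reset()
            return batch
        return None
```

The time check only runs when an event arrives, so an idle stream never rolls; a real sink would also need a timer for that case. Sizing batches toward an HDFS block size would correspond to setting only `roll_size`.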

Re: window every n elements instead of time based

2014-10-07 Thread Michael Allman
Hi Andrew,

The use case I have in mind is batch data serialization to HDFS, where sizing files to a certain HDFS block size is desired. In my particular use case, I want to process 10GB batches of data at a time. I'm not sure this is a sensible use case for Spark Streaming, and I was trying to

Re: window every n elements instead of time based

2014-10-05 Thread Andrew Ash
Hi Michael,

I couldn't find anything in Jira for it: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20%22window%22%20AND%20component%20%3D%20Streaming
Could you or Adrian please file a Jira ticket explaining the functionality and maybe a proposed API? This wi

Re: window every n elements instead of time based

2014-10-03 Thread Michael Allman
Hi,

I also have a use for count-based windowing. I'd like to process data batches by size as opposed to time. Is this feature on the development roadmap? Is there a JIRA ticket for it?

Thank you,
Michael

Re: window every n elements instead of time based

2014-02-27 Thread Tathagata Das
go by window count like moving average.
>
> Thanks
>
> A
>
> *From:* Tathagata Das [mailto:tathagata.das1...@gmail.com]
> *Sent:* February-26-14 2:05 PM
> *To:* user@spark.apache.org
> *Cc:* u...@spark.incubator.apache.org
> *Subject:* Re: window every
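A count-based moving average, as mentioned here, can be sketched in plain Python; this is not a Spark API. The window holds the last `n` elements and slides forward by one element per arrival, regardless of timing. The function name `count_moving_average` is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def count_moving_average(values: Iterable[float], n: int) -> Iterator[float]:
    """Hypothetical helper: yield the mean of the last n values once n values
    have been seen. The window is defined by element count, not by time."""
    if n <= 0:
        raise ValueError("n must be positive")
    window: deque = deque(maxlen=n)  # oldest value is evicted automatically
    total = 0.0
    for v in values:
        if len(window) == n:
            total -= window[0]  # subtract the value about to be evicted
        window.append(v)
        total += v
        if len(window) == n:
            yield total / n
```

For example, `list(count_moving_average([1, 2, 3, 4, 5], 3))` gives `[2.0, 3.0, 4.0]`. Keeping a running sum makes each step O(1); with long streams of floats, recomputing `sum(window)` occasionally would limit accumulated rounding error.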

RE: window every n elements instead of time based

2014-02-27 Thread Adrian Mocanu
: u...@spark.incubator.apache.org
Subject: Re: window every n elements instead of time based

Currently, all in-built DStream operations use time-based windowing. We may provide count-based windowing in the future.

On Wed, Feb 26, 2014 at 9:34 AM, Adrian Mocanu mailto:amoc...@verticalscope.com

Re: window every n elements instead of time based

2014-02-26 Thread Tathagata Das
Currently, all in-built DStream operations use time-based windowing. We may provide count-based windowing in the future.

On Wed, Feb 26, 2014 at 9:34 AM, Adrian Mocanu wrote:
> Hi
>
> Is there a way to do window processing based not on time but on every 6
> items going through the stream?
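The count-based grouping from the question ("every 6 items") can be illustrated outside Spark with a plain-Python generator that cuts a stream into consecutive, non-overlapping windows of `n` items. The name `count_windows` and the `emit_partial` flag are hypothetical illustrations, not part of the DStream API.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def count_windows(items: Iterable[T], n: int,
                  emit_partial: bool = False) -> Iterator[List[T]]:
    """Hypothetical helper: group a stream into tumbling windows of n items.
    emit_partial controls whether a final window with fewer than n items
    (the stream ended mid-window) is yielded or dropped."""
    if n <= 0:
        raise ValueError("n must be positive")
    it = iter(items)
    while True:
        window = list(islice(it, n))  # take up to n items
        if not window:
            return
        if len(window) < n and not emit_partial:
            return
        yield window
```

For example, `count_windows(range(14), 6)` yields `[0..5]` and `[6..11]` and drops the trailing `[12, 13]` unless `emit_partial=True`. On an unbounded stream, a count-based window may wait indefinitely for its last few elements, which is one reason count and time triggers are often combined.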