Yes, I meant batch interval. Thanks for clarifying.
Cheers,
Michael
On Oct 7, 2014, at 11:14 PM, jayant [via Apache Spark User List]
wrote:
> Hi Michael,
>
> I think you are meaning batch interval instead of windowing. It can be
> helpful for cases when you do not want to process very smal
Hi Michael,
I think you mean batch interval rather than windowing. It can be
helpful in cases where you do not want to process very small batches.
The HDFS sink in Flume has the concept of rolling files based on time,
number of events, or size.
https://flume.apache.org/FlumeUserGuide.html#hd
Hi Andrew,
The use case I have in mind is batch data serialization to HDFS, where sizing
files to a certain HDFS block size is desired. In my particular use case, I
want to process 10 GB batches of data at a time. I'm not sure this is a sensible
use case for Spark Streaming, and I was trying to
Hi Michael,
I couldn't find anything in Jira for it --
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20%22window%22%20AND%20component%20%3D%20Streaming
Could you or Adrian please file a Jira ticket explaining the functionality
and maybe a proposed API? This wi
Hi,
I also have a use for count-based windowing. I'd like to process data
batches by size as opposed to time. Is this feature on the development
roadmap? Is there a JIRA ticket for it?
Thank you,
Michael
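
For illustration, here is a minimal sketch of what "batches by size" could look like. It is plain Python, not Spark Streaming API; the function name `batches_by_size` and the `max_bytes` parameter are hypothetical and chosen only for this example. It groups records until adding the next one would exceed a byte budget.

```python
from typing import Iterable, Iterator, List


def batches_by_size(records: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Group records into batches whose total size stays within max_bytes.

    Hypothetical helper for illustration only; not part of Spark.
    A single record larger than max_bytes is emitted as its own batch.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        # Close the current batch if this record would push it over budget.
        if batch and batch_bytes + len(record) > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += len(record)
    # Emit whatever is left once the input is exhausted.
    if batch:
        yield batch


if __name__ == "__main__":
    for b in batches_by_size([b"aaaa", b"bb", b"cccc", b"d"], max_bytes=6):
        print(b)
```

On an unbounded stream the final partial batch would only be emitted when the input ends, so a real implementation would probably also need a time-based flush.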
go by window count like moving average.
>
>
>
> Thanks
>
> A
>
>
>
> *From:* Tathagata Das [mailto:tathagata.das1...@gmail.com]
> *Sent:* February-26-14 2:05 PM
> *To:* user@spark.apache.org
> *Cc:* u...@spark.incubator.apache.org
> *Subject:* Re: window every
Subject: Re: window every n elements instead of time based
Currently, all built-in DStream window operations are time-based. We may
provide count-based windowing in the future.
On Wed, Feb 26, 2014 at 9:34 AM, Adrian Mocanu wrote:
> Hi
>
> Is there a way to do window processing based not on time but on every 6
> items going through the stream?
>
>
>
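
To make the distinction concrete, here is a rough sketch of count-based windowing in plain Python. It does not use any Spark API (per the reply above, the built-in DStream windows are time-based); the names `tumbling_count_windows` and `moving_average` are hypothetical. The first function emits a window every n items, and the second computes a moving average over the last n items.

```python
from collections import deque
from typing import Iterable, Iterator, List


def tumbling_count_windows(items: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping windows of exactly `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    window: List[float] = []
    for item in items:
        window.append(item)
        if len(window) == size:
            yield window
            window = []
    # Assumption: a trailing partial window is dropped rather than emitted.


def moving_average(items: Iterable[float], size: int) -> Iterator[float]:
    """Yield the mean of the last `size` items once the window has filled."""
    if size <= 0:
        raise ValueError("size must be positive")
    window: deque = deque(maxlen=size)  # oldest item is evicted automatically
    for item in items:
        window.append(item)
        if len(window) == size:
            yield sum(window) / size


if __name__ == "__main__":
    print(list(tumbling_count_windows(range(14), 6)))
    print(list(moving_average([1, 2, 3, 4, 5], 3)))
```

Both functions assume a single ordered sequence. Applying the same idea to a distributed stream would also require deciding how items are ordered across partitions, which this sketch does not address.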