In general, working out the minimum or maximum of, say, prices (I do not know
your use case) is pretty straightforward.
For example
val maxValue = price.reduceByWindow((x: Double, y: Double) => if (x > y) x else y,
  Seconds(windowLength), Seconds(slidingInterval))
maxValue.print()
The average values are running averages over the same window.
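A minimal sketch of such a running average, reusing the same price DStream and
the window/slide values from the snippet above (the pair-and-divide approach
here is just one way to do it, not necessarily the original code):

val avgValue = price.map(p => (p, 1L))
  .reduceByWindow(
    (a: (Double, Long), b: (Double, Long)) => (a._1 + b._1, a._2 + b._2),
    Seconds(windowLength), Seconds(slidingInterval))
  .map { case (sum, count) => sum / count }   // windowed sum divided by windowed count
avgValue.print()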
Hi Matthias,
Say you have the following:
"Batch interval" is the basic interval at which the system will receive the
data in batches.
val ssc = new StreamingContext(sparkConf, Seconds(n))
// window length - the duration of the window below; it must be a multiple of
// the batch interval n
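To make that relationship concrete, a minimal sketch where the window length
(10s) and slide interval (4s) are both multiples of the batch interval (2s);
the socket source and the durations are assumptions for illustration:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sparkConf, Seconds(2))   // batch interval: 2s
val lines = ssc.socketTextStream("localhost", 9999)
val windowed = lines.window(Seconds(10), Seconds(4))    // 10s window, sliding every 4s
windowed.count().print()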
[+spark list again] (I did not want to send "commercial spam" to the list :-))
The reduce function for CMSs is element-wise addition, and the reverse
reduce function is element-wise subtraction.
The heavy hitters list does not have monoid addition, but you can
cheat. I suggest creating a heavy hi
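For illustration only, a minimal sketch of the element-wise addition and
subtraction mentioned above, viewing a CMS as a plain table of counters (a real
sketch also carries its hash functions; this is not any library's API):

type CmsTable = Array[Array[Long]]

def add(a: CmsTable, b: CmsTable): CmsTable =
  a.zip(b).map { case (ra, rb) => ra.zip(rb).map { case (x, y) => x + y } }

def subtract(a: CmsTable, b: CmsTable): CmsTable =
  a.zip(b).map { case (ra, rb) => ra.zip(rb).map { case (x, y) => x - y } }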
I am not aware of any open source examples. If you search for usages
of stream-lib or Algebird, you might be lucky. Twitter uses CMSs, so
they might have shared some code or presentation.
We created a proprietary prototype of the solution I described, but I
am not at liberty to share code.
We did
Lars, can you please point to an example?
On Mar 23, 2016 2:16 AM, "Lars Albertsson" wrote:
> @Jatin, I touched that case briefly in the linked presentation.
>
> You will have to decide on a time slot size, and then aggregate slots
> to form windows. E.g. if you select a time slot of an hour, you
@Jatin, I touched that case briefly in the linked presentation.
You will have to decide on a time slot size, and then aggregate slots
to form windows. E.g. if you select a time slot of an hour, you build
a CMS and a heavy hitter list for the current hour slot, and start new
ones at 00 minutes. In
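A minimal sketch of that slot-merging idea, assuming a Slot holds one hour's
CMS table and heavy-hitter counts (all names here are assumptions, not code
from this thread):

case class Slot(hourStart: Long,
                cms: Array[Array[Long]],
                heavyHitters: Map[String, Long])

// A window is formed by merging the hourly slots it covers: CMS tables add
// element-wise, heavy-hitter counts add per key.
def mergeSlots(slots: Seq[Slot]): Slot = slots.reduce { (a, b) =>
  val cms = a.cms.zip(b.cms).map { case (ra, rb) =>
    ra.zip(rb).map { case (x, y) => x + y } }
  val keys = a.heavyHitters.keySet ++ b.heavyHitters.keySet
  val hh = keys.map(k =>
    k -> (a.heavyHitters.getOrElse(k, 0L) + b.heavyHitters.getOrElse(k, 0L))).toMap
  Slot(math.min(a.hourStart, b.hourStart), cms, hh)
}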
I am sorry, the signature of compare() is different. It should be:
implicit val order = new scala.Ordering[(String, Long)] {
  override def compare(a1: (String, Long), a2: (String, Long)): Int = {
    a1._2.compareTo(a2._2)
  }
}
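A small usage sketch with that ordering (the sample data is made up), picking
the 3 pairs with the highest counts:

val counts = Seq(("a", 5L), ("b", 12L), ("c", 7L), ("d", 1L))
val top3 = counts.sorted(order.reverse).take(3)   // highest counts first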
--
Thanks
Jatin
On Tue, Mar 22, 2016 at 5:53 PM, J
Hello Yakubovich,
I have been looking into a similar problem. @Lars please note that he wants
to maintain the top N products over a sliding window, whereas the
Count-Min Sketch algorithm is useful if we want to maintain a global top N
products list. Please correct me if I am wrong here.
I tried using
Hi Alexy,
We are also trying to solve similar problems using approximation. Would
like to hear more about your usage. We can discuss this offline without
boring others. :)
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra
On Tue, Mar
Hi,
If you can accept approximate top N results, there is a neat solution
for this problem: Use an approximate Map structure called
Count-Min Sketch, in combination with a list of the M top items, where
M > N. When you encounter an item not in the top M, you look up its count in
the Count-Min Sketch, and if the estimate is larger than the smallest count in
the top M list, you swap it in.
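A minimal sketch of that update step; Cms stands in for any Count-Min Sketch
implementation (stream-lib, Algebird, ...) and the method names are
assumptions, not a real library API:

import scala.collection.mutable

trait Cms {
  def add(item: String, count: Long): Unit
  def estimate(item: String): Long
}

// Every item goes into the sketch; it enters the top-M map either when there
// is room or when its estimated count beats the current minimum.
def update(item: String, cms: Cms, topM: mutable.Map[String, Long], m: Int): Unit = {
  cms.add(item, 1L)
  val est = cms.estimate(item)
  if (topM.contains(item) || topM.size < m) {
    topM(item) = est
  } else {
    val (minKey, minCount) = topM.minBy(_._2)
    if (est > minCount) { topM -= minKey; topM(item) = est }
  }
}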
Understood. Thanks for your great help
Cheers
Guillaume
On 2 July 2015 at 23:23, Feynman Liang wrote:
> Consider an example dataset [a, b, c, d, e, f]
>
> After sliding(3), you get [(a,b,c), (b,c,d), (c, d, e), (d, e, f)]
>
> After zipWithIndex: [((a,b,c), 0), ((b,c,d), 1), ((c, d, e), 2), ((d,
Consider an example dataset [a, b, c, d, e, f]
After sliding(3), you get [(a,b,c), (b,c,d), (c, d, e), (d, e, f)]
After zipWithIndex: [((a,b,c), 0), ((b,c,d), 1), ((c, d, e), 2), ((d, e,
f), 3)]
After filter: [((a,b,c), 0), ((d, e, f), 3)], which is what I'm assuming
you want (non-overlapping buckets).
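For reference, a runnable sketch of that chain using the sliding() helper from
MLlib's RDDFunctions (sc is an existing SparkContext; the sample data matches
the example above):

import org.apache.spark.mllib.rdd.RDDFunctions._

val events = sc.parallelize(Seq("a", "b", "c", "d", "e", "f"))
// sliding(3): (a,b,c), (b,c,d), (c,d,e), (d,e,f)
// keep every third window so the buckets do not overlap: (a,b,c), (d,e,f)
val buckets = events.sliding(3).zipWithIndex.filter(_._2 % 3 == 0).map(_._1)
buckets.collect().foreach(w => println(w.mkString(",")))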
Well it did reduce the length of my series of events. I will have to dig into
what it did actually ;-)
I would assume that it took one out of 3 values, is that correct?
Would it be possible to control a bit more how the value assigned to the
bucket is computed, for example take the first element, the min
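One possible way to control what each bucket maps to, as a sketch (buckets is
assumed to be a non-overlapping RDD[Array[Double]] built as above, not code
from this thread):

val firsts   = buckets.map(_.head)                 // first element of each bucket
val mins     = buckets.map(_.min)                  // minimum of each bucket
val averages = buckets.map(b => b.sum / b.length)  // mean of each bucket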
How about:
events.sliding(3).zipWithIndex.filter(_._2 % 3 == 0)
That would group the RDD into adjacent buckets of size 3.
On Thu, Jul 2, 2015 at 2:33 PM, tog wrote:
> Was complaining about the Seq ...
>
> Moved it to
> val eventsfiltered = events.sliding(3).map(s => Event(s(0).time,
> (s(0).x
Was complaining about the Seq ...
Moved it to
val eventsfiltered = events.sliding(3).map(s => Event(s(0).time,
  (s(0).x + s(1).x + s(2).x) / 3.0, (s(0).vztot + s(1).vztot + s(2).vztot) / 3.0))
and that is working.
Anyway this is not what I wanted to do, my goal was more to implement
bucketing to shorten the
What's the error you are getting?
On Thu, Jul 2, 2015 at 9:37 AM, tog wrote:
> Hi
>
> Sorry for this scala/spark newbie question. I am creating RDD which
> represent large time series this way:
> val data = sc.textFile("somefile.csv")
>
> case class Event(
> time: Double,
> x:
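A minimal sketch of that parsing step; only the time, x and vztot fields appear
elsewhere in the thread, anything else is an assumption since the message is
cut off here:

case class Event(time: Double, x: Double, vztot: Double)

val data = sc.textFile("somefile.csv")
val events = data.map { line =>
  val cols = line.split(",")
  Event(cols(0).toDouble, cols(1).toDouble, cols(2).toDouble)
}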
Hello Sanjay,
Yes, your understanding of lazy semantics is correct. But ideally every batch
should read data based on the batch interval provided in the
StreamingContext. Can you open a JIRA on this?
On Mon, Mar 24, 2014 at 7:45 AM, Sanjay Awatramani
wrote:
> Hi All,
>
> I found out why this problem
Hi All,
I found out why this problem exists. Consider the following scenario:
- A DStream is created from any source (I've checked with file and socket).
- No actions are applied to this DStream.
- A sliding window operation is applied to this DStream, and an action is
applied to the sliding window.
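A minimal sketch of that scenario (the socket source and the durations are
assumptions): there is no action on the base DStream, only on the windowed one.

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sparkConf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)   // DStream with no action of its own
val windowed = lines.window(Seconds(10), Seconds(5))  // sliding window over it
windowed.count().print()                              // the only action is on the window
ssc.start()
ssc.awaitTermination()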