[+spark list again] (I did not want to send "commercial spam" to the list :-))
The reduce function for CMSs is element-wise addition, and the reverse
reduce function is element-wise subtraction.
The heavy hitters list does not have monoid addition, but you can
cheat. I suggest creating a heavy hi
I am not aware of any open source examples. If you search for usages
of stream-lib or Algebird, you might be lucky. Twitter uses CMSs, so
they might have shared some code or presentation.
We created a proprietary prototype of the solution I described, but I
am not at liberty to share code.
We did
Lar, can you please point to an example?
On Mar 23, 2016 2:16 AM, "Lars Albertsson" wrote:
> @Jatin, I touched that case briefly in the linked presentation.
>
> You will have to decide on a time slot size, and then aggregate slots
> to form windows. E.g. if you select a time slot of an hour, you
@Jatin, I touched that case briefly in the linked presentation.
You will have to decide on a time slot size, and then aggregate slots
to form windows. E.g. if you select a time slot of an hour, you build
a CMS and a heavy hitter list for the current hour slot, and start new
ones at 00 minutes. In
I am sorry, the signature of compare() is different. It should be:
implicit val order = new scala.Ordering[(String, Long)] {
override def compare(a1: (String, Long), a2: (String, Long)): Int = {
a1._2.compareTo(a2._2)
}
}
--
Thanks
Jatin
On Tue, Mar 22, 2016 at 5:53 PM, J
Hello Yakubovich,
I have been looking into a similar problem. @Lars please note that he wants
to maintain the top N products over a sliding window, whereas the
CountMinSketh algorithm is useful if we want to maintain global top N
products list. Please correct me if I am wrong here.
I tried using
Hi Alexy,
We are also trying to solve similar problems using approximation. Would
like to hear more about your usage. We can discuss this offline without
boring others. :)
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra
On Tue, Mar
Hi,
If you can accept approximate top N results, there is a neat solution
for this problem: Use an approximate Map structure called
Count-Min Sketch, in combination with a list of the M top items, where
M > N. When you encounter an item not in the top M, you look up its
count in the Count-Min Sket