Re: sliding Top N window

2016-03-27 Thread Lars Albertsson
[+spark list again] (I did not want to send "commercial spam" to the list :-)) The reduce function for CMSs is element-wise addition, and the reverse reduce function is element-wise subtraction. The heavy hitters list does not have monoid addition, but you can cheat. I suggest creating a heavy hi

Re: sliding Top N window

2016-03-24 Thread Lars Albertsson
I am not aware of any open source examples. If you search for usages of stream-lib or Algebird, you might be lucky. Twitter uses CMSs, so they might have shared some code or presentation. We created a proprietary prototype of the solution I described, but I am not at liberty to share code. We did

Re: sliding Top N window

2016-03-22 Thread Jatin Kumar
Lar, can you please point to an example? On Mar 23, 2016 2:16 AM, "Lars Albertsson" wrote: > @Jatin, I touched that case briefly in the linked presentation. > > You will have to decide on a time slot size, and then aggregate slots > to form windows. E.g. if you select a time slot of an hour, you

Re: sliding Top N window

2016-03-22 Thread Lars Albertsson
@Jatin, I touched that case briefly in the linked presentation. You will have to decide on a time slot size, and then aggregate slots to form windows. E.g. if you select a time slot of an hour, you build a CMS and a heavy hitter list for the current hour slot, and start new ones at 00 minutes. In

Re: sliding Top N window

2016-03-22 Thread Jatin Kumar
I am sorry, the signature of compare() is different. It should be: implicit val order = new scala.Ordering[(String, Long)] { override def compare(a1: (String, Long), a2: (String, Long)): Int = { a1._2.compareTo(a2._2) } } -- Thanks Jatin On Tue, Mar 22, 2016 at 5:53 PM, J

Re: sliding Top N window

2016-03-22 Thread Jatin Kumar
Hello Yakubovich, I have been looking into a similar problem. @Lars please note that he wants to maintain the top N products over a sliding window, whereas the CountMinSketh algorithm is useful if we want to maintain global top N products list. Please correct me if I am wrong here. I tried using

Re: sliding Top N window

2016-03-21 Thread Rishi Mishra
Hi Alexy, We are also trying to solve similar problems using approximation. Would like to hear more about your usage. We can discuss this offline without boring others. :) Regards, Rishitesh Mishra, SnappyData . (http://www.snappydata.io/) https://in.linkedin.com/in/rishiteshmishra On Tue, Mar

Re: sliding Top N window

2016-03-21 Thread Lars Albertsson
Hi, If you can accept approximate top N results, there is a neat solution for this problem: Use an approximate Map structure called Count-Min Sketch, in combination with a list of the M top items, where M > N. When you encounter an item not in the top M, you look up its count in the Count-Min Sket