This looks like a mistake in FrequentItems to me. If the map is full (map.size == size), it should still add the new item, after decrementing counts and removing items from the map, roughly as in the sketch below.
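For concreteness, here is a minimal sketch of what I would expect the full-map branch to do, based on my reading of the algorithm. The class name and the exact decrement rule are my own illustration, not the actual Spark code:

import scala.collection.mutable

// Minimal sketch (my own illustration, not the Spark source) of the add
// step I would expect when the map is already full: decrement counts,
// drop zeroed entries, and still insert the new item if any count remains.
class SketchCounter(size: Int) {
  val baseMap: mutable.Map[Any, Long] = mutable.Map.empty

  def add(key: Any, count: Long): this.type = {
    if (baseMap.contains(key)) {
      baseMap(key) += count
    } else if (baseMap.size < size) {
      baseMap += key -> count
    } else {
      // Map is full: subtract the smaller of the new count and the current
      // minimum from every entry, remove entries that reach zero, and keep
      // whatever is left of the new item's count.
      val dec = math.min(count, baseMap.values.min)
      baseMap.transform((_, v) => v - dec)
      baseMap.retain((_, v) => v > 0)
      if (count - dec > 0) baseMap += key -> (count - dec)
    }
    this
  }
}

With Yucheng's example below (Map(1 -> 3, 2 -> 3, 3 -> 4) plus a new 4 -> 25), this version would end up with Map(3 -> 1, 4 -> 22) rather than an empty map.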
If it's not a mistake, then at least the algorithm looks different from the one described in the paper. Is this maybe on purpose? (Yucheng's example is reproduced as a small runnable snippet after the quoted message.)

On Thu, Jul 30, 2015 at 4:26 PM, Yucheng <yl2...@nyu.edu> wrote:

> Hi all,
>
> I'm reading the code in spark-sql-execution-stat-FrequentItems.scala, and
> I'm a little confused about the "add" method in the FreqItemCounter class.
> Please see the link here:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/FrequentItems.scala
>
> My question is: when the baseMap does not contain the key, and the size of
> the baseMap is not less than size, why do we keep only the key/value pairs
> whose value is greater than count?
>
> For example: the baseMap is Map(1 -> 3, 2 -> 3, 3 -> 4), and the size is 3.
> I want to add Map(4 -> 25) to this baseMap, so it retains only the
> key/value pairs whose value is greater than 25, and the baseMap ends up
> empty. However, I think we should at least add 4 -> 25 to the baseMap.
> Could anybody help me with this problem?
>
> Best,
> Yucheng
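For reference, here is a rough reconstruction of the full-map branch Yucheng describes, run on his example. The retain call mirrors the behavior in question; the surrounding setup and variable names are only for illustration:

import scala.collection.mutable

// Rough reconstruction of the full-map branch being discussed, run on
// Yucheng's example. The retain call mirrors the behavior in question;
// everything else here is illustrative scaffolding.
val baseMap = mutable.Map[Any, Long](1 -> 3L, 2 -> 3L, 3 -> 4L)
val count = 25L // a new item 4 -> 25 arrives while the map is full

// Keep only entries whose value is greater than the new count; the new
// key itself is never inserted.
baseMap.retain((_, v) => v > count)

println(baseMap) // empty: every entry is dropped and 4 -> 25 is lost

Compare this with the sketch earlier in this message, which would end up with Map(3 -> 1, 4 -> 22) on the same input.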