This looks like a mistake in FrequentItems to me. If the map is full (map.size == size), it should still add the new item, after decrementing counts and removing items from the map, roughly as in the sketch below.
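For concreteness, here is a minimal sketch of what I would expect the full-map branch to do, based on my reading of the algorithm. The class name and the exact decrement rule are my own illustration, not the actual Spark code:

import scala.collection.mutable

// Minimal sketch (my own illustration, not the Spark source) of the add
// step I would expect when the map is already full: decrement counts,
// drop zeroed entries, and still insert the new item if any count remains.
class SketchCounter(size: Int) {
  val baseMap: mutable.Map[Any, Long] = mutable.Map.empty

  def add(key: Any, count: Long): this.type = {
    if (baseMap.contains(key)) {
      baseMap(key) += count
    } else if (baseMap.size < size) {
      baseMap += key -> count
    } else {
      // Map is full: subtract the smaller of the new count and the current
      // minimum from every entry, remove entries that reach zero, and keep
      // whatever is left of the new item's count.
      val dec = math.min(count, baseMap.values.min)
      baseMap.transform((_, v) => v - dec)
      baseMap.retain((_, v) => v > 0)
      if (count - dec > 0) baseMap += key -> (count - dec)
    }
    this
  }
}

With Yucheng's example below (Map(1 -> 3, 2 -> 3, 3 -> 4) plus a new 4 -> 25), this version would end up with Map(3 -> 1, 4 -> 22) rather than an empty map.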
If it's not a mistake, then at least the algorithm looks different from the one described in the paper. Is this maybe on purpose? (Yucheng's example is reproduced as a small runnable snippet after the quoted message.)

On Thu, Jul 30, 2015 at 4:26 PM, Yucheng <yl2...@nyu.edu> wrote:

> Hi all,
>
> I'm reading the code in spark-sql-execution-stat-FrequentItems.scala, and
> I'm a little confused about the "add" method in the FreqItemCounter class.
> Please see the link here:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/stat/FrequentItems.scala
>
> My question is: when the baseMap does not contain the key, and the size of
> the baseMap is not less than size, why do we keep only the key/value pairs
> whose value is greater than count?
>
> For example: the baseMap is Map(1 -> 3, 2 -> 3, 3 -> 4), and the size is 3.
> I want to add Map(4 -> 25) to this baseMap, so it retains only the
> key/value pairs whose value is greater than 25, and the baseMap ends up
> empty. However, I think we should at least add 4 -> 25 to the baseMap.
> Could anybody help me with this problem?
>
> Best,
> Yucheng
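For reference, here is a rough reconstruction of the full-map branch Yucheng describes, run on his example. The retain call mirrors the behavior in question; the surrounding setup and variable names are only for illustration:

import scala.collection.mutable

// Rough reconstruction of the full-map branch being discussed, run on
// Yucheng's example. The retain call mirrors the behavior in question;
// everything else here is illustrative scaffolding.
val baseMap = mutable.Map[Any, Long](1 -> 3L, 2 -> 3L, 3 -> 4L)
val count = 25L // a new item 4 -> 25 arrives while the map is full

// Keep only entries whose value is greater than the new count; the new
// key itself is never inserted.
baseMap.retain((_, v) => v > count)

println(baseMap) // empty: every entry is dropped and 4 -> 25 is lost

Compare this with the sketch earlier in this message, which would end up with Map(3 -> 1, 4 -> 22) on the same input.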