Thanks for pinging me.
Seems to me we should not make assumptions about the value of the
spark.sql.execution.topKSortFallbackThreshold config. Once it is changed, the
global sort + limit can produce wrong results for now. I will make a PR for
this.

cloud0fan wrote
> + Liang-Chi and Herman,
>
> I think this is a common requirement to get top N records. For now we
> guarantee it by the `TakeOrderedAndProject` operator. However, this
> operator may not be used if the
> spark.sql.execution.topKSortFallbackThreshold config has a small value.
>
> Shall we reconsider
> https://github.com/apache/spark/commit/5c27b0d4f8d378bd7889d26fb358f478479b9996
> ? Or do we not expect users to set a small value for
> spark.sql.execution.topKSortFallbackThreshold?
>
>
> On Wed, Sep 5, 2018 at 11:24 AM Chetan Khatri <chetan.opensource@> wrote:
>
>> Thanks
>>
>> On Wed 5 Sep, 2018, 2:15 AM Russell Spitzer, <russell.spitzer@> wrote:
>>
>>> RDD: top
>>>
>>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
>>> Which is pretty much what Sean suggested.
>>>
>>> For DataFrames, I think doing an order and limit would be equivalent
>>> after optimizations.
>>>
>>> On Tue, Sep 4, 2018 at 2:28 PM Sean Owen <srowen@> wrote:
>>>
>>>> Sort and take head(n)?
>>>>
>>>> On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri <chetan.opensource@> wrote:
>>>>
>>>>> Dear Spark dev, anything equivalent in Spark?
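For reference, the two approaches discussed in the thread can be sketched as below. This is a minimal sketch, assuming a local SparkSession; the object and column names are illustrative, not from the thread:

```scala
import org.apache.spark.sql.SparkSession

object TopNSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-n-sketch")
      .getOrCreate()
    import spark.implicits._

    // RDD API: top(n) returns the n largest elements according to the
    // implicit Ordering, without performing a full global sort.
    val rdd = spark.sparkContext.parallelize(Seq(5, 1, 4, 2, 3))
    val top3 = rdd.top(3) // largest three elements

    // DataFrame API: orderBy + limit. The planner normally turns this into
    // TakeOrderedAndProject, but only when the limit does not exceed
    // spark.sql.execution.topKSortFallbackThreshold; above that threshold
    // it falls back to a global sort followed by a limit.
    val df = Seq(5, 1, 4, 2, 3).toDF("value")
    val top3Df = df.orderBy($"value".desc).limit(3)
    top3Df.show()

    spark.stop()
  }
}
```

Note that `rdd.top(n)` collects the result to the driver as an array, while `orderBy(...).limit(n)` stays a distributed DataFrame until an action is called.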