Re: Select top (100) percent equivalent in spark

2018-09-05 Thread Liang-Chi Hsieh
Thanks for pinging me. It seems to me we should not make assumptions about the value of the spark.sql.execution.topKSortFallbackThreshold config. Once it is changed, the global sort + limit can currently produce wrong results. I will make a PR for this. cloud0fan wrote: > + Liang-Chi and Herman, > I think this is a common requirement to get top N records. ...
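
The config Liang-Chi mentions can be changed at runtime. Below is a minimal sketch, not anything from the thread, showing how lowering the threshold pushes an orderBy + limit query onto the global-sort-plus-limit path he is concerned about; the `invoices` DataFrame and `invoice_id` column are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Sketch: lowering the threshold so an orderBy + limit query takes the
// global-sort-plus-limit path instead of TakeOrderedAndProject.
val spark = SparkSession.builder()
  .appName("TopKFallbackSketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical data, for illustration only.
val invoices = Seq((3L, "a"), (1L, "b"), (2L, "c")).toDF("invoice_id", "customer")

// Force the fallback path; 1 is just an illustrative extreme value.
spark.conf.set("spark.sql.execution.topKSortFallbackThreshold", "1")

// The plan should now show a Sort followed by a global limit rather than
// TakeOrderedAndProject; the result must still be the correct top 100.
invoices.orderBy(desc("invoice_id")).limit(100).explain()
```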

Re: Select top (100) percent equivalent in spark

2018-09-05 Thread Chetan Khatri
Sean, thank you. Do you think tempDF.orderBy($"invoice_id".desc).limit(100) would give the same result? I think so. Thanks. On Wed, Sep 5, 2018 at 12:58 AM Sean Owen wrote: > Sort and take head(n)? > On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri wrote: >> Dear Spark dev, anything equivalent in Spark? ...
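
A minimal sketch of the DataFrame form Chetan asks about, reusing the hypothetical `invoices` DataFrame (and `spark.implicits._`) from the sketch above; `orderBy($"invoice_id".desc).limit(100)` is the usual DataFrame-side equivalent of `SELECT TOP (100) ... ORDER BY invoice_id DESC`.

```scala
// Sketch: top 100 rows by invoice_id, reusing the hypothetical `invoices`
// DataFrame from the earlier sketch (threshold left at its default).
val top100 = invoices.orderBy($"invoice_id".desc).limit(100)
top100.show()
// limit returns a Dataset, so this stays distributed until an action runs.
```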

Re: Select top (100) percent equivalent in spark

2018-09-04 Thread Wenchen Fan
+ Liang-Chi and Herman, I think this is a common requirement: getting the top N records. For now we guarantee it via the `TakeOrderedAndProject` operator. However, this operator may not be used if the spark.sql.execution.topKSortFallbackThreshold config has a small value. Shall we reconsider https://git
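
For the SQL side: Spark SQL has no `TOP (n)` keyword, so the equivalent query uses `ORDER BY ... LIMIT n`. A sketch, again with the hypothetical `invoices` DataFrame from above, showing the SQL form and how to check which physical operator the planner chose:

```scala
// Sketch: SQL form of the same top-N query (Spark SQL uses LIMIT, not TOP).
invoices.createOrReplaceTempView("invoices")
spark.sql("SELECT * FROM invoices ORDER BY invoice_id DESC LIMIT 100").explain()
// With the default threshold this typically plans as TakeOrderedAndProject;
// with a small threshold it falls back to a Sort plus a global limit,
// which is the case the messages above are concerned about.
```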

Re: Select top (100) percent equivalent in spark

2018-09-04 Thread Chetan Khatri
Thanks. On Wed, 5 Sep 2018, 2:15 AM Russell Spitzer wrote: > RDD: Top > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T] > Which is pretty much what Sean suggested. > For DataFrames I think doing an order and limit would be equivalent after optimizations.

Re: Select top (100) percent equivalent in spark

2018-09-04 Thread Russell Spitzer
RDD: Top http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T] Which is pretty much what Sean suggested. For DataFrames I think doing an order and limit would be equivalent after optimizations. On Tue, Sep 4, 2018 at 2:28 PM ...
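
A sketch of the RDD API Russell links to: `top(num)` takes an implicit `Ordering[T]` and returns the `num` largest elements to the driver as an array. The `Invoice` case class and sample data here are hypothetical, not from the thread.

```scala
// Sketch: RDD.top with an explicit Ordering, on hypothetical data.
case class Invoice(invoiceId: Long, customer: String)

val invoiceRdd = spark.sparkContext.parallelize(Seq(
  Invoice(3L, "a"), Invoice(1L, "b"), Invoice(2L, "c")))

// Largest 100 invoices by invoiceId, returned to the driver as an Array.
val top100Invoices: Array[Invoice] =
  invoiceRdd.top(100)(Ordering.by[Invoice, Long](_.invoiceId))
top100Invoices.foreach(println)
```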

Re: Select top (100) percent equivalent in spark

2018-09-04 Thread Sean Owen
Sort and take head(n)? On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri wrote: > Dear Spark dev, anything equivalent in Spark? ...
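
Sean's suggestion in DataFrame terms, as a sketch using the same hypothetical `invoices` DataFrame from the sketches above: sort descending, then take `head(n)`. Unlike `limit(n)`, which stays a Dataset, `head(n)` collects the rows back to the driver.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.desc

// Sketch: sort descending, then take the first 100 rows on the driver.
val top100Rows: Array[Row] = invoices.sort(desc("invoice_id")).head(100)
top100Rows.foreach(println)
```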