Thanks for pinging me.
It seems to me we should not make assumptions about the value of the
spark.sql.execution.topKSortFallbackThreshold config. As it currently
stands, once that config is changed, the global sort + limit can produce
wrong results. I will make a PR for this.
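A minimal sketch of the concern (illustrative only, not the planned PR),
using a local SparkSession and a toy DataFrame; lowering the threshold
below the limit makes the planner fall back from the top-K operator to a
global sort + limit:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("topN-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 10.0), (2, 30.0), (3, 20.0)).toDF("invoice_id", "amount")

    // Illustrative value: anything smaller than the limit below.
    spark.conf.set("spark.sql.execution.topKSortFallbackThreshold", "10")

    // With limit(100) above the threshold, the physical plan falls back to
    // a global sort followed by a limit instead of TakeOrderedAndProject.
    val query = df.orderBy($"invoice_id".desc).limit(100)
    println(query.queryExecution.executedPlan)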
cloud0fan wrote:
> + Liang-Chi and Herman,
>
> I think this is a common requirement to get the top N records. …
Sean, thank you.
Do you think tempDF.orderBy($"invoice_id".desc).limit(100) would give the
same result? I think so.
Thanks
On Wed, Sep 5, 2018 at 12:58 AM Sean Owen wrote:
> Sort and take head(n)?
>
> On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri wrote:
>
>> Dear Spark dev, anything equivalent in spark ?
+ Liang-Chi and Herman,
I think this is a common requirement to get the top N records. For now we
guarantee it via the `TakeOrderedAndProject` operator. However, this
operator may not be used if the
spark.sql.execution.topKSortFallbackThreshold config has a small value.
Shall we reconsider
https://git
Thanks
On Wed, Sep 5, 2018 at 2:15 AM Russell Spitzer wrote:
> RDD: Top
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
> Which is pretty much what Sean suggested
>
> For Dataframes I think doing an order and limit would be equivalent after
> optimizations.
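For the `TakeOrderedAndProject` operator mentioned above, an illustrative
way to verify which operator the planner actually chose (the DataFrame `df`
and the sort column here are placeholders):

    import org.apache.spark.sql.functions.col

    // True when the top-K operator handles the sort + limit.
    val usesTopK = df.orderBy(col("invoice_id").desc).limit(100)
      .queryExecution.executedPlan.toString.contains("TakeOrderedAndProject")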
RDD: Top
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
Which is pretty much what Sean suggested
For Dataframes I think doing an order and limit would be equivalent after
optimizations.
On Tue, Sep 4, 2018 at 2:28 PM
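A minimal sketch of RDD.top as linked above, assuming an existing
SparkContext `sc`:

    val nums = sc.parallelize(Seq(5, 1, 9, 3, 7))

    // The three largest elements under the natural ordering, computed via a
    // bounded priority queue per partition rather than a full global sort.
    val top3: Array[Int] = nums.top(3)                 // Array(9, 7, 5)

    // Passing a reversed Ordering yields the three smallest instead.
    val bottom3 = nums.top(3)(Ordering[Int].reverse)   // Array(1, 3, 5)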
Sort and take head(n)?
On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri wrote:
> Dear Spark dev, anything equivalent in spark ?
>
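A minimal sketch of that suggestion on the Dataset API, reusing the
hypothetical DataFrame `df` with an invoice_id column from earlier in the
thread:

    import org.apache.spark.sql.functions.col

    // Sort descending, then collect the first 100 rows to the driver.
    // With the default config this plans as a top-K (TakeOrderedAndProject),
    // per the discussion above.
    val top100 = df.orderBy(col("invoice_id").desc).head(100)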