I am not sure. The Spark SQL, DataFrames and Datasets Guide already has a
section about NaN semantics; that could be a good place to add at least a
basic description.
For the rest, InterpretedOrdering could be a good choice.

On 02/19/2016 12:35 AM, Reynold Xin wrote:
> You are correct and we should document that.
>
> Any suggestions on where we should document this? In DoubleType and
> FloatType?
>
> On Tuesday, February 16, 2016, Maciej Szymkiewicz
> <mszymkiew...@gmail.com> wrote:
>> I am not sure if I've missed something obvious, but as far as I can
>> tell the DataFrame API doesn't provide clearly defined ordering rules,
>> excluding NaN handling. Methods like DataFrame.sort or sql.functions
>> like min / max provide only a general description. The discrepancy
>> between functions.max (min) and GroupedData.max, where the latter
>> supports only numeric types, makes the current situation even more
>> confusing. With a growing number of orderable types, I believe the
>> documentation should clearly define the ordering rules, including:
>>
>> - NULL behavior
>> - collation
>> - behavior on complex types (structs, arrays)
>>
>> While this information can be extracted from the source, it is not
>> easily accessible, and without an explicit specification it is not
>> clear whether the current behavior is contractual. It can also be
>> confusing if a user expects an order that depends on the current
>> locale (R).
>>
>> Best,
>> Maciej
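To make the NaN ambiguity concrete, here is a minimal sketch in plain Java (no Spark dependency, so it is only an illustration of the JVM-level behavior, not of Spark's own code path): the primitive comparison operators follow IEEE 754, where NaN is unordered and every comparison with it is false, while Double.compare imposes a total order that places NaN above every other value, including positive infinity. Without an explicit contract in the docs, a user cannot tell which of these two notions an API like DataFrame.sort follows.

```java
// Two inconsistent notions of double ordering on the JVM.
public class NanOrdering {
    public static void main(String[] args) {
        double nan = Double.NaN;

        // IEEE 754 semantics: NaN is unordered; all comparisons are false.
        System.out.println(nan > 1.0);   // false
        System.out.println(nan < 1.0);   // false
        System.out.println(nan == nan);  // false

        // Total-order semantics: Double.compare places NaN above everything,
        // even positive infinity, and considers NaN equal to itself.
        System.out.println(Double.compare(nan, Double.POSITIVE_INFINITY) > 0); // true
        System.out.println(Double.compare(nan, nan) == 0);                     // true
    }
}
```

The same split shows up in sorting: Arrays.sort(double[]) uses the total order, so NaN values end up last in ascending order, which is exactly the kind of rule worth stating explicitly in the guide.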