I am not sure. The Spark SQL, DataFrames and Datasets Guide already has a
section about NaN semantics; that could be a good place to add at least a
basic description.
For the rest, InterpretedOrdering could be a good choice.

On 02/19/2016 12:35 AM, Reynold Xin wrote:
> You are correct and we should document that.
>
> Any suggestions on where we should document this? In DoubleType and
> FloatType?
>
> On Tuesday, February 16, 2016, Maciej Szymkiewicz
> <mszymkiew...@gmail.com> wrote:
>> I am not sure if I've missed something obvious, but as far as I can
>> tell the DataFrame API doesn't provide clearly defined ordering rules,
>> excluding NaN handling. Methods like DataFrame.sort or sql.functions
>> like min / max provide only a general description. The discrepancy
>> between functions.max (min) and GroupedData.max, where the latter
>> supports only numeric types, makes the current situation even more
>> confusing. With a growing number of orderable types, I believe the
>> documentation should clearly define the ordering rules, including:
>>
>> - NULL behavior
>> - collation
>> - behavior on complex types (structs, arrays)
>>
>> While this information can be extracted from the source, it is not
>> easily accessible, and without an explicit specification it is not
>> clear whether the current behavior is contractual. It can also be
>> confusing if a user expects an order that depends on the current
>> locale (R).
>>
>> Best,
>> Maciej
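To make the NaN ambiguity concrete, here is a minimal sketch in plain Java (no Spark dependency, so it is only an illustration of the JVM-level behavior, not of Spark's own code path): the primitive comparison operators follow IEEE 754, where NaN is unordered and every comparison with it is false, while Double.compare imposes a total order that places NaN above every other value, including positive infinity. Without an explicit contract in the docs, a user cannot tell which of these two notions an API like DataFrame.sort follows.

```java
// Two inconsistent notions of double ordering on the JVM.
public class NanOrdering {
    public static void main(String[] args) {
        double nan = Double.NaN;

        // IEEE 754 semantics: NaN is unordered; all comparisons are false.
        System.out.println(nan > 1.0);   // false
        System.out.println(nan < 1.0);   // false
        System.out.println(nan == nan);  // false

        // Total-order semantics: Double.compare places NaN above everything,
        // even positive infinity, and considers NaN equal to itself.
        System.out.println(Double.compare(nan, Double.POSITIVE_INFINITY) > 0); // true
        System.out.println(Double.compare(nan, nan) == 0);                     // true
    }
}
```

The same split shows up in sorting: Arrays.sort(double[]) uses the total order, so NaN values end up last in ascending order, which is exactly the kind of rule worth stating explicitly in the guide.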