JFinis commented on code in PR #514:
URL: https://github.com/apache/parquet-format/pull/514#discussion_r2272735742


##########
src/main/thrift/parquet.thrift:
##########
@@ -1170,6 +1263,19 @@ struct ColumnIndex {
    * Such more compact values must still be valid values within the column's
    * logical type. Readers must make sure that list entries are populated 
before
    * using them by inspecting null_pages.
+   * For columns of physical type FLOAT or DOUBLE, or logical type FLOAT16,
+   * NaN values are not to be included in these bounds. If all non-null values
+   * of a page are NaN, then a writer must do the following:
+   * - If the order of this column is TypeDefinedOrder, then no column index
+   *   must be written for this column chunk. While this is unfortunate for
+   *   performance, it is necessary to avoid conflict with legacy files that
+   *   still included NaN in min_values and max_values even if the page had
+   *   non-NaN values. To mitigate this, IEEE754_TOTAL_ORDER is recommended.
+   * - If the order of this column is IEEE754_TOTAL_ORDER, then min_values[i]
+   * * If IEEE754_TOTAL_ORDER is used for the column and all non-null values
+   *   of a page are NaN, then min_values[i] and max_values[i] must be set to
+   *   the smallest and largest NaN value contained in the page, as defined
+   *   by the IEEE 754 total order.

Review Comment:
   My intentaion was here that if we already use an order that orders NaNs in a 
well defined manner, we can also use it. I agree we could also settle for "any 
NaN values is okay in this case", but that is somewhat going against the spirit 
of IEEE total order.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to