Putting aside for a moment the question of hashing -0 and +0, I wonder if this could be addressed by ordering floating-point numbers using the IEEE 754 totalOrder predicate, but, whenever a file contains a NaN in a field, omitting that field from manifest_entry.data_file.{sort_columns, lower_bounds, upper_bounds}.
The logic here is that, though ham-fisted, this would also prevent engines from misinterpreting these fields. A natural follow-up question is, "Should we populate these values in some other way that is less likely to be misinterpreted by compute engines?" IIRC, Parquet's transition from {min,max} to {min_value,max_value} was motivated by an ambiguity or bug in the spec. This starts to get a bit arcane, but maybe we WANT a speed bump to stop engines from pruning or searching with non-total-order operators like <=. Thoughts?
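
To make the proposal a bit more concrete, here is a rough Python sketch of what I have in mind. The names `total_order_key` and `column_bounds` are made up for illustration and are not part of any spec or library; the sketch assumes 64-bit doubles and treats "omit the field" as returning `None`.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Map a double to an integer whose natural order matches IEEE 754 totalOrder.

    Hypothetical helper for illustration only.
    """
    # Reinterpret the double's 64 bits as an unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits & (1 << 63):
        # Sign bit set: flip every bit so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set the sign bit so these sort above all negatives.
    return bits | (1 << 63)


def column_bounds(values):
    """Return (lower, upper) under totalOrder, or None if any value is NaN.

    None stands in for "leave this field out of lower_bounds/upper_bounds".
    """
    values = list(values)
    if not values or any(math.isnan(v) for v in values):
        return None
    lower = min(values, key=total_order_key)
    upper = max(values, key=total_order_key)
    return lower, upper
```

Under this key, -0 sorts strictly before +0 and a positive NaN sorts after +infinity, whereas `-0.0 <= 0.0` and `0.0 <= -0.0` are both true and every `<=` comparison involving NaN is false. That mismatch is exactly why an engine pruning with `<=` could misread bounds that were written under totalOrder.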