Re: Clarification on sorting floating-point numbers

2025-02-28 Thread Devin Smith
al order for
> floating-point numbers on the Parquet side: [1][2].
>
> [1] https://github.com/apache/parquet-format/pull/221
> [2] https://github.com/apache/parquet-format/pull/196
>
> On Thu, Feb 27, 2025 at 4:24 AM Devin Smith
> wrote:
>
>> The spec https:/

Clarification on sorting floating-point numbers

2025-02-26 Thread Devin Smith
The spec https://iceberg.apache.org/spec/#sorting says

> Sorting floating-point numbers should produce the following behavior: -NaN
> < -Infinity < -value < -0 < 0 < value < Infinity < NaN. This aligns with
> the implementation of Java floating-point types comparisons.

As far as I know, this does
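To make the ordering quoted from the spec concrete, here is a minimal sketch of one way to realize it: sort doubles by their IEEE 754 bit pattern, flipping the non-sign bits of negative values. This is an illustration only, not Iceberg's or Java's implementation; the names `total_order_key`, `from_bits`, `NEG_NAN`, and `POS_NAN` are hypothetical and chosen for this example.

```python
import struct


def total_order_key(x: float) -> int:
    """Return an int whose natural order matches
    -NaN < -Infinity < -value < -0 < 0 < value < Infinity < NaN."""
    # Reinterpret the 64-bit IEEE 754 pattern as a signed integer.
    (bits,) = struct.unpack(">q", struct.pack(">d", x))
    # For negative floats, larger magnitude means larger raw bits, so flip
    # the 63 non-sign bits to reverse their relative order.
    if bits < 0:
        bits ^= 0x7FFFFFFFFFFFFFFF
    return bits


def from_bits(pattern: int) -> float:
    """Build a double from an explicit 64-bit pattern (hypothetical helper)."""
    return struct.unpack(">d", struct.pack(">Q", pattern))[0]


# Quiet NaNs with the sign bit set and cleared (assumed bit patterns).
NEG_NAN = from_bits(0xFFF8000000000000)
POS_NAN = from_bits(0x7FF8000000000000)

values = [POS_NAN, 1.5, 0.0, float("inf"), -0.0, NEG_NAN, -1.5, float("-inf")]
ordered = sorted(values, key=total_order_key)
print(ordered)
```

The key function distinguishes the sign bit of NaN and of zero, which an ordinary `<` comparison on floats cannot do, since NaN compares false against everything and `-0.0 == 0.0`.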

Re: pre-proposal: schema_id on DataFile

2025-02-18 Thread Devin Smith
to scan all metadata to
> remove schemas.
>
> I think my preference is to instead include the highest field ID in the
> schema used to write a file. That enables the `new_data_column` filter
> logic above, but never requires keeping schemas around.
>
> As Fokko said, this probab

Re: pre-proposal: schema_id on DataFile

2025-02-14 Thread Devin Smith
n the Manifest Avro header:
> https://iceberg.apache.org/spec/#manifests Also the schema itself is
> stored there. Would that help your situation? I think this makes adding it
> to the data file redundant.
>
> Kind regards,
> Fokko
>
> On Fri, 14 Feb 2025 at 17:56, De

pre-proposal: schema_id on DataFile

2025-02-14 Thread Devin Smith
I want to make sure I'm not missing something that already exists; otherwise, hoping to get a quick thumbs up / thumbs down on a potential proposal before spending more time on it. It would be nice to know what Iceberg schema a writer used (/assumed) when writing a DataFile. Oftentimes, this infor