advancedxy commented on issue #4590:
URL:
https://github.com/apache/datafusion-comet/issues/4590#issuecomment-4677860731
```
val dataFile = org.apache.iceberg.DataFiles
  .builder(table.spec())
  .withPath(sourceParquetFile.getAbsolutePath)
  .withFormat(org.apache.iceberg.FileFormat.PARQUET)
  .withFileSizeInBytes(sourceParquetFile.length())
  .withRecordCount(1)
  .build()
```
I think this might be the problematic part. For Iceberg Parquet files produced
by query engines through the Iceberg connector (such as Spark), the split
offsets are inferred and generated when committing, see [ref
1](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/io/DataWriter.java#L93).
The example data file metadata is created manually, without split offsets, so
the file will be split by the requested split size rather than by the actual
row group split offsets.
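
To make the difference concrete, here is a minimal sketch of the two planning
modes. It is a simplified model, not the Iceberg or iceberg-rust
implementation: `SplitPlanningSketch`, `Split`, `planBySize`, and
`planByOffsets` are hypothetical names, and the file length and offsets are
made-up numbers.

```scala
// Hypothetical model of split planning; not Iceberg or iceberg-rust code.
object SplitPlanningSketch {
  // A byte range [start, start + length) of the data file.
  final case class Split(start: Long, length: Long)

  // No split offsets recorded: cut the file every `targetSize` bytes,
  // regardless of where the row groups actually begin.
  def planBySize(fileLength: Long, targetSize: Long): Seq[Split] = {
    require(targetSize > 0, "targetSize must be positive")
    (0L until fileLength by targetSize).map { start =>
      Split(start, math.min(targetSize, fileLength - start))
    }
  }

  // Split offsets recorded: one split per row group, each ending at the
  // next offset (or at the end of the file for the last row group).
  def planByOffsets(fileLength: Long, offsets: Seq[Long]): Seq[Split] = {
    val sorted = offsets.sorted
    val ends = sorted.drop(1) :+ fileLength
    sorted.zip(ends).map { case (start, end) => Split(start, end - start) }
  }

  def main(args: Array[String]): Unit = {
    val fileLength = 1000L                    // assumed file size in bytes
    val rowGroupOffsets = Seq(4L, 400L, 800L) // assumed row group starts
    println(planBySize(fileLength, 300L))
    println(planByOffsets(fileLength, rowGroupOffsets))
  }
}
```

With these assumed numbers, size-based planning yields four splits whose
boundaries (300, 600, 900) do not line up with the row group starts, while
offset-based planning yields exactly one split per row group. A reader
therefore has to handle splits that begin in the middle of a row group when
the metadata carries no split offsets.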
Anyway, this is still a valid data file, so the problem should be fixed on
the iceberg-rust side.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]